Core

Core objects and functions for fastrl and reinforcement learning in general

Primitives

StepTypes are generated by environments and used by RL models for training / execution.


add_namedtuple_doc

 add_namedtuple_doc (t:NamedTuple, doc:str, **fields_docs:dict)

Add doc to t as its main docstring, along with individual field docs from fields_docs

             Type        Details
t            NamedTuple  Primary tuple to add docs to
doc          str         Primary doc for the overall tuple; the docs for individual fields will be concatenated onto it.
fields_docs  dict        Docs for the individual fields
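
The behavior described above can be sketched with a stdlib-only stand-in (attach_namedtuple_doc and Point are hypothetical names for illustration, not fastrl's actual implementation):

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

def attach_namedtuple_doc(t, doc, **fields_docs):
    # Build one docstring from the main doc plus per-field descriptions,
    # then assign it to the NamedTuple class (writable since Python 3.5).
    lines = [doc, ""] + [f"{name}: {desc}" for name, desc in fields_docs.items()]
    t.__doc__ = "\n".join(lines)
    return t

attach_namedtuple_doc(Point, "A 2D point.",
                      x="Horizontal coordinate",
                      y="Vertical coordinate")
print(Point.__doc__)
```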

SimpleStep

 SimpleStep (state:torch.FloatTensor=tensor([0.]),
             action:torch.FloatTensor=tensor([0.]),
             next_state:torch.FloatTensor=tensor([0.]),
             terminated:torch.BoolTensor=tensor([True]),
             truncated:torch.BoolTensor=tensor([True]),
             reward:torch.FloatTensor=tensor([0]),
             total_reward:torch.FloatTensor=tensor([0.]),
             env_id:torch.LongTensor=tensor([0]),
             proc_id:torch.LongTensor=tensor([0]),
             step_n:torch.LongTensor=tensor([0]),
             episode_n:torch.LongTensor=tensor([0]),
             image:torch.FloatTensor=tensor([0.]))

Represents a single step in an environment.

Parameters:

- state (torch.FloatTensor, default tensor([0.])): Both the initial state of the environment and the previous state.
- action (torch.FloatTensor, default tensor([0.])): The action that was taken to transition from state to next_state.
- next_state (torch.FloatTensor, default tensor([0.])): Both the next state, and the last state in the environment.
- terminated (torch.BoolTensor, default tensor([True])): Represents an ending condition for an environment such as reaching a goal or 'living long enough' as described by the MDP. Good reference: https://github.com/openai/gym/blob/39b8661cb09f19cb8c8d2f59b57417517de89cb0/gym/core.py#L151-L155
- truncated (torch.BoolTensor, default tensor([True])): Represents an ending condition for an environment that can be seen as an out-of-bounds condition: literally going out of bounds, breaking rules, or exceeding the time limit allowed by the MDP. Good reference: https://github.com/openai/gym/blob/39b8661cb09f19cb8c8d2f59b57417517de89cb0/gym/core.py#L151-L155
- reward (torch.FloatTensor, default tensor([0])): The single reward for this step.
- total_reward (torch.FloatTensor, default tensor([0.])): The total accumulated reward for this episode up to this step.
- env_id (torch.LongTensor, default tensor([0])): The environment this step came from (useful for debugging).
- proc_id (torch.LongTensor, default tensor([0])): The process this step came from (useful for debugging).
- step_n (torch.LongTensor, default tensor([0])): The step number in a given episode.
- episode_n (torch.LongTensor, default tensor([0])): The episode this environment is currently running through.
- image (torch.FloatTensor, default tensor([0.])): Intended for display and logging only. If the intention is to use images for training an agent, use an env wrapper instead.
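
A short illustration (not fastrl code) of why terminated and truncated are kept as separate fields: when bootstrapping a TD target, only a true termination should zero out the next-state value, because a truncated episode was merely cut off and its next state still has value.

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    # Illustrative sketch: zero the bootstrap term only on true termination
    # (goal reached / MDP ending), never on truncation (time limit, out of bounds).
    return reward + gamma * next_value * (1.0 - float(terminated))

print(td_target(1.0, 10.0, terminated=True))   # 1.0: episode truly ended, no bootstrap
print(td_target(1.0, 10.0, terminated=False))  # ~10.9: value bootstraps through a cut-off
```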

Now we can generate a couple of steps to send through a PyTorch dataloader.

torch.manual_seed(0)
SimpleStep.random(state=torch.FloatTensor(2).fill_(0))
SimpleStep(state=tensor([0., 0.]), action=tensor([39.]), next_state=tensor([33.]), terminated=tensor([False]), truncated=tensor([True]), reward=tensor([79.]), total_reward=tensor([27.]), env_id=tensor([3]), proc_id=tensor([97]), step_n=tensor([83]), episode_n=tensor([1]), image=tensor([66.]))
SimpleStep.random(state=torch.FloatTensor(2).fill_(0)).clone()
SimpleStep(state=tensor([0., 0.]), action=tensor([99.]), next_state=tensor([78.]), terminated=tensor([False]), truncated=tensor([False]), reward=tensor([68.]), total_reward=tensor([94.]), env_id=tensor([33]), proc_id=tensor([26]), step_n=tensor([19]), episode_n=tensor([91]), image=tensor([54.]))
from torchdata.dataloader2.dataloader2 import DataLoader2
from torchdata.dataloader2.reading_service import MultiProcessingReadingService
from torchdata.dataloader2.graph import traverse
import torchdata.datapipes as dp
def seed_worker(worker_id): torch.manual_seed(0)
def random_step_generator(): 
    while True: yield SimpleStep.random()
    

pipe = dp.iter.IterableWrapper(random_step_generator(),deepcopy=False)
pipe = pipe.batch(batch_size=3)

g = torch.Generator()
g.manual_seed(0)
dl = DataLoader2(
    pipe,
    reading_service=MultiProcessingReadingService(num_workers=2,worker_init_fn=seed_worker)
)

for o in dl:
    print(o)
    break
[SimpleStep(state=tensor([44.]), action=tensor([39.]), next_state=tensor([33.]), terminated=tensor([False]), truncated=tensor([True]), reward=tensor([79.]), total_reward=tensor([27.]), env_id=tensor([3]), proc_id=tensor([97]), step_n=tensor([83]), episode_n=tensor([1]), image=tensor([66.])), SimpleStep(state=tensor([56.]), action=tensor([99.]), next_state=tensor([78.]), terminated=tensor([False]), truncated=tensor([False]), reward=tensor([68.]), total_reward=tensor([94.]), env_id=tensor([33]), proc_id=tensor([26]), step_n=tensor([19]), episode_n=tensor([91]), image=tensor([54.])), SimpleStep(state=tensor([24.]), action=tensor([41.]), next_state=tensor([69.]), terminated=tensor([True]), truncated=tensor([True]), reward=tensor([80.]), total_reward=tensor([81.]), env_id=tensor([12]), proc_id=tensor([63]), step_n=tensor([60]), episode_n=tensor([95]), image=tensor([85.]))]

Record

 Record (name:str, value:Any)
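
Given only the signature above, a Record can be mirrored with a plain stdlib NamedTuple (the class below is a local illustration, not imported from fastrl):

```python
from typing import Any, NamedTuple

class Record(NamedTuple):
    # Mirrors the documented signature: a named value, e.g. for logging metrics.
    name: str
    value: Any

r = Record('reward', 1.5)
print(r.name, r.value)
```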

Testing

Additional utilities for testing anything


test_in

 test_in (a, b)

test that a in b

test_in('o','hello')
test_in(3,[1,2,3,4])

test_len

 test_len (a, b, meta_info='')

test that len(a) == int(b) or len(a) == len(b)

test_len([1,2,3],3)
test_len([1,2,3],[1,2,3])
test_len([1,2,3],'123')
test_fail(lambda:test_len([1,2,3],'1234'))

test_lt

 test_lt (a, b)

test that a < b

test_lt(4,5)
test_fail(lambda:test_lt(5,4))