import pandas as pd
from fastrl.envs.gym import GymTypeTransform,GymStepper
from fastrl.pipes.map.transforms import TypeTransformer
import torchdata.datapipes as dp  # assumed source of the dp alias used below
NSkip
DataPipe for skipping env steps env-wise.
NSkipper
NSkipper (*args, **kwds)
Accepts a source_datapipe or iterable whose next() produces a StepType, and skips N steps for individual environments while always producing first steps and terminated steps.
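The skipping rule described above can be illustrated without fastrl. The following is a minimal sketch, not the NSkipper implementation: SimpleStep and skip_n_steps are hypothetical names, and the rule (keep a step when it is the first of an episode, when it is terminated, or when its per-environment count is a multiple of n) is an assumption inferred from the description and the tables on this page.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class SimpleStep(NamedTuple):
    """Hypothetical stand-in for a StepType, with only the fields used here."""
    env_id: int
    step_n: int
    terminated: bool


def skip_n_steps(steps: Iterable[SimpleStep], n: int = 1) -> Iterator[SimpleStep]:
    """Yield first, terminated, and every n-th step, counted per environment."""
    counts: Dict[int, int] = {}  # steps seen in the current episode of each env
    for step in steps:
        count = counts.get(step.env_id, 0) + 1
        counts[step.env_id] = count
        if count == 1 or step.terminated or count % n == 0:
            yield step
        if step.terminated:
            counts[step.env_id] = 0  # the next step of this env starts a new episode


# Two environments stepped in round-robin order, each terminating at step 6.
episode_length = 6
interleaved = [
    SimpleStep(env_id, step_n, step_n == episode_length)
    for step_n in range(1, episode_length + 1)
    for env_id in (0, 1)
]

kept = list(skip_n_steps(interleaved, n=2))
kept_env0 = [s.step_n for s in kept if s.env_id == 0]
kept_env1 = [s.step_n for s in kept if s.env_id == 1]
print(kept_env0)
print(kept_env1)
```

Because the counter is keyed by env_id, each environment is skipped independently even though the steps arrive interleaved.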
Below we skip every other step given 3 envs while always keeping the 1st and terminated steps.
def n_skip_test(envs,total_steps,n=1,seed=0):
    pipe = dp.map.Mapper(envs)
    pipe = TypeTransformer(pipe,[GymTypeTransform])
    pipe = dp.iter.MapToIterConverter(pipe)
    pipe = dp.iter.InMemoryCacheHolder(pipe)
    pipe = pipe.cycle()
    pipe = GymStepper(pipe,seed=seed)
    pipe = NSkipper(pipe,n=n)
    # Collect at most total_steps steps from the endlessly cycling pipeline
    steps = [step for step,_ in zip(*(pipe,range(total_steps)))]
    return steps

steps = n_skip_test(['CartPole-v1']*3,200,2,0)
pd.DataFrame(steps)[['state','next_state','env_id','terminated']][:10]
| | state | next_state | env_id | terminated |
---|---|---|---|---|
0 | [tensor(0.0137), tensor(-0.0230), tensor(-0.0459), tensor(-0.0483)] | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | tensor(140527143417168) | tensor(False) |
1 | [tensor(0.0137), tensor(-0.0230), tensor(-0.0459), tensor(-0.0483)] | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | tensor(140527143418320) | tensor(False) |
2 | [tensor(0.0137), tensor(-0.0230), tensor(-0.0459), tensor(-0.0483)] | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | tensor(140527143418960) | tensor(False) |
3 | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | [tensor(0.0167), tensor(0.3685), tensor(-0.0540), tensor(-0.6622)] | tensor(140527143417168) | tensor(False) |
4 | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | [tensor(0.0167), tensor(0.3685), tensor(-0.0540), tensor(-0.6622)] | tensor(140527143418320) | tensor(False) |
5 | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | [tensor(0.0167), tensor(0.3685), tensor(-0.0540), tensor(-0.6622)] | tensor(140527143418960) | tensor(False) |
6 | [tensor(0.0241), tensor(0.5643), tensor(-0.0672), tensor(-0.9714)] | [tensor(0.0353), tensor(0.3702), tensor(-0.0866), tensor(-0.7006)] | tensor(140527143417168) | tensor(False) |
7 | [tensor(0.0241), tensor(0.5643), tensor(-0.0672), tensor(-0.9714)] | [tensor(0.0353), tensor(0.3702), tensor(-0.0866), tensor(-0.7006)] | tensor(140527143418320) | tensor(False) |
8 | [tensor(0.0241), tensor(0.5643), tensor(-0.0672), tensor(-0.9714)] | [tensor(0.0353), tensor(0.3702), tensor(-0.0866), tensor(-0.7006)] | tensor(140527143418960) | tensor(False) |
9 | [tensor(0.0427), tensor(0.1763), tensor(-0.1007), tensor(-0.4364)] | [tensor(0.0463), tensor(-0.0172), tensor(-0.1094), tensor(-0.1771)] | tensor(140527143417168) | tensor(False) |
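n_skip_test collects its steps by zipping the cycled pipeline with range(total_steps). The sketch below shows that idiom with the standard library only, using plain iterators in place of the datapipe (an assumption made purely for illustration). It shows how the zip caps an endless source and how it compares with itertools.islice.

```python
from itertools import cycle, islice

env_names = ['env_a', 'env_b', 'env_c']  # placeholder items, not real environments
total_steps = 5

# zip stops as soon as the shorter iterable (the range) is exhausted.
via_zip = [item for item, _ in zip(cycle(env_names), range(total_steps))]

# islice expresses the same cap directly.
via_islice = list(islice(cycle(env_names), total_steps))

print(via_zip)
print(via_zip == via_islice)

# zip advances its first argument before it notices that the range has run out,
# so one extra item is pulled from the source and discarded.
source = iter(range(10))
first_three = [x for x, _ in zip(source, range(3))]
item_after_zip = next(source)
print(first_three, item_after_zip)
```

If that extra pull matters, for example because advancing the source has side effects, islice takes exactly the requested number of items.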
Here is a simple 1-env result…
steps = n_skip_test(['CartPole-v1']*1,200,2,0)
pd.DataFrame(steps)[['state','next_state','step_n','terminated']][:10]
| | state | next_state | step_n | terminated |
---|---|---|---|---|
0 | [tensor(0.0137), tensor(-0.0230), tensor(-0.0459), tensor(-0.0483)] | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | tensor(1) | tensor(False) |
1 | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | [tensor(0.0167), tensor(0.3685), tensor(-0.0540), tensor(-0.6622)] | tensor(2) | tensor(False) |
2 | [tensor(0.0241), tensor(0.5643), tensor(-0.0672), tensor(-0.9714)] | [tensor(0.0353), tensor(0.3702), tensor(-0.0866), tensor(-0.7006)] | tensor(4) | tensor(False) |
3 | [tensor(0.0427), tensor(0.1763), tensor(-0.1007), tensor(-0.4364)] | [tensor(0.0463), tensor(-0.0172), tensor(-0.1094), tensor(-0.1771)] | tensor(6) | tensor(False) |
4 | [tensor(0.0459), tensor(-0.2106), tensor(-0.1129), tensor(0.0792)] | [tensor(0.0417), tensor(-0.4040), tensor(-0.1113), tensor(0.3342)] | tensor(8) | tensor(False) |
5 | [tensor(0.0336), tensor(-0.5973), tensor(-0.1047), tensor(0.5899)] | [tensor(0.0217), tensor(-0.4009), tensor(-0.0929), tensor(0.2661)] | tensor(10) | tensor(False) |
6 | [tensor(0.0137), tensor(-0.2046), tensor(-0.0875), tensor(-0.0543)] | [tensor(0.0096), tensor(-0.0083), tensor(-0.0886), tensor(-0.3733)] | tensor(12) | tensor(False) |
7 | [tensor(0.0094), tensor(0.1879), tensor(-0.0961), tensor(-0.6926)] | [tensor(0.0132), tensor(0.3842), tensor(-0.1099), tensor(-1.0139)] | tensor(14) | tensor(False) |
8 | [tensor(0.0209), tensor(0.5806), tensor(-0.1302), tensor(-1.3390)] | [tensor(0.0325), tensor(0.7771), tensor(-0.1570), tensor(-1.6694)] | tensor(16) | tensor(False) |
9 | [tensor(0.0480), tensor(0.9737), tensor(-0.1904), tensor(-2.0066)] | [tensor(0.0675), tensor(1.1702), tensor(-0.2305), tensor(-2.3517)] | tensor(18) | tensor(True) |
n_skips_expected
n_skips_expected (default_steps:int, n:int)
Produces the expected number of steps, assuming a fully deterministic episode based on default_steps and n.
Mainly used for testing.
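Given n=2 and 1 env, and knowing that CartPole-v1 with seed=0 will always run 18 steps, the total number of steps will be:
\[ \left\lfloor \frac{18}{n} \right\rfloor + 1 \quad \text{(1st + last)} \]
For n=2 this gives 10, which matches the 10 rows of the 1-env table above.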
| | Type | Details |
---|---|---|
default_steps | int | The number of steps the episode would run without n_skips |
n | int | The n-skip value that we are planning to use |
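The arithmetic for the documented case (18 steps, n=2) can be checked by hand. This is a sketch under stated assumptions, not the body of n_skips_expected: formula_from_text and brute_force_count are hypothetical helpers, and the brute-force rule (keep the first step, the last step, and every step whose number is a multiple of n) is inferred from the step_n column of the 1-env table. Only the documented case is checked here.

```python
def formula_from_text(default_steps: int, n: int) -> int:
    """The expression quoted in the text: default_steps // n + 1."""
    return default_steps // n + 1


def brute_force_count(default_steps: int, n: int) -> int:
    """Count kept steps one by one under the assumed keep rule."""
    kept = 0
    for step_n in range(1, default_steps + 1):
        is_first = step_n == 1
        is_last = step_n == default_steps  # the terminated step
        if is_first or is_last or step_n % n == 0:
            kept += 1
    return kept


expected = formula_from_text(18, 2)
counted = brute_force_count(18, 2)
print(expected, counted)
```

Both values agree with the number of rows in the 1-env table, where step_n runs 1, 2, 4, ..., 18.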