NSkip

DataPipe for skipping environment steps on a per-environment basis.

NSkipper

 NSkipper (*args, **kwds)

Accepts a source_datapipe or iterable whose next() produces a StepType. For each individual environment it keeps every nth step and skips the rest, while always producing 1st steps and terminated steps.
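The skipping rule can be illustrated with a minimal plain-Python sketch. This is not the fastrl implementation: `Step` and `skip_steps` are hypothetical names, and the rule (keep the 1st step, every nth step, and terminated steps, counted per environment) is an assumption inferred from the description above and the example outputs on this page.

```python
from collections import namedtuple

# Hypothetical stand-in for a StepType; only the fields used here are modelled.
Step = namedtuple("Step", ["env_id", "step_n", "terminated"])

def skip_steps(steps, n=1):
    """Yield the 1st step, every nth step, and terminated steps of each env."""
    counts = {}  # env_id -> number of steps seen in the current episode
    for step in steps:
        counts[step.env_id] = counts.get(step.env_id, 0) + 1
        count = counts[step.env_id]
        if count == 1 or count % n == 0 or step.terminated:
            yield step
        if step.terminated:
            counts[step.env_id] = 0  # the next step starts a new episode

# Two interleaved envs, each running a single 5-step episode.
source = [Step(env, i, i == 5) for i in range(1, 6) for env in ("a", "b")]
kept = [(s.env_id, s.step_n) for s in skip_steps(source, n=2)]
print(kept)
```

Because the counter is kept per `env_id`, interleaving steps from several environments does not change which steps of each environment are retained.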

Below we skip every other step (n=2) across 3 envs while always keeping the 1st and terminated steps.

import pandas as pd
import torchdata.datapipes as dp  # assumed source of the `dp` alias used below
from fastrl.envs.gym import GymTypeTransform,GymStepper
from fastrl.pipes.map.transforms import TypeTransformer

def n_skip_test(envs,total_steps,n=1,seed=0):
    # Build the envs and cycle over them indefinitely
    pipe = dp.map.Mapper(envs)
    pipe = TypeTransformer(pipe,[GymTypeTransform])
    pipe = dp.iter.MapToIterConverter(pipe)
    pipe = dp.iter.InMemoryCacheHolder(pipe)
    pipe = pipe.cycle()
    # Step each env, then skip steps per env with NSkipper
    pipe = GymStepper(pipe,seed=seed)
    pipe = NSkipper(pipe,n=n)

    # Collect at most `total_steps` steps from the pipeline
    steps = [step for step,_ in zip(pipe,range(total_steps))]
    return steps

steps = n_skip_test(['CartPole-v1']*3,200,2,0)
pd.DataFrame(steps)[['state','next_state','env_id','terminated']][:10]
| | state | next_state | env_id | terminated |
|---|---|---|---|---|
| 0 | [tensor(0.0137), tensor(-0.0230), tensor(-0.0459), tensor(-0.0483)] | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | tensor(140527143417168) | tensor(False) |
| 1 | [tensor(0.0137), tensor(-0.0230), tensor(-0.0459), tensor(-0.0483)] | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | tensor(140527143418320) | tensor(False) |
| 2 | [tensor(0.0137), tensor(-0.0230), tensor(-0.0459), tensor(-0.0483)] | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | tensor(140527143418960) | tensor(False) |
| 3 | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | [tensor(0.0167), tensor(0.3685), tensor(-0.0540), tensor(-0.6622)] | tensor(140527143417168) | tensor(False) |
| 4 | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | [tensor(0.0167), tensor(0.3685), tensor(-0.0540), tensor(-0.6622)] | tensor(140527143418320) | tensor(False) |
| 5 | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | [tensor(0.0167), tensor(0.3685), tensor(-0.0540), tensor(-0.6622)] | tensor(140527143418960) | tensor(False) |
| 6 | [tensor(0.0241), tensor(0.5643), tensor(-0.0672), tensor(-0.9714)] | [tensor(0.0353), tensor(0.3702), tensor(-0.0866), tensor(-0.7006)] | tensor(140527143417168) | tensor(False) |
| 7 | [tensor(0.0241), tensor(0.5643), tensor(-0.0672), tensor(-0.9714)] | [tensor(0.0353), tensor(0.3702), tensor(-0.0866), tensor(-0.7006)] | tensor(140527143418320) | tensor(False) |
| 8 | [tensor(0.0241), tensor(0.5643), tensor(-0.0672), tensor(-0.9714)] | [tensor(0.0353), tensor(0.3702), tensor(-0.0866), tensor(-0.7006)] | tensor(140527143418960) | tensor(False) |
| 9 | [tensor(0.0427), tensor(0.1763), tensor(-0.1007), tensor(-0.4364)] | [tensor(0.0463), tensor(-0.0172), tensor(-0.1094), tensor(-0.1771)] | tensor(140527143417168) | tensor(False) |

Here is a simple 1-env result. The step_n column shows which steps were kept:

steps = n_skip_test(['CartPole-v1']*1,200,2,0)
pd.DataFrame(steps)[['state','next_state','step_n','terminated']][:10]
| | state | next_state | step_n | terminated |
|---|---|---|---|---|
| 0 | [tensor(0.0137), tensor(-0.0230), tensor(-0.0459), tensor(-0.0483)] | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | tensor(1) | tensor(False) |
| 1 | [tensor(0.0132), tensor(0.1727), tensor(-0.0469), tensor(-0.3552)] | [tensor(0.0167), tensor(0.3685), tensor(-0.0540), tensor(-0.6622)] | tensor(2) | tensor(False) |
| 2 | [tensor(0.0241), tensor(0.5643), tensor(-0.0672), tensor(-0.9714)] | [tensor(0.0353), tensor(0.3702), tensor(-0.0866), tensor(-0.7006)] | tensor(4) | tensor(False) |
| 3 | [tensor(0.0427), tensor(0.1763), tensor(-0.1007), tensor(-0.4364)] | [tensor(0.0463), tensor(-0.0172), tensor(-0.1094), tensor(-0.1771)] | tensor(6) | tensor(False) |
| 4 | [tensor(0.0459), tensor(-0.2106), tensor(-0.1129), tensor(0.0792)] | [tensor(0.0417), tensor(-0.4040), tensor(-0.1113), tensor(0.3342)] | tensor(8) | tensor(False) |
| 5 | [tensor(0.0336), tensor(-0.5973), tensor(-0.1047), tensor(0.5899)] | [tensor(0.0217), tensor(-0.4009), tensor(-0.0929), tensor(0.2661)] | tensor(10) | tensor(False) |
| 6 | [tensor(0.0137), tensor(-0.2046), tensor(-0.0875), tensor(-0.0543)] | [tensor(0.0096), tensor(-0.0083), tensor(-0.0886), tensor(-0.3733)] | tensor(12) | tensor(False) |
| 7 | [tensor(0.0094), tensor(0.1879), tensor(-0.0961), tensor(-0.6926)] | [tensor(0.0132), tensor(0.3842), tensor(-0.1099), tensor(-1.0139)] | tensor(14) | tensor(False) |
| 8 | [tensor(0.0209), tensor(0.5806), tensor(-0.1302), tensor(-1.3390)] | [tensor(0.0325), tensor(0.7771), tensor(-0.1570), tensor(-1.6694)] | tensor(16) | tensor(False) |
| 9 | [tensor(0.0480), tensor(0.9737), tensor(-0.1904), tensor(-2.0066)] | [tensor(0.0675), tensor(1.1702), tensor(-0.2305), tensor(-2.3517)] | tensor(18) | tensor(True) |

n_skips_expected

 n_skips_expected (default_steps:int, n:int)

Produces the expected number of steps that remain after n-skipping, based on default_steps and n and assuming a fully deterministic episode.

Mainly used for testing.

Given n=2 and 1 env, and knowing that CartPole-v1 with seed=0 always runs for 18 steps, the total number of steps produced will be:

\[ \lfloor 18 / n \rfloor + 1 = \lfloor 18 / 2 \rfloor + 1 = 10 \qquad \text{(1st + last)} \]

| | Type | Details |
|---|---|---|
| default_steps | int | The number of steps the episode would run without n_skips |
| n | int | The n-skip value that we are planning to use |
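As a sanity check on the arithmetic above, the retained steps of a single episode can be counted by brute force. This is a sketch under assumptions, not the body of n_skips_expected: `count_kept_steps` is a hypothetical helper, and it assumes that the 1st step, every nth step, and the final (terminated) step are the ones kept, as the 1-env output on this page suggests.

```python
def count_kept_steps(default_steps, n):
    """Count the steps that survive skipping in one episode of `default_steps` steps."""
    kept = 0
    for step_n in range(1, default_steps + 1):
        is_first = step_n == 1
        is_nth = step_n % n == 0
        is_last = step_n == default_steps  # the terminated step
        if is_first or is_nth or is_last:
            kept += 1
    return kept

print(count_kept_steps(18, 2))  # 10, matching 18 // 2 + 1
```

Under the same assumed rule, an episode length that n does not divide evenly keeps one additional step, because the terminated step is then no longer one of the nth steps.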