Memories

Episodic Memories

EpisodicExperienceReplay

class rl_coach.memories.episodic.EpisodicExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int] = (<MemoryGranularity.Transitions: 0>, 1000000), n_step=-1, train_to_eval_ratio: int = 1)[source]

A replay buffer that stores episodes of transitions. The additional structure makes it possible to compute the total return and other values that depend on the sequential order of the transitions within each episode.

Parameters

max_size – the maximum number of transitions or episodes to hold in the memory
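
A minimal usage sketch, assuming the standard rl_coach Transition type and the store/sample methods of the memory classes (the Transition arguments and method names shown here should be checked against the installed rl_coach version):

    import numpy as np

    from rl_coach.core_types import Transition
    from rl_coach.memories.episodic import EpisodicExperienceReplay
    from rl_coach.memories.memory import MemoryGranularity

    # hold at most 1,000,000 transitions (the default granularity)
    memory = EpisodicExperienceReplay(max_size=(MemoryGranularity.Transitions, 1000000))

    # store a short episode transition by transition; the episode is closed
    # when a transition with game_over=True is stored, which is what enables
    # the episode-level calculations (e.g. total return) mentioned above
    for step in range(10):
        memory.store(Transition(state={'observation': np.random.rand(4)},
                                action=0,
                                reward=1.0,
                                next_state={'observation': np.random.rand(4)},
                                game_over=(step == 9)))

    batch = memory.sample(4)  # a list of Transition objects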

EpisodicHindsightExperienceReplay

class rl_coach.memories.episodic.EpisodicHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)[source]

Implements Hindsight Experience Replay as described in the following paper: https://arxiv.org/pdf/1707.01495.pdf

Parameters
  • max_size – The maximum size of the memory. Should be defined with a granularity of Transitions

  • hindsight_transitions_per_regular_transition – The number of artificial hindsight transitions to generate for each actual transition

  • hindsight_goal_selection_method – The method used to generate the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

  • goals_space – A GoalsSpace which defines the base properties of the goals space
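
A construction sketch. The HindsightGoalSelectionMethod import path follows the signature above; the specific enum member (Future) and the GoalsSpace / ReachingGoal constructor arguments are assumptions for illustration and may differ between rl_coach versions:

    from rl_coach.memories.episodic import EpisodicHindsightExperienceReplay
    from rl_coach.memories.episodic.episodic_hindsight_experience_replay import HindsightGoalSelectionMethod
    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.spaces import GoalsSpace, ReachingGoal

    # the goals-space construction below is schematic; arguments are assumptions
    goals_space = GoalsSpace(goal_name='achieved_goal',
                             reward_type=ReachingGoal(distance_from_goal_threshold=0.05,
                                                      goal_reaching_reward=0,
                                                      default_reward=-1),
                             distance_metric=GoalsSpace.DistanceMetric.Euclidean)

    memory = EpisodicHindsightExperienceReplay(
        max_size=(MemoryGranularity.Transitions, 1000000),
        hindsight_transitions_per_regular_transition=4,   # 4 artificial transitions per real one
        hindsight_goal_selection_method=HindsightGoalSelectionMethod.Future,
        goals_space=goals_space)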

EpisodicHRLHindsightExperienceReplay

class rl_coach.memories.episodic.EpisodicHRLHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)[source]

Implements HRL Hindsight Experience Replay as described in the following paper: https://arxiv.org/abs/1805.08180

This is the memory you should use if you want a shared hindsight experience replay buffer between multiple workers.

Parameters
  • max_size – The maximum size of the memory. Should be defined with a granularity of Transitions

  • hindsight_transitions_per_regular_transition – The number of artificial hindsight transitions to generate for each actual transition

  • hindsight_goal_selection_method – The method used to generate the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

  • goals_space – A GoalsSpace which defines the properties of the goals

  • do_action_hindsight – Replace the action (sub-goal) given to a lower layer with the actually achieved goal

SingleEpisodeBuffer

class rl_coach.memories.episodic.SingleEpisodeBuffer[source]

Non-Episodic Memories

BalancedExperienceReplay

class rl_coach.memories.non_episodic.BalancedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True, num_classes: int = 0, state_key_with_the_class_index: Any = 'class')[source]

Parameters
  • max_size – the maximum number of transitions or episodes to hold in the memory

  • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch

  • num_classes – the number of classes in the replayed data

  • state_key_with_the_class_index – the class index is assumed to be a value in the state dictionary; this parameter determines the key used to retrieve it
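
A sketch of balanced sampling, assuming the class index is stored directly in each transition's state dictionary under the configured key (the Transition arguments and the store/sample method names follow the general rl_coach memory API as described above and should be verified against the installed version):

    import numpy as np

    from rl_coach.core_types import Transition
    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import BalancedExperienceReplay

    memory = BalancedExperienceReplay(max_size=(MemoryGranularity.Transitions, 10000),
                                      num_classes=3,
                                      state_key_with_the_class_index='class')

    # every stored state carries its class index under the 'class' key,
    # so batches can be sampled with a balanced number of transitions per class
    for i in range(30):
        memory.store(Transition(state={'observation': np.random.rand(4), 'class': i % 3},
                                action=0,
                                reward=0.0,
                                game_over=True))

    batch = memory.sample(6)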

QDND

class rl_coach.memories.non_episodic.QDND(dict_size, key_width, num_actions, new_value_shift_coefficient=0.1, key_error_threshold=0.01, learning_rate=0.01, num_neighbors=50, return_additional_data=False, override_existing_keys=False, rebuild_on_every_update=False)[source]

ExperienceReplay

class rl_coach.memories.non_episodic.ExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True)[source]

A regular replay buffer which stores transitions without any additional structure.

Parameters
  • max_size – the maximum number of transitions or episodes to hold in the memory

  • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
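
For example, a buffer capped at 50,000 transitions that never places the same transition twice in a single batch (a sketch; the tuple-based max_size follows the signature above):

    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import ExperienceReplay

    memory = ExperienceReplay(max_size=(MemoryGranularity.Transitions, 50000),
                              allow_duplicates_in_batch_sampling=False)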

PrioritizedExperienceReplay

class rl_coach.memories.non_episodic.PrioritizedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], alpha: float = 0.6, beta: rl_coach.schedules.Schedule = <rl_coach.schedules.ConstantSchedule object>, epsilon: float = 1e-06, allow_duplicates_in_batch_sampling: bool = True)[source]

This is the proportional-sampling variant of prioritized experience replay, as described in https://arxiv.org/pdf/1511.05952.pdf.

Parameters
  • max_size – the maximum number of transitions or episodes to hold in the memory

  • alpha – the alpha prioritization coefficient

  • beta – the beta parameter used for importance sampling

  • epsilon – a small value added to the priority of each transition

  • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
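
A construction sketch. The LinearSchedule arguments (initial value, final value, number of steps) and the power-of-two capacity are assumptions: annealing beta towards 1 over training follows the paper, and some proportional-PER implementations require a power-of-two buffer size for their sum-tree:

    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import PrioritizedExperienceReplay
    from rl_coach.schedules import LinearSchedule

    memory = PrioritizedExperienceReplay(
        max_size=(MemoryGranularity.Transitions, 2 ** 20),  # power-of-two capacity, an assumption
        alpha=0.6,                                          # how strongly priorities skew sampling
        beta=LinearSchedule(0.4, 1.0, 1000000),             # anneal importance-sampling correction to 1
        epsilon=1e-6)                                       # keeps every transition's priority non-zero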

TransitionCollection

class rl_coach.memories.non_episodic.TransitionCollection[source]

A simple Python implementation of a transition collection, on top of which the non-episodic memories are constructed.