Memories

Episodic Memories

EpisodicExperienceReplay

class rl_coach.memories.episodic.EpisodicExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int] = (<MemoryGranularity.Transitions: 0>, 1000000), n_step=-1, train_to_eval_ratio: int = 1)[source]

A replay buffer that stores episodes of transitions. The additional structure makes it possible to compute the total return and other values that depend on the sequential order of the transitions within each episode.

Parameters

max_size – the maximum number of transitions or episodes to hold in the memory
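
A minimal usage sketch, assuming the standard rl_coach Transition type and the store/sample methods of the memory classes (the Transition arguments and method names shown here should be checked against the installed rl_coach version):

    import numpy as np

    from rl_coach.core_types import Transition
    from rl_coach.memories.episodic import EpisodicExperienceReplay
    from rl_coach.memories.memory import MemoryGranularity

    # hold at most 1,000,000 transitions (the default granularity)
    memory = EpisodicExperienceReplay(max_size=(MemoryGranularity.Transitions, 1000000))

    # store a short episode transition by transition; the episode is closed
    # when a transition with game_over=True is stored, which is what enables
    # the episode-level calculations (e.g. total return) mentioned above
    for step in range(10):
        memory.store(Transition(state={'observation': np.random.rand(4)},
                                action=0,
                                reward=1.0,
                                next_state={'observation': np.random.rand(4)},
                                game_over=(step == 9)))

    batch = memory.sample(4)  # a list of Transition objects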

EpisodicHindsightExperienceReplay

class rl_coach.memories.episodic.EpisodicHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)[source]

Implements Hindsight Experience Replay as described in the following paper: https://arxiv.org/pdf/1707.01495.pdf

Parameters
  • max_size – The maximum size of the memory. Should be defined with a granularity of Transitions

  • hindsight_transitions_per_regular_transition – The number of artificial hindsight transitions to generate for each actual transition

  • hindsight_goal_selection_method – The method used to generate the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

  • goals_space – A GoalsSpace which defines the base properties of the goals space
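
A construction sketch. The HindsightGoalSelectionMethod import path follows the signature above; the specific enum member (Future) and the GoalsSpace / ReachingGoal constructor arguments are assumptions for illustration and may differ between rl_coach versions:

    from rl_coach.memories.episodic import EpisodicHindsightExperienceReplay
    from rl_coach.memories.episodic.episodic_hindsight_experience_replay import HindsightGoalSelectionMethod
    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.spaces import GoalsSpace, ReachingGoal

    # the goals-space construction below is schematic; arguments are assumptions
    goals_space = GoalsSpace(goal_name='achieved_goal',
                             reward_type=ReachingGoal(distance_from_goal_threshold=0.05,
                                                      goal_reaching_reward=0,
                                                      default_reward=-1),
                             distance_metric=GoalsSpace.DistanceMetric.Euclidean)

    memory = EpisodicHindsightExperienceReplay(
        max_size=(MemoryGranularity.Transitions, 1000000),
        hindsight_transitions_per_regular_transition=4,   # 4 artificial transitions per real one
        hindsight_goal_selection_method=HindsightGoalSelectionMethod.Future,
        goals_space=goals_space)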

EpisodicHRLHindsightExperienceReplay

class rl_coach.memories.episodic.EpisodicHRLHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)[source]

Implements HRL Hindsight Experience Replay as described in the following paper: https://arxiv.org/abs/1805.08180

This is the memory you should use if you want a shared hindsight experience replay buffer between multiple workers.

Parameters
  • max_size – The maximum size of the memory. Should be defined with a granularity of Transitions

  • hindsight_transitions_per_regular_transition – The number of artificial hindsight transitions to generate for each actual transition

  • hindsight_goal_selection_method – The method used to generate the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

  • goals_space – A GoalsSpace which defines the properties of the goals

  • do_action_hindsight – Replace the action (sub-goal) given to a lower layer with the actually achieved goal

SingleEpisodeBuffer

class rl_coach.memories.episodic.SingleEpisodeBuffer[source]

Non-Episodic Memories

BalancedExperienceReplay

class rl_coach.memories.non_episodic.BalancedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True, num_classes: int = 0, state_key_with_the_class_index: Any = 'class')[source]

Parameters
  • max_size – the maximum number of transitions or episodes to hold in the memory

  • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch

  • num_classes – the number of classes in the replayed data

  • state_key_with_the_class_index – the class index is assumed to be a value in the state dictionary; this parameter determines the key used to retrieve it
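
A sketch of balanced sampling, assuming the class index is stored directly in each transition's state dictionary under the configured key (the Transition arguments and the store/sample method names follow the general rl_coach memory API as described above and should be verified against the installed version):

    import numpy as np

    from rl_coach.core_types import Transition
    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import BalancedExperienceReplay

    memory = BalancedExperienceReplay(max_size=(MemoryGranularity.Transitions, 10000),
                                      num_classes=3,
                                      state_key_with_the_class_index='class')

    # every stored state carries its class index under the 'class' key,
    # so batches can be sampled with a balanced number of transitions per class
    for i in range(30):
        memory.store(Transition(state={'observation': np.random.rand(4), 'class': i % 3},
                                action=0,
                                reward=0.0,
                                game_over=True))

    batch = memory.sample(6)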

QDND

class rl_coach.memories.non_episodic.QDND(dict_size, key_width, num_actions, new_value_shift_coefficient=0.1, key_error_threshold=0.01, learning_rate=0.01, num_neighbors=50, return_additional_data=False, override_existing_keys=False, rebuild_on_every_update=False)[source]

ExperienceReplay

class rl_coach.memories.non_episodic.ExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True)[source]

A regular replay buffer which stores transitions without any additional structure.

Parameters
  • max_size – the maximum number of transitions or episodes to hold in the memory

  • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
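
For example, a buffer capped at 50,000 transitions that never places the same transition twice in a single batch (a sketch; the tuple-based max_size follows the signature above):

    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import ExperienceReplay

    memory = ExperienceReplay(max_size=(MemoryGranularity.Transitions, 50000),
                              allow_duplicates_in_batch_sampling=False)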

PrioritizedExperienceReplay

class rl_coach.memories.non_episodic.PrioritizedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], alpha: float = 0.6, beta: rl_coach.schedules.Schedule = <rl_coach.schedules.ConstantSchedule object>, epsilon: float = 1e-06, allow_duplicates_in_batch_sampling: bool = True)[source]

This is the proportional-sampling variant of prioritized experience replay, as described in https://arxiv.org/pdf/1511.05952.pdf.

Parameters
  • max_size – the maximum number of transitions or episodes to hold in the memory

  • alpha – the alpha prioritization coefficient

  • beta – the beta parameter used for importance sampling

  • epsilon – a small value added to the priority of each transition

  • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
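
A construction sketch. The LinearSchedule arguments (initial value, final value, number of steps) and the power-of-two capacity are assumptions: annealing beta towards 1 over training follows the paper, and some proportional-PER implementations require a power-of-two buffer size for their sum-tree:

    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import PrioritizedExperienceReplay
    from rl_coach.schedules import LinearSchedule

    memory = PrioritizedExperienceReplay(
        max_size=(MemoryGranularity.Transitions, 2 ** 20),  # power-of-two capacity, an assumption
        alpha=0.6,                                          # how strongly priorities skew sampling
        beta=LinearSchedule(0.4, 1.0, 1000000),             # anneal importance-sampling correction to 1
        epsilon=1e-6)                                       # keeps every transition's priority non-zero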

TransitionCollection

class rl_coach.memories.non_episodic.TransitionCollection[source]

A simple Python implementation of a transition collection, on top of which the non-episodic memories are constructed.