Memories
Episodic Memories
EpisodicExperienceReplay
class rl_coach.memories.episodic.EpisodicExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int] = (MemoryGranularity.Transitions, 1000000), n_step=-1, train_to_eval_ratio: int = 1)

A replay buffer that stores episodes of transitions. The episodic structure allows computing total returns and other values that depend on the sequential ordering of the transitions within an episode.
- Parameters
max_size – the maximum number of transitions or episodes to hold in the memory
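For orientation, here is a minimal usage sketch. It assumes the buffer exposes the store(transition)/sample(batch_size) interface shared by Coach's Memory classes, and that Transition accepts these keyword arguments; treat both as assumptions rather than a spec.

```python
import numpy as np

from rl_coach.core_types import Transition
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.memories.episodic import EpisodicExperienceReplay

# Hold at most 50,000 transitions; MemoryGranularity.Episodes would bound
# the number of stored episodes instead.
memory = EpisodicExperienceReplay(max_size=(MemoryGranularity.Transitions, 50000))

# Store a 10-step episode; game_over=True marks the episode boundary
# (assumed Transition fields, for illustration only).
for t in range(10):
    memory.store(Transition(state={'observation': np.zeros(4)},
                            action=0,
                            reward=1.0,
                            game_over=(t == 9)))

batch = memory.sample(8)  # assumed to return a list of Transition objects
```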
EpisodicHindsightExperienceReplay
class rl_coach.memories.episodic.EpisodicHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)

Implements Hindsight Experience Replay, as described in https://arxiv.org/pdf/1707.01495.pdf
- Parameters
max_size – The maximum size of the memory. Should be defined at the granularity of Transitions.
hindsight_transitions_per_regular_transition – The number of artificial hindsight transitions to generate for each actual transition
hindsight_goal_selection_method – The method used to select the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod.
goals_space – A GoalsSpace which defines the base properties of the goals space
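A construction sketch follows. The GoalsSpace/ReachingGoal arguments mirror the ones used in Coach's Fetch HER presets but should be verified against rl_coach.spaces; the value of 4 hindsight transitions per real transition follows the HER paper.

```python
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.memories.episodic import EpisodicHindsightExperienceReplay
from rl_coach.memories.episodic.episodic_hindsight_experience_replay import (
    HindsightGoalSelectionMethod,
)
from rl_coach.spaces import GoalsSpace, ReachingGoal

# The goal is the 'desired_goal' entry of the state; reward is 0 within
# 0.05 of the goal and -1 otherwise (the sparse setup from the HER paper).
goals_space = GoalsSpace(goal_name='desired_goal',
                         reward_type=ReachingGoal(distance_from_goal_threshold=0.05,
                                                  goal_reaching_reward=0,
                                                  default_reward=-1),
                         distance_metric=GoalsSpace.DistanceMetric.Euclidean)

memory = EpisodicHindsightExperienceReplay(
    max_size=(MemoryGranularity.Transitions, 1000000),
    hindsight_transitions_per_regular_transition=4,  # k=4, as in the paper
    hindsight_goal_selection_method=HindsightGoalSelectionMethod.Future,
    goals_space=goals_space,
)
```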
EpisodicHRLHindsightExperienceReplay
class rl_coach.memories.episodic.EpisodicHRLHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)

Implements HRL Hindsight Experience Replay, as described in https://arxiv.org/abs/1805.08180
This is the memory to use if you want a hindsight experience replay buffer that is shared between multiple workers.
- Parameters
max_size – The maximum size of the memory. Should be defined at the granularity of Transitions.
hindsight_transitions_per_regular_transition – The number of artificial hindsight transitions to generate for each actual transition
hindsight_goal_selection_method – The method used to select the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod.
goals_space – A GoalsSpace which defines the properties of the goals
do_action_hindsight – Whether to replace the action (the sub-goal passed to the lower layer) with the actually achieved goal
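To make the relabeling mechanics concrete, here is a framework-independent sketch of the "future" goal-selection strategy from the HER paper. It is illustrative only: the field names are hypothetical and this is not Coach's internal code.

```python
import random

def relabel_episode(episode, k=4, threshold=0.05,
                    distance=lambda a, b: abs(a - b)):
    """episode: list of dicts with hypothetical keys 'next_achieved_goal',
    'goal' and 'reward'. Returns k artificial transitions per real one."""
    hindsight = []
    for t, transition in enumerate(episode):
        for _ in range(k):  # k = hindsight_transitions_per_regular_transition
            # "Future" selection: reuse a goal achieved later in the episode.
            future = random.choice(episode[t:])
            goal = future['next_achieved_goal']
            reached = distance(transition['next_achieved_goal'], goal) <= threshold
            hindsight.append({**transition,
                              'goal': goal,
                              'reward': 0.0 if reached else -1.0})
    return hindsight
```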
Non-Episodic Memories
BalancedExperienceReplay
class rl_coach.memories.non_episodic.BalancedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True, num_classes: int = 0, state_key_with_the_class_index: Any = 'class')

- Parameters
max_size – the maximum number of transitions or episodes to hold in the memory
allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
num_classes – the number of classes in the replayed data
state_key_with_the_class_index – the class index is assumed to be a value in the state dictionary; this parameter determines the key used to retrieve it
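A brief construction sketch, under the same assumed store()/sample() interface as above; whether sample() balances the batch exactly or only approximately across classes is an assumption to verify against the implementation.

```python
from rl_coach.core_types import Transition
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.memories.non_episodic import BalancedExperienceReplay

memory = BalancedExperienceReplay(
    max_size=(MemoryGranularity.Transitions, 100000),
    num_classes=3,
    state_key_with_the_class_index='class',  # each state dict carries its class here
)

memory.store(Transition(state={'observation': [0.0], 'class': 1},
                        action=0, reward=0.0, game_over=False))

batch = memory.sample(32)  # assumed to draw evenly across the 3 classes
```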
QDND
ExperienceReplay
class rl_coach.memories.non_episodic.ExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True)

A regular replay buffer which stores transitions without any additional structure.
- Parameters
max_size – the maximum number of transitions or episodes to hold in the memory
allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
PrioritizedExperienceReplay
class rl_coach.memories.non_episodic.PrioritizedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], alpha: float = 0.6, beta: rl_coach.schedules.Schedule = <rl_coach.schedules.ConstantSchedule object>, epsilon: float = 1e-06, allow_duplicates_in_batch_sampling: bool = True)

This is the proportional sampling variant of prioritized experience replay, as described in https://arxiv.org/pdf/1511.05952.pdf.
- Parameters
max_size – the maximum number of transitions or episodes to hold in the memory
alpha – the prioritization exponent (alpha = 0 recovers uniform sampling)
beta – a schedule for the importance sampling exponent, typically annealed toward 1 over the course of training
epsilon – a small value added to the priority of each transition so that no transition has zero sampling probability
allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
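For intuition, here is a self-contained sketch of the paper's proportional scheme (not Coach's sum-tree implementation): each transition gets priority p_i = |delta_i| + epsilon, is sampled with probability P(i) = p_i^alpha / sum_j p_j^alpha, and is weighted by the importance sampling correction w_i = (N * P(i))^(-beta), normalized by the largest weight.

```python
import numpy as np

def sample_proportional(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    # Priorities from TD errors; eps keeps every priority strictly positive.
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    # Proportional sampling (duplicates allowed, as in the buffer above).
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    # Importance-sampling weights, normalized for training stability.
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```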