Additional Parameters

VisualizationParameters

class rl_coach.base_parameters.VisualizationParameters(print_networks_summary=False, dump_csv=True, dump_signals_to_csv_every_x_episodes=5, dump_gifs=False, dump_mp4=False, video_dump_methods=None, dump_in_episode_signals=False, dump_parameters_documentation=True, render=False, native_rendering=False, max_fps_for_human_control=10, tensorboard=False, add_rendered_image_to_env_response=False)

Parameters

print_networks_summary – If set to True, a summary of the structure of all the networks will be printed at the beginning of the experiment.
dump_csv – If set to True, the logger will dump logs to a csv file once every dump_signals_to_csv_every_x_episodes episodes. The logs can later be used to visualize the training process using Coach Dashboard.
dump_signals_to_csv_every_x_episodes – Defines the number of episodes between writes of new data to the csv log files. Lower values may hurt performance, since writing to disk takes time and is done synchronously.
dump_gifs – If set to True, GIF videos of the environment will be stored into the experiment directory according to the filters defined in video_dump_methods.
dump_mp4 – If set to True, MP4 videos of the environment will be stored into the experiment directory according to the filters defined in video_dump_methods.
dump_in_episode_signals – If set to True, a csv file will be dumped for each episode for inspecting metrics within the episode: for every step of every episode, metrics such as the reward and the future return will be saved. Setting this to True may severely affect performance, so it should be used only for debugging purposes.
dump_parameters_documentation – If set to True, a json file containing all the agent parameters will be saved in the experiment directory. This is useful for inspecting the value defined for each parameter and making sure that all the parameters are set as expected.
render – If set to True, the environment's render function will be called at each step to render an image of the environment. This may slow down training, to a degree that depends heavily on the environment. By default, Coach uses PyGame to render the environment image instead of the environment-specific renderer. To change this, use the native_rendering flag.
native_rendering – If set to True, the environment's native renderer will be used for rendering the environment image. In some cases this can be slower than rendering with PyGame through Coach, but some environments open their native renderer by default anyway, in which case rendering with PyGame adds unnecessary overhead.
max_fps_for_human_control – The maximum number of frames per second used while playing the environment as a human. This only takes effect when using Coach's --play flag.
tensorboard – If set to True, TensorBoard summaries will be stored in the experiment directory. These can later be loaded in TensorBoard to visualize the training process.
video_dump_methods – A list of dump methods used as filters for deciding when to save videos. The environment checks the filters one after the other and stops at the first dump method whose should_dump() returns False, so a video is saved only if every filter in the list approves it. This list is only used if dump_mp4 or dump_gifs is set to True.
add_rendered_image_to_env_response – Some environments produce an observation that differs from the image displayed while rendering. In some cases it can be useful to pass the rendered image to the agent for visualization purposes. If this flag is set to True, the rendered image will be added to the environment's EnvResponse object, which is passed to the agent and makes those images available to it.
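The filtering rule described for video_dump_methods can be illustrated with a small, self-contained sketch. It does not use Coach itself: the DumpFilter protocol, the filter classes EveryNthEpisode and MinTotalReward, the should_save_video() helper, and the should_dump(episode, total_reward) signature are all hypothetical stand-ins chosen for this example, not Coach's actual dump-method API.

```python
from dataclasses import dataclass
from typing import List, Protocol


class DumpFilter(Protocol):
    """Hypothetical filter interface; not Coach's actual dump-method signature."""

    def should_dump(self, episode: int, total_reward: float) -> bool:
        ...


@dataclass
class EveryNthEpisode:
    """Hypothetical filter: accept every n-th episode."""
    n: int

    def should_dump(self, episode: int, total_reward: float) -> bool:
        return episode % self.n == 0


@dataclass
class MinTotalReward:
    """Hypothetical filter: accept episodes whose total reward reaches a threshold."""
    threshold: float

    def should_dump(self, episode: int, total_reward: float) -> bool:
        return total_reward >= self.threshold


def should_save_video(filters: List[DumpFilter], episode: int, total_reward: float,
                      dump_gifs: bool, dump_mp4: bool) -> bool:
    # The filter list is only consulted when GIF or MP4 dumping is enabled.
    if not (dump_gifs or dump_mp4):
        return False
    # Check the filters in order; stop at the first one whose should_dump() is False.
    for video_filter in filters:
        if not video_filter.should_dump(episode, total_reward):
            return False
    # Every filter accepted the episode.
    return True


filters = [EveryNthEpisode(10), MinTotalReward(100.0)]
print(should_save_video(filters, 20, 150.0, dump_gifs=False, dump_mp4=True))  # True
print(should_save_video(filters, 25, 150.0, dump_gifs=False, dump_mp4=True))  # False
print(should_save_video(filters, 20, 50.0, dump_gifs=False, dump_mp4=True))   # False
```

Because the loop returns at the first rejecting filter, the list behaves like a logical AND of all its filters; the order only affects which checks actually get evaluated.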

PresetValidationParameters

class rl_coach.base_parameters.PresetValidationParameters(test=False, min_reward_threshold=0, max_episodes_to_achieve_reward=1, num_workers=1, reward_test_level=None, test_using_a_trace_test=True, trace_test_levels=None, trace_max_env_steps=5000, read_csv_tries=200)

Parameters

test – A flag that specifies whether the preset should be tested as part of the validation process.
min_reward_threshold – The minimum reward that the agent should reach within max_episodes_to_achieve_reward episodes when the preset is run.
max_episodes_to_achieve_reward – The maximum number of episodes the agent may train using the preset in order to achieve the reward specified by min_reward_threshold.
num_workers – The number of workers that should be used when running this preset in the test suite for validation.
reward_test_level – The environment level or levels, given as a list of strings, that should be tested as part of the reward test suite.
test_using_a_trace_test – A flag that specifies whether the preset should be run as part of the trace test suite.
trace_test_levels – The environment level or levels, given as a list of strings, that should be tested as part of the trace test suite.
trace_max_env_steps – The maximum number of environment steps to run when running this preset as part of the trace test suite.
read_csv_tries – The number of attempts to read the experiment csv file before declaring failure.
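As a rough illustration of how read_csv_tries, min_reward_threshold and max_episodes_to_achieve_reward interact, the sketch below retries reading a csv log and then checks the reward criterion. It is a simplified, hypothetical model rather than Coach's actual test harness: the helper names, the "Training Reward" column name, the retry delay, and the use of >= as the pass condition are all assumptions.

```python
import csv
import os
import time
from typing import List, Optional


def read_rewards_with_retries(path: str, read_csv_tries: int,
                              column: str = "Training Reward",
                              delay_secs: float = 1.0) -> Optional[List[float]]:
    """Hypothetical helper: read one reward column, retrying while the file is missing."""
    for _ in range(read_csv_tries):
        if os.path.exists(path):
            with open(path, newline="") as f:
                rows = list(csv.DictReader(f))
            # Skip rows where the (assumed) reward column is missing or empty.
            return [float(row[column]) for row in rows if row.get(column)]
        time.sleep(delay_secs)  # the experiment may not have written the file yet
    return None  # all attempts used up: declare failure


def reward_test_passed(episode_rewards: List[float], min_reward_threshold: float,
                       max_episodes_to_achieve_reward: int) -> bool:
    """Hypothetical check: did any of the first N episodes reach the threshold?"""
    window = episode_rewards[:max_episodes_to_achieve_reward]
    return any(reward >= min_reward_threshold for reward in window)
```

Returning None instead of raising keeps the "declare failure" decision with the caller, which can then report the preset as failed.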

TaskParameters

class rl_coach.base_parameters.TaskParameters(framework_type: rl_coach.base_parameters.Frameworks = <Frameworks.tensorflow: 'TensorFlow'>, evaluate_only: int = None, use_cpu: bool = False, experiment_path='/tmp', seed=None, checkpoint_save_secs=None, checkpoint_restore_dir=None, checkpoint_restore_path=None, checkpoint_save_dir=None, export_onnx_graph: bool = False, apply_stop_condition: bool = False, num_gpu: int = 1)

Parameters

framework_type – deep learning framework type. Currently, only TensorFlow is supported
evaluate_only – if not None, the task will be used only for evaluating the model for the given number of steps. A value of 0 means that the task will be evaluated for an unlimited number of steps.
use_cpu – use the CPU for this task
experiment_path – the path to the directory which will store all the experiment outputs
seed – a seed to use for the random number generator
checkpoint_save_secs – the number of seconds between consecutive checkpoint saves
checkpoint_restore_dir – [DEPRECATED - will be removed in an upcoming release; use checkpoint_restore_path instead] the directory to restore the checkpoints from
checkpoint_restore_path – the path to restore the checkpoints from
checkpoint_save_dir – the directory to store the checkpoints in
export_onnx_graph – If set to True, an ONNX graph will be exported each time a checkpoint is saved
apply_stop_condition – If set to True, the stop condition defined by reaching a target success rate will be applied
num_gpu – the number of GPUs to use
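The sketch below spells out the three cases of evaluate_only and one way to migrate from the deprecated checkpoint_restore_dir. Both helpers, describe_run_mode() and resolve_restore_path(), are hypothetical and are not part of rl_coach; they only restate the documented semantics in code.

```python
import warnings
from typing import Optional


def describe_run_mode(evaluate_only: Optional[int]) -> str:
    """Hypothetical helper restating the documented meaning of evaluate_only."""
    if evaluate_only is None:
        return "train"                       # a normal training task
    if evaluate_only == 0:
        return "evaluate (unlimited steps)"  # 0 means no step limit
    return f"evaluate for {evaluate_only} steps"


def resolve_restore_path(checkpoint_restore_path: Optional[str] = None,
                         checkpoint_restore_dir: Optional[str] = None) -> Optional[str]:
    """Hypothetical migration helper: prefer the new parameter over the deprecated one."""
    if checkpoint_restore_path is not None:
        return checkpoint_restore_path
    if checkpoint_restore_dir is not None:
        warnings.warn("checkpoint_restore_dir is deprecated; use checkpoint_restore_path instead",
                      DeprecationWarning, stacklevel=2)
        return checkpoint_restore_dir
    return None


print(describe_run_mode(None))  # train
print(describe_run_mode(0))     # evaluate (unlimited steps)
print(describe_run_mode(1000))  # evaluate for 1000 steps
```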

DistributedTaskParameters

class rl_coach.base_parameters.DistributedTaskParameters(framework_type: rl_coach.base_parameters.Frameworks, parameters_server_hosts: str, worker_hosts: str, job_type: str, task_index: int, evaluate_only: int = None, num_tasks: int = None, num_training_tasks: int = None, use_cpu: bool = False, experiment_path=None, dnd=None, shared_memory_scratchpad=None, seed=None, checkpoint_save_secs=None, checkpoint_restore_path=None, checkpoint_save_dir=None, export_onnx_graph: bool = False, apply_stop_condition: bool = False)

Parameters

framework_type – deep learning framework type. Currently, only TensorFlow is supported
evaluate_only – if not None, the task will be used only for evaluating the model for the given number of steps. A value of 0 means that the task will be evaluated for an unlimited number of steps.
parameters_server_hosts – comma-separated list of hostname:port pairs to which the parameter servers are assigned
worker_hosts – comma-separated list of hostname:port pairs to which the workers are assigned
job_type – the job type: either ps (short for parameter server) or worker
task_index – the index of the process
num_tasks – the total number of tasks that are running (not including the parameter server)
num_training_tasks – the number of tasks that are training (not including the parameter server)
use_cpu – use the CPU for this task
experiment_path – the path to the directory which will store all the experiment outputs
dnd – an external DND to use for NEC. This is a workaround needed for a shared DND that does not use the scratchpad.
seed – a seed to use for the random number generator
checkpoint_save_secs – the number of seconds between consecutive checkpoint saves
checkpoint_restore_path – the path to restore the checkpoints from
checkpoint_save_dir – the directory to store the checkpoints in
export_onnx_graph – If set to True, an ONNX graph will be exported each time a checkpoint is saved
apply_stop_condition – If set to True, the stop condition defined by reaching a target success rate will be applied
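parameters_server_hosts and worker_hosts are plain comma-separated strings. The hypothetical helpers below (parse_hosts(), cluster_dict() and validate_task(), none of which are part of rl_coach) sketch how such strings can be turned into a job-name-to-addresses mapping, the dictionary shape that TensorFlow's tf.train.ClusterSpec accepts. Treating task_index as an index into the host list of the job named by job_type is an assumption borrowed from TensorFlow's distributed runtime.

```python
from typing import Dict, List, Tuple


def parse_hosts(hosts: str) -> List[Tuple[str, int]]:
    """Hypothetical parser: "h1:2222,h2:2223" -> [("h1", 2222), ("h2", 2223)]."""
    pairs = []
    for entry in hosts.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected hostname:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs


def cluster_dict(parameters_server_hosts: str, worker_hosts: str) -> Dict[str, List[str]]:
    """Map each job name to its list of "host:port" addresses."""
    return {
        "ps": [f"{h}:{p}" for h, p in parse_hosts(parameters_server_hosts)],
        "worker": [f"{h}:{p}" for h, p in parse_hosts(worker_hosts)],
    }


def validate_task(cluster: Dict[str, List[str]], job_type: str, task_index: int) -> None:
    # job_type must name one of the jobs in the cluster ("ps" or "worker").
    if job_type not in cluster:
        raise ValueError(f"unknown job_type {job_type!r}; expected one of {sorted(cluster)}")
    # Assumption: task_index selects one address from that job's host list.
    if not 0 <= task_index < len(cluster[job_type]):
        raise ValueError(f"task_index {task_index} out of range for job {job_type!r}")


cluster = cluster_dict("localhost:2222", "localhost:2223,localhost:2224")
print(cluster)  # {'ps': ['localhost:2222'], 'worker': ['localhost:2223', 'localhost:2224']}
validate_task(cluster, "worker", 1)  # passes: the second worker exists
```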