Hierarchical Actor Critic

Action space: Continuous

References: Hierarchical Reinforcement Learning with Hindsight

Network Structure

[Network structure diagram: ddpg.png]

Algorithm Description

Choosing an action

Pass the current state through the actor network to get an action mean vector \(\mu\). During the training phase, use a continuous exploration policy, such as the Ornstein-Uhlenbeck process, to add exploration noise to the action. When testing, use the mean vector \(\mu\) as-is.
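The action-selection step above can be sketched as follows. This is a minimal illustration, not the library's actual API: the actor here is a stand-in function, and the class and parameter names (`OrnsteinUhlenbeckNoise`, `theta`, `sigma`, `dt`) are assumptions chosen to match common conventions.

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise:
    dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1).
    Parameter values are typical defaults, not the document's."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.state = np.full(action_dim, mu, dtype=np.float64)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion, updated in place.
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape))
        self.state = self.state + dx
        return self.state

def choose_action(actor_mean, state, noise=None, training=True):
    """Return mu(state) plus OU noise during training, or mu(state) as-is at test time."""
    mu = actor_mean(state)
    if training and noise is not None:
        return mu + noise.sample()
    return mu

# Usage with a toy "actor" (assumption: the real actor is a neural network).
actor = lambda s: np.tanh(s[:2])          # maps a state to a 2-D action mean
ou = OrnsteinUhlenbeckNoise(action_dim=2)
state = np.array([0.1, -0.3, 0.5])
train_action = choose_action(actor, state, noise=ou, training=True)
test_action = choose_action(actor, state, training=False)
```

Because the OU process is mean-reverting and temporally correlated, successive noise samples drift smoothly rather than jumping independently, which suits physical control tasks with momentum.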

Training the network