Hierarchical Actor Critic

Action space: Continuous

References: Hierarchical Reinforcement Learning with Hindsight

Network Structure


Algorithm Description

Choosing an action

Pass the current state through the actor network to obtain the action mean vector \(\mu\). During training, add exploration noise to \(\mu\) using a continuous exploration policy, such as the Ornstein-Uhlenbeck process. During testing, use the mean vector \(\mu\) as-is.
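The action selection step can be illustrated with a minimal sketch, which is not the reference implementation. It uses only the Python standard library. The names `OrnsteinUhlenbeckNoise`, `choose_action` and `toy_actor`, the noise parameters (`theta`, `sigma`, `dt`), and the clipping of noisy actions to `[low, high]` are assumptions made for illustration; the actor is a hypothetical stand-in for the actor network.

```python
import math
import random


class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1).

    The default parameter values are illustrative assumptions.
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.size = size
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean (typically once per episode).
        self.state = [self.mu] * self.size

    def sample(self):
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt + scale * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def choose_action(actor, state, noise, training, low=-1.0, high=1.0):
    """Return the actor's mean action, with exploration noise added only in training."""
    mean = list(actor(state))
    if not training:
        # Testing: use the mean vector as-is.
        return mean
    noisy = [m + n for m, n in zip(mean, noise.sample())]
    # Assumption: actions are bounded, so noisy actions are clipped to [low, high].
    return [min(max(a, low), high) for a in noisy]


def toy_actor(state):
    # Hypothetical stand-in for the actor network: maps a state to a mean vector.
    return [math.tanh(s) for s in state]


noise = OrnsteinUhlenbeckNoise(size=2, seed=0)
train_action = choose_action(toy_actor, [0.5, -0.3], noise, training=True)
test_action = choose_action(toy_actor, [0.5, -0.3], noise, training=False)
```

Because the Ornstein-Uhlenbeck state carries over between calls, consecutive training actions receive correlated noise. Calling `reset()` at the start of each episode returns the process to its mean.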

Training the network