MuJoCo Inverted Double Pendulum Environment

Overview

This is a harder version of InvertedPendulum: a second pole is attached to the top of the first. The agent’s goal is to keep both poles balanced upright on a cart.

Performance of RL Agents

The table below lists reinforcement learning algorithms that were tested in this environment. These results are from RL Database.

Result   Algorithm   Source
9356.1   A2C         Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
9356.0   ACKTR       Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
9355.52  DDPG        Addressing Function Approximation Error in Actor-Critic Methods
9337.47  TD3         Addressing Function Approximation Error in Actor-Critic Methods
9320.0   TRPO        Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
9081.92  ACKTR       Addressing Function Approximation Error in Actor-Critic Methods
8977.94  PPO         Addressing Function Approximation Error in Actor-Critic Methods
8487.15  SAC         Addressing Function Approximation Error in Actor-Critic Methods
8369.95  Our DDPG    Addressing Function Approximation Error in Actor-Critic Methods
7102.91  PPO         OpenAI Baselines ea68f3b
6731.63  TRPO (MPI)  OpenAI Baselines ea68f3b
205.85   TRPO        Addressing Function Approximation Error in Actor-Critic Methods
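
Several algorithms appear more than once in the table above, with different sources reporting different results. The following is a minimal Python sketch of one way to group the rows by algorithm label and report the best entry and the spread between sources. The rows are transcribed from the table; the names ROWS, group_by_algorithm, and best_and_spread are hypothetical helpers introduced for illustration and are not part of RL Database.

```python
from collections import defaultdict

# Source labels, copied from the table above.
KFAC_PAPER = ("Scalable trust-region method for deep reinforcement learning "
              "using Kronecker-factored approximation")
TD3_PAPER = "Addressing Function Approximation Error in Actor-Critic Methods"
BASELINES = "OpenAI Baselines ea68f3b"

# (result, algorithm, source) rows, transcribed from the table above.
ROWS = [
    (9356.1, "A2C", KFAC_PAPER),
    (9356.0, "ACKTR", KFAC_PAPER),
    (9355.52, "DDPG", TD3_PAPER),
    (9337.47, "TD3", TD3_PAPER),
    (9320.0, "TRPO", KFAC_PAPER),
    (9081.92, "ACKTR", TD3_PAPER),
    (8977.94, "PPO", TD3_PAPER),
    (8487.15, "SAC", TD3_PAPER),
    (8369.95, "Our DDPG", TD3_PAPER),
    (7102.91, "PPO", BASELINES),
    (6731.63, "TRPO (MPI)", BASELINES),
    (205.85, "TRPO", TD3_PAPER),
]


def group_by_algorithm(rows):
    """Map each algorithm label to its list of (result, source) entries."""
    grouped = defaultdict(list)
    for result, algorithm, source in rows:
        grouped[algorithm].append((result, source))
    return dict(grouped)


def best_and_spread(rows):
    """Return {algorithm: (best_result, best_source, spread)}.

    spread is max minus min over all entries for that label, so it is
    0.0 for labels that appear only once.
    """
    summary = {}
    for algorithm, entries in group_by_algorithm(rows).items():
        best_result, best_source = max(entries, key=lambda e: e[0])
        worst_result = min(result for result, _ in entries)
        summary[algorithm] = (best_result, best_source, best_result - worst_result)
    return summary


if __name__ == "__main__":
    # Print one line per algorithm label, highest best result first.
    ranked = sorted(best_and_spread(ROWS).items(), key=lambda kv: -kv[1][0])
    for algorithm, (best, source, spread) in ranked:
        print(f"{best:>8.2f}  {algorithm:<10}  spread={spread:>8.2f}  {source}")
```

The sketch treats "Our DDPG" and "TRPO (MPI)" as separate labels because the table lists them separately. A large spread for a label only shows that the sources disagree. The table does not state the evaluation settings behind each number, so treating entries from different sources as directly comparable is an assumption.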