MuJoCo Inverted Double Pendulum Environment

Overview

This is a harder version of InvertedPendulum in which a second pole is attached to the top of the first. The agent's goal is to keep both poles balanced upright on a cart.

Performances of RL Agents

The table below lists results reported for various reinforcement learning algorithms in this environment. The results are taken from RL Database.

Result	Algorithm	Source
9356.1	A2C	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
9356.0	ACKTR	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
9355.52	DDPG	Addressing Function Approximation Error in Actor-Critic Methods
9337.47	TD3	Addressing Function Approximation Error in Actor-Critic Methods
9320.0	TRPO	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
9081.92	ACKTR	Addressing Function Approximation Error in Actor-Critic Methods
8977.94	PPO	Addressing Function Approximation Error in Actor-Critic Methods
8487.15	SAC	Addressing Function Approximation Error in Actor-Critic Methods
8369.95	Our DDPG	Addressing Function Approximation Error in Actor-Critic Methods
7102.91	PPO	OpenAI Baselines ea68f3b
6731.63	TRPO (MPI)	OpenAI Baselines ea68f3b
205.85	TRPO	Addressing Function Approximation Error in Actor-Critic Methods
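
Several algorithms appear more than once in the table because different sources report different numbers for them. The following is a minimal Python sketch, not part of RL Database, that keeps the highest reported result per algorithm. The rows are copied from the table; the short source labels ("ACKTR paper", "TD3 paper") and the helper name best_per_algorithm are our own shorthand. The page does not say whether the sources share an evaluation protocol, so the ranking is only indicative.

```python
# (result, algorithm, source) rows copied from the table above.
# Source labels are abbreviated by us:
#   "ACKTR paper" = Scalable trust-region method for deep reinforcement
#                   learning using Kronecker-factored approximation
#   "TD3 paper"   = Addressing Function Approximation Error in
#                   Actor-Critic Methods
RESULTS = [
    (9356.1, "A2C", "ACKTR paper"),
    (9356.0, "ACKTR", "ACKTR paper"),
    (9355.52, "DDPG", "TD3 paper"),
    (9337.47, "TD3", "TD3 paper"),
    (9320.0, "TRPO", "ACKTR paper"),
    (9081.92, "ACKTR", "TD3 paper"),
    (8977.94, "PPO", "TD3 paper"),
    (8487.15, "SAC", "TD3 paper"),
    (8369.95, "Our DDPG", "TD3 paper"),
    (7102.91, "PPO", "OpenAI Baselines ea68f3b"),
    (6731.63, "TRPO (MPI)", "OpenAI Baselines ea68f3b"),
    (205.85, "TRPO", "TD3 paper"),
]


def best_per_algorithm(rows):
    """Return {algorithm: (best_result, source)}, ordered best-first.

    Hypothetical helper for illustration only.
    """
    best = {}
    for score, algo, source in rows:
        # Keep only the highest result seen so far for each algorithm.
        if algo not in best or score > best[algo][0]:
            best[algo] = (score, source)
    # Sort by result, highest first; dicts preserve insertion order.
    ordered = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return dict(ordered)


if __name__ == "__main__":
    for algo, (score, source) in best_per_algorithm(RESULTS).items():
        print(f"{algo:<12}{score:>10.2f}  {source}")
```

The spread between sources can be large: TRPO is listed at both 9320.0 and 205.85, so the source column matters when comparing rows.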

endtoend.ai
