Overview

MuJoCo (Multi-Joint dynamics with Contact) is a proprietary physics engine for detailed, efficient rigid-body simulations with contacts. It can be used to build environments for continuous control tasks such as walking or running, which is why many policy gradient methods (such as TRPO and PPO) have been evaluated on MuJoCo environments.

Environments

OpenAI Gym has 10 MuJoCo environments available, ranging from simple tasks such as balancing an inverted pendulum (as in CartPole) to controlling a humanoid.

InvertedPendulum

This is a MuJoCo version of CartPole. The agent’s goal is to balance a pole on a cart.

InvertedDoublePendulum

This is a harder version of InvertedPendulum, where the pole has another pole on top of it. The agent’s goal is to balance a pole on a pole on a cart.

Reacher

Make a 2D robot reach a randomly located target.

Hopper

Make a two-dimensional one-legged robot hop forward as fast as possible.

Swimmer

Make a 2D robot swim.

Walker2d

Make a two-dimensional bipedal robot walk forward as fast as possible.

Ant

Make a four-legged creature walk forward as fast as possible.

HalfCheetah

Make a 2D cheetah robot run.

Humanoid

Make a three-dimensional bipedal robot walk forward as fast as possible, without falling over.

HumanoidStandup

Make a three-dimensional bipedal robot stand up as fast as possible.
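
As a rough sketch, the ten environment names listed above can be turned into Gym environment IDs and inspected programmatically. The helper names (env_id, describe_env) are hypothetical, and the default -v2 suffix is an assumption borrowed from the Humanoid-v2 example on this page; your Gym release may register different versions.

```python
# The ten MuJoCo environment names described in this section.
MUJOCO_ENV_NAMES = [
    "InvertedPendulum",
    "InvertedDoublePendulum",
    "Reacher",
    "Hopper",
    "Swimmer",
    "Walker2d",
    "Ant",
    "HalfCheetah",
    "Humanoid",
    "HumanoidStandup",
]


def env_id(name, version=2):
    """Build a Gym environment ID such as 'Humanoid-v2'.

    The default version is an assumption; check what your Gym release registers.
    """
    return f"{name}-v{version}"


def describe_env(name, version=2):
    """Return (observation shape, action shape) for one environment.

    Requires Gym with a working MuJoCo installation.
    """
    import gym  # imported lazily so the helpers above work without Gym

    env = gym.make(env_id(name, version))
    try:
        return env.observation_space.shape, env.action_space.shape
    finally:
        env.close()
```

With a working installation, looping over MUJOCO_ENV_NAMES and calling describe_env on each name gives a quick view of how much larger the observation and action spaces become from the pendulum tasks to the humanoid tasks.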

State of the Art

Many papers have experimented with the MuJoCo continuous control environments, but most chose not to include exact scores and reported performance curves instead. Thus, all results were taken from "Deep Reinforcement Learning that Matters", a paper on reproducing state-of-the-art policy gradient methods.

If you know of other papers that report results on the MuJoCo environments, please email me!

HalfCheetah-v1

Bootstrap Mean    95% Confidence Bounds    Algorithm
5037.26           (3664.11, 6574.01)       DDPG
3888.85           (2288.13, 5131.96)       ACKTR
3043.1            (1920.4, 4165.86)        PPO
1254.55           (999.52, 1464.86)        TRPO

Hopper-v1

Bootstrap Mean    95% Confidence Bounds    Algorithm
2965.33           (2854.66, 3076.00)       TRPO
2715.72           (2589.06, 2847.93)       PPO
2546.89           (1875.79, 3217.98)       ACKTR
1632.13           (607.98, 2370.21)        DDPG

Walker2d-v1

Bootstrap Mean    95% Confidence Bounds    Algorithm
3072.97           (2957.94, 3183.10)       TRPO
2926.92           (2514.83, 3361.43)       PPO
2285.49           (1246.00, 3235.96)       ACKTR
1582.04           (901.66, 2174.66)        DDPG

Swimmer-v1

Bootstrap Mean    95% Confidence Bounds    Algorithm
214.69            (141.52, 287.92)         TRPO
107.88            (101.13, 118.56)         PPO
50.22             (42.47, 55.37)           ACKTR
31.92             (21.68, 46.23)           DDPG
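
The tables above report a bootstrap mean with 95% confidence bounds. The following is a minimal sketch of a percentile bootstrap over per-run returns, using only the Python standard library. The function name, the number of resamples, and the example returns are illustrative assumptions rather than values from the paper, and the paper's exact procedure may differ.

```python
import random


def bootstrap_mean_ci(returns, n_resamples=10000, confidence=0.95, seed=0):
    """Percentile-bootstrap estimate of the mean and its confidence bounds.

    returns: final returns, one per independent training run.
    Gives back (bootstrap mean, (lower bound, upper bound)).
    """
    rng = random.Random(seed)
    n = len(returns)

    # Resample the runs with replacement and record the mean of each resample.
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(returns, k=n)
        means.append(sum(sample) / n)
    means.sort()

    # Read the bounds off the sorted resample means.
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return sum(means) / n_resamples, (lower, upper)


# Hypothetical returns from five independent runs (made-up numbers).
example_returns = [2900.0, 3100.0, 2750.0, 3300.0, 3000.0]
mean, (low, high) = bootstrap_mean_ci(example_returns)
print(f"{mean:.2f} ({low:.2f}, {high:.2f})")
```

Wide bounds, as in some DDPG and ACKTR rows, indicate that the individual runs ended at very different returns, so the mean alone says little about what a single run will achieve.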

Installation

Prerequisites

To install the MuJoCo environment, you need the OpenAI Gym toolkit. Read this page to learn how to install OpenAI Gym.

You also need a MuJoCo license. MuJoCo offers a 30-day trial license for everyone and a free license for students using MuJoCo for personal projects only. Visit their license page for more information.

Install MuJoCo binary

  1. Download the MuJoCo version 1.50 binaries for Linux, OSX, or Windows.
  2. Unzip the downloaded mjpro150 directory into ~/.mujoco/mjpro150, and place your license key (the mjkey.txt file from your email) at ~/.mujoco/mjkey.txt.
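
Before moving on, it can help to confirm that both items from the steps above ended up where they are expected. This is a small sketch using only the standard library; the function name check_mujoco_files is hypothetical, and the paths are the ones given in step 2.

```python
from pathlib import Path


def check_mujoco_files(home=None):
    """Report whether the MuJoCo binaries and license key are in place.

    home: directory to treat as the home directory (defaults to the real one).
    """
    root = (Path(home) if home is not None else Path.home()) / ".mujoco"
    return {
        "binaries": (root / "mjpro150").is_dir(),      # ~/.mujoco/mjpro150
        "license key": (root / "mjkey.txt").is_file(),  # ~/.mujoco/mjkey.txt
    }


# Print one line per item so a missing file is easy to spot.
for item, found in check_mujoco_files().items():
    print(f"{item}: {'found' if found else 'MISSING'}")
```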

Install MuJoCo package

If you did a full install of OpenAI Gym, the MuJoCo package should already be installed. Otherwise, you can install the MuJoCo environments with a single pip command:

pip3 install 'gym[mujoco]'

Test Installation

You can try rendering the Humanoid-v2 environment to make sure the MuJoCo environment was correctly installed.

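import gym
env = gym.make('Humanoid-v2')  # needs the MuJoCo binaries and license key
env.reset()
for _ in range(1000):
    env.render()  # draw the current frame
    env.step(env.action_space.sample())  # apply a random action
env.close()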
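
Once rendering works, a random-action rollout is a simple way to see the kind of episode return that the scores on this page measure. The sketch below assumes the classic Gym API, in which step returns (observation, reward, done, info); newer Gym releases return a different tuple. The function name run_random_episode is hypothetical.

```python
def run_random_episode(env, max_steps=1000):
    """Run one episode with randomly sampled actions and return the summed reward."""
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()
        # Assumed classic Gym API: (observation, reward, done, info).
        _, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

With Gym and MuJoCo installed, calling run_random_episode(gym.make('Humanoid-v2')) gives a baseline return for an untrained agent.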