PyTorch Implementations of Policy Gradient Methods

Published Dec 28, 2018 by Seungjae Ryan Lee

The key to fast iteration on research experiments is having well-written baseline algorithms. Unfortunately, most large industry research labs write their code in TensorFlow (openai/baselines, openai/spinningup, deepmind/trfl, google/dopamine), so PyTorch implementations are less well known. To help PyTorch deep RL researchers, we compare and recommend open-source implementations of policy gradient algorithms in PyTorch.

Note that because of the large differences between PyTorch 0.3 and 0.4, we only include repositories that use PyTorch 0.4 or above.

Policy Gradient Methods

A3C

Asynchronous Advantage Actor Critic

[arXiv Paper]

Repository	pytorch-a3c
Author	ikostrikov
PyTorch Version	0.4.1
Pretrained Models	✘
Stars	519
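A3C's defining idea is that several workers compute gradients on their own local copy of the parameters and apply them to a shared model without waiting for one another. The sketch below shows only that update pattern, using plain Python threads, under stated assumptions: a toy quadratic loss stands in for the actor-critic loss, a lock guards the shared parameters (the paper's updates are lock-free), and `SharedParams`, `worker`, and `run_async` are hypothetical names, not the API of ikostrikov/pytorch-a3c.

```python
import threading
from typing import List, Sequence


class SharedParams:
    """Parameters shared by all workers (a stand-in for a shared PyTorch model)."""

    def __init__(self, values: Sequence[float]) -> None:
        self.values: List[float] = list(values)
        self.lock = threading.Lock()

    def snapshot(self) -> List[float]:
        # Each worker syncs its local copy from the shared weights before computing a gradient.
        with self.lock:
            return list(self.values)

    def apply_gradient(self, grad: Sequence[float], lr: float) -> None:
        # Plain SGD step on the shared weights using one worker's local gradient.
        with self.lock:
            self.values = [v - lr * g for v, g in zip(self.values, grad)]


def worker(shared: SharedParams, optimum: Sequence[float], steps: int, lr: float) -> None:
    for _ in range(steps):
        local = shared.snapshot()
        # Gradient of the toy loss sum((w - opt)^2); A3C would use its actor-critic loss here.
        grad = [2.0 * (w - o) for w, o in zip(local, optimum)]
        shared.apply_gradient(grad, lr)


def run_async(optimum: Sequence[float], num_workers: int = 4,
              steps: int = 200, lr: float = 0.05) -> List[float]:
    shared = SharedParams([0.0] * len(optimum))
    threads = [
        threading.Thread(target=worker, args=(shared, optimum, steps, lr))
        for _ in range(num_workers)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared.snapshot()


if __name__ == "__main__":
    print(run_async([1.0, -2.0]))  # close to [1.0, -2.0]
```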

A2C

Advantage Actor Critic

Repository	pytorch-a2c-ppo-acktr	RL-Adventure-2	vel	DeepRL
Author	ikostrikov	higgsfield	MillionIntegrals	ShangtongZhang
PyTorch Version	0.4	0.4	0.4.1	0.4.0
Pretrained Models	✔	✘	✘	✘
Stars	1077	1521	194	1034
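A2C is the synchronous variant of A3C: all workers step their environments in lockstep, and a single batched update uses n-step returns as the critic's target. The sketch below, in plain Python rather than PyTorch to keep the arithmetic visible, computes those returns and the combined loss. The coefficient defaults (0.5 for the value loss, 0.01 for the entropy bonus) are common choices assumed here, not values taken from the repositories above, and the function names are hypothetical. In a real PyTorch implementation the advantage would be detached so that the policy term does not update the critic.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], dones: Sequence[bool],
                   bootstrap_value: float, gamma: float = 0.99) -> List[float]:
    """R_t = r_t + gamma * R_{t+1}, cut at episode ends; the last R is seeded with V(s_T)."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        # dones[t] means the episode ended after step t, so nothing is bootstrapped past it.
        running = rewards[t] + gamma * running * (0.0 if dones[t] else 1.0)
        returns[t] = running
    return returns


def a2c_loss(log_probs: Sequence[float], values: Sequence[float],
             returns: Sequence[float], entropies: Sequence[float],
             value_coef: float = 0.5, entropy_coef: float = 0.01) -> float:
    n = len(returns)
    advantages = [r - v for r, v in zip(returns, values)]  # treated as constants by the policy term
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    value_loss = sum(a * a for a in advantages) / n
    entropy = sum(entropies) / n
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


if __name__ == "__main__":
    print(n_step_returns([1.0, 1.0, 1.0], [False, False, False], 0.0, gamma=0.5))  # [1.75, 1.5, 1.0]
```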

ACER

Actor Critic with Experience Replay

[arXiv Paper]

Repository	ACER	RL-Adventure-2	vel
Author	Kaixhin	higgsfield	MillionIntegrals
PyTorch Version	0.4	0.4	0.4.1
Pretrained Models	✘	✘	✘
Stars	138	1521	194
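ACER trains its critic toward Retrace targets, which correct replayed, off-policy data with importance weights rho_i = pi(a_i|x_i) / mu(a_i|x_i) truncated at 1, where mu is the behavior policy that generated the stored trajectory. A plain-Python sketch of that backward recursion, assuming a single trajectory segment with no episode end in the middle; the function and argument names are hypothetical, and the full algorithm's truncated policy gradient with bias correction and its trust-region update are omitted.

```python
from typing import List, Sequence


def retrace_targets(rewards: Sequence[float], q_taken: Sequence[float],
                    values: Sequence[float], rhos: Sequence[float],
                    bootstrap_value: float, gamma: float = 0.99) -> List[float]:
    """Backward Retrace recursion over one trajectory segment.

    q_taken[i] = Q(x_i, a_i), values[i] = V(x_i), rhos[i] = pi(a_i|x_i) / mu(a_i|x_i).
    bootstrap_value is V(x_k) for the state after the last step (0.0 if it is terminal).
    """
    targets = [0.0] * len(rewards)
    q_ret = bootstrap_value
    for i in reversed(range(len(rewards))):
        q_ret = rewards[i] + gamma * q_ret
        targets[i] = q_ret  # regression target for Q(x_i, a_i)
        c = min(1.0, rhos[i])  # truncated importance weight
        # Carry a corrected estimate back to step i - 1.
        q_ret = c * (q_ret - q_taken[i]) + values[i]
    return targets
```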

ACKTR

Actor Critic using Kronecker-Factored Trust Region

[arXiv Paper]

Repository	pytorch-a2c-ppo-acktr
Author	ikostrikov
PyTorch Version	0.4
Pretrained Models	✔
Stars	1077
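ACKTR takes natural-gradient steps whose Fisher matrix is approximated layer by layer with K-FAC. For a fully connected layer with weights W, input activations a, and back-propagated output gradients g (so the gradient is g a^T), the Kronecker factorization turns the expensive inverse into two small ones. The derivation below uses the standard identities (A ⊗ G)^{-1} = A^{-1} ⊗ G^{-1} and (A ⊗ G) vec(X) = vec(G X A^T); the notation is ours, and the last line is the trust-region step-size rule, where Δθ stacks the preconditioned updates of all layers, F̂ is the approximate Fisher, and δ is the trust-region radius.

```latex
\begin{aligned}
F &= \mathbb{E}\!\left[\mathrm{vec}(\nabla_W L)\,\mathrm{vec}(\nabla_W L)^\top\right]
   = \mathbb{E}\!\left[a a^\top \otimes g g^\top\right]
   \approx \mathbb{E}[a a^\top] \otimes \mathbb{E}[g g^\top] = A \otimes G, \\
\Delta W &= \mathrm{vec}^{-1}\!\left(F^{-1}\,\mathrm{vec}(\nabla_W L)\right)
   \approx G^{-1}\,(\nabla_W L)\,A^{-1}, \\
\eta &= \min\!\left(\eta_{\max},\ \sqrt{\frac{2\delta}{\Delta\theta^\top \hat{F}\,\Delta\theta}}\right).
\end{aligned}
```

Inverting A (inputs × inputs) and G (outputs × outputs) separately is far cheaper than inverting the full (inputs · outputs)-sized Fisher block, which is what makes the natural gradient practical at this scale.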

TRPO

Trust Region Policy Optimization

[arXiv Paper]

Repository	pytorch-trpo	vel
Author	ikostrikov	MillionIntegrals
PyTorch Version	0.4	0.4.1
Pretrained Models	✘	✘
Stars	170	194
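TRPO's update solves F s = g for the search direction s by conjugate gradient, using only Fisher-vector products, and then rescales s so that the quadratic approximation of the KL divergence equals the trust-region radius delta. A plain-Python sketch under these assumptions: `fvp` is any function returning F v (in PyTorch it is usually built from a second backward pass through the KL), delta = 0.01 is an assumed default, the names are hypothetical, and the backtracking line search that follows in TRPO is omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(fvp: Callable[[Vector], Vector], g: Sequence[float],
                       iters: int = 10, tol: float = 1e-10) -> Vector:
    """Approximately solve F x = g given only the map v -> F v."""
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, with x = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        fp = fvp(p)
        alpha = rr / dot(p, fp)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * fk for rk, fk in zip(r, fp)]
        rr_new = dot(r, r)
        p = [rk + (rr_new / rr) * pk for rk, pk in zip(r, p)]
        rr = rr_new
    return x


def trpo_full_step(fvp: Callable[[Vector], Vector], g: Sequence[float],
                   max_kl: float = 0.01) -> Vector:
    s = conjugate_gradient(fvp, g)
    shs = dot(s, fvp(s))  # s^T F s
    beta = math.sqrt(2.0 * max_kl / shs)  # makes 0.5 * step^T F step equal max_kl
    return [beta * sk for sk in s]
```

For a small check, F = [[4, 1], [1, 3]] and g = [1, 2] give s = [1/11, 7/11] after two conjugate-gradient iterations.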

PPO

Proximal Policy Optimization

[arXiv Paper]

Repository	pytorch-a2c-ppo-acktr	RL-Adventure-2	vel	DeepRL
Author	ikostrikov	higgsfield	MillionIntegrals	ShangtongZhang
PyTorch Version	0.4	0.4	0.4.1	0.4.0
Pretrained Models	✔	✘	✘	✘
Stars	1077	1521	194	1034
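The core of PPO is the clipped surrogate objective L = E[min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t)], where r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) is the probability ratio and A_t the advantage estimate. A plain-Python sketch of the batch loss (the negated objective, so it can be minimized), assuming log-probabilities from both policies and precomputed advantages; eps = 0.2 is a commonly used default, and the function name is hypothetical.

```python
import math
from typing import Sequence


def ppo_clip_loss(new_log_probs: Sequence[float], old_log_probs: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative clipped surrogate objective, averaged over the batch."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

Taking the minimum means that once the ratio has moved past the clip range in the direction that improves the objective, that sample stops pushing the policy further, which is what keeps each update close to the old policy.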

SAC

Soft Actor-Critic

[arXiv Paper]

Repository	rlkit	RL-Adventure-2
Author	vitchyr	higgsfield
PyTorch Version	0.4	0.4
Pretrained Models	✘	✘
Stars	491	1521
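SAC adds an entropy bonus to the Bellman backup: the soft state value is V(s) = E_{a~pi}[Q(s, a) - alpha log pi(a|s)], and the Q target is r + gamma (1 - d) V_target(s'). A single-sample, plain-Python sketch under assumptions: `alpha` is a fixed temperature (the original paper expresses the same trade-off through reward scaling, and later versions tune alpha automatically), the function names are hypothetical, and the policy loss is the per-sample form alpha log pi(a|s) - Q(s, a) with a drawn by reparameterization.

```python
def soft_state_value(q_value: float, log_prob: float, alpha: float = 0.2) -> float:
    """One-sample estimate of V(s) = E[Q(s, a) - alpha * log pi(a|s)], with a ~ pi(.|s)."""
    return q_value - alpha * log_prob


def soft_q_target(reward: float, done: bool, next_value: float, gamma: float = 0.99) -> float:
    """Bellman target for Q(s, a), using the target value network's estimate of V(s')."""
    return reward + gamma * (0.0 if done else 1.0) * next_value


def sac_policy_loss(q_value: float, log_prob: float, alpha: float = 0.2) -> float:
    """Per-sample policy loss; minimizing it raises Q while rewarding high entropy."""
    return alpha * log_prob - q_value
```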

Twin SAC

Combination of SAC and TD3

Repository	rlkit
Author	vitchyr
PyTorch Version	0.4
Pretrained Models	✘
Stars	491
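One way to read "SAC combined with TD3" is SAC's entropy-regularized target computed with TD3's clipped double-Q trick, taking the minimum of two target critics. That reading is an assumption, not a description of rlkit's exact code. A plain-Python sketch with a hypothetical function name:

```python
def twin_soft_q_target(reward: float, done: bool, next_q1: float, next_q2: float,
                       next_log_prob: float, alpha: float = 0.2, gamma: float = 0.99) -> float:
    """Target shared by both critics: r + gamma * (1 - d) * (min(Q1', Q2') - alpha * log pi(a'|s')).

    next_q1 and next_q2 are the two target critics evaluated at (s', a') with a' ~ pi(.|s').
    """
    # Clipped double-Q (from TD3) plus the entropy bonus (from SAC).
    soft_next = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (0.0 if done else 1.0) * soft_next
```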

Recommendation

Although vitchyr/rlkit has SAC and Twin SAC, which are state-of-the-art methods in robotic control, it unfortunately does not include PPO, the standard baseline policy gradient algorithm. We found ikostrikov/pytorch-a2c-ppo-acktr and ShangtongZhang/DeepRL to have the best implementations of PPO: both let us run code almost immediately after cloning the repository. We gave ikostrikov/pytorch-a2c-ppo-acktr bonus points because it also includes pretrained models.

Verdict: ikostrikov/pytorch-a2c-ppo-acktr

Deterministic Policy Gradient Methods

DDPG

Deep Deterministic Policy Gradient

[arXiv Paper]

Repository	rlkit	pytorch-ddpg-naf	RL-Adventure-2	vel	DeepRL
Author	vitchyr	ikostrikov	higgsfield	MillionIntegrals	ShangtongZhang
PyTorch Version	0.4	0.4	0.4	0.4.1	0.4.0
Pretrained Models	✘	✘	✘	✘	✘
Stars	491	136	1521	194	1034
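DDPG trains a deterministic actor mu(s) and a critic Q(s, a), and stabilizes the critic's Bellman target with slowly updated copies of both networks: y = r + gamma (1 - d) Q'(s', mu'(s')), followed by the soft update theta' <- tau theta + (1 - tau) theta' with a small tau. A plain-Python sketch in which networks are ordinary callables and parameters are lists of floats; the names are hypothetical, and in PyTorch the soft update would loop over the `parameters()` of the online and target modules instead.

```python
from typing import Callable, List, Sequence


def ddpg_critic_target(reward: float, done: bool, next_state: float,
                       target_actor: Callable[[float], float],
                       target_critic: Callable[[float, float], float],
                       gamma: float = 0.99) -> float:
    """y = r + gamma * (1 - d) * Q'(s', mu'(s'))."""
    # Deterministic policy: the next action comes straight from the target actor, no max over actions.
    next_action = target_actor(next_state)
    return reward + gamma * (0.0 if done else 1.0) * target_critic(next_state, next_action)


def soft_update(target_params: Sequence[float], online_params: Sequence[float],
                tau: float) -> List[float]:
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]
```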

TD3

Twin-Delayed Deep Deterministic Policy Gradient

[arXiv Paper]

Repository	rlkit	RL-Adventure-2
Author	vitchyr	higgsfield
PyTorch Version	0.4	0.4
Pretrained Models	✘	✘
Stars	491	1521
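TD3 changes DDPG in three ways: the critic target uses the minimum of two target critics (clipped double-Q), the target action is smoothed with clipped Gaussian noise, and the actor and target networks are updated less often than the critics. A plain-Python sketch of all three, assuming scalar actions bounded by `max_action`; the defaults (noise sigma 0.2, noise clip 0.5, policy delay 2) are assumed to match the TD3 paper's reported settings, and all names are hypothetical.

```python
import random
from typing import Callable


def td3_target_action(next_state: float, target_actor: Callable[[float], float],
                      rng: random.Random, sigma: float = 0.2, noise_clip: float = 0.5,
                      max_action: float = 1.0) -> float:
    """Target policy smoothing: a' = clip(mu'(s') + clip(eps, -c, c), -max_action, max_action)."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    return max(-max_action, min(max_action, target_actor(next_state) + noise))


def td3_critic_target(reward: float, done: bool, next_state: float,
                      target_actor: Callable[[float], float],
                      target_q1: Callable[[float, float], float],
                      target_q2: Callable[[float, float], float],
                      rng: random.Random, gamma: float = 0.99) -> float:
    next_action = td3_target_action(next_state, target_actor, rng)
    # Clipped double-Q: the smaller estimate counters overestimation bias.
    min_q = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * min_q


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: the actor and target networks change every policy_delay critic steps."""
    return step % policy_delay == 0
```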

HER+TD3

Hindsight Experience Replay + Twin-Delayed Deep Deterministic Policy Gradient

[arXiv Paper]

Repository	rlkit
Author	vitchyr
PyTorch Version	0.4
Pretrained Models	✘
Stars	491
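Hindsight Experience Replay makes sparse-reward, goal-conditioned tasks learnable by storing extra copies of each transition in which the goal is replaced by a goal the agent actually achieved later in the same episode (the paper's "future" strategy), with the reward recomputed for that goal; the underlying off-policy learner, here TD3, is unchanged. A plain-Python sketch in which the transition fields, `reward_fn`, and `k` (relabeled copies per step) are illustrative assumptions, not rlkit's API.

```python
import random
from typing import Callable, Dict, List

Transition = Dict[str, object]


def her_relabel(episode: List[Transition], reward_fn: Callable[[object, object], float],
                k: int, rng: random.Random) -> List[Transition]:
    """Return the original transitions plus k relabeled copies per step ("future" strategy).

    Each transition is assumed to hold 'obs', 'action', 'achieved_goal' (the goal reached
    after the action), 'goal', and 'reward'.
    """
    out = list(episode)
    horizon = len(episode)
    for t, tr in enumerate(episode):
        for _ in range(k):
            future = rng.randrange(t, horizon)  # a step at or after t in the same episode
            new_goal = episode[future]["achieved_goal"]
            relabeled = dict(tr)  # copy so the original transition keeps its goal
            relabeled["goal"] = new_goal
            relabeled["reward"] = reward_fn(tr["achieved_goal"], new_goal)
            out.append(relabeled)
    return out


def sparse_reward(achieved: object, goal: object) -> float:
    # Sparse reward: 0 when the goal is reached, -1 otherwise.
    return 0.0 if achieved == goal else -1.0
```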

Recommendation

For deterministic policy gradient methods, vitchyr/rlkit and higgsfield/RL-Adventure-2 were the only repositories that implement both DDPG and TD3. We found higgsfield/RL-Adventure-2 more suitable for understanding the algorithms than for running them, so we recommend vitchyr/rlkit as your baseline.

Verdict: vitchyr/rlkit

Checked Repositories

vitchyr/rlkit
ikostrikov/pytorch-a3c
ikostrikov/pytorch-trpo
ikostrikov/pytorch-a2c-ppo-acktr
ikostrikov/pytorch-ddpg-naf
Kaixhin/ACER
higgsfield/RL-Adventure-2
MillionIntegrals/vel
ShangtongZhang/DeepRL (Added 2018/12/29)

If you believe we missed a great PyTorch RL repository, please tell us in the comment section!

endtoend.ai
