Latest News

Congrats to nskiran (#1), rl_agent (#3), ymmoy999 (#4), Firework (#6), Yongjin (#8), HP (#9), and [email protected] (#10) for getting their new agents into the Top 10 this week! Read more about the current leaderboard in the section below.

I misinterpreted the transition from Round 1 to Round 2: it has now been clarified that Round 1 will continue until September 16th, after which Round 2 will run until September 30th (tentative). In Round 1, the current round, the agent must follow a constant horizontal velocity vector of 3 m/s. In Round 2, by contrast, the target velocity vector will be time-dependent. More information should be available soon.
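The difference between the rounds can be pictured as two target-velocity functions. The Round 2 schedule has not been announced, so the time-dependent target below is purely a hypothetical illustration, not the actual competition target:

```python
import math

def round1_target_velocity(t):
    """Round 1: a constant horizontal target velocity of 3 m/s,
    regardless of simulation time t."""
    return (3.0, 0.0)

def round2_target_velocity(t):
    """Hypothetical Round 2 target: the real schedule is not yet known,
    so this slowly drifting vector is only an illustration of what
    'time-dependent' could mean."""
    return (3.0 + 0.5 * math.sin(0.1 * t), 0.5 * math.cos(0.1 * t))
```

An agent trained only to match a fixed vector may overfit to it, which is why the organizers distinguish the two rounds.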

osim-rl-helper

I have refactored the agents and the environment wrappers to give a more consistent experience to those using the osim-rl-helper package. If you are building on osim-rl-helper, make sure you test your agents locally before you submit!
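A cheap local check is to run your agent's act function through a few full episodes before submitting. Since the real ProstheticsEnv needs an OpenSim install, this sketch uses a stub environment with the same reset/step interface; the stub, its observation size, and the `smoke_test` helper are my own illustration, not part of osim-rl-helper:

```python
import random

class StubEnv:
    """Minimal stand-in for the prosthetics environment (same reset/step
    interface) so the smoke test runs anywhere; swap in the real env locally."""
    def __init__(self, episode_len=20):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 100  # dummy observation vector; length is illustrative

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return [0.0] * 100, 1.0, done, {}

def smoke_test(env, agent_act, episodes=2):
    """Run a few episodes and return the total reward. The point is that
    this raises immediately if agent_act() breaks on any observation,
    which is exactly what you want to catch before submitting."""
    total = 0.0
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            obs, reward, done, info = env.step(agent_act(obs))
            total += reward
    return total

# Example: a random agent emitting 19 muscle excitations in [0, 1]
random_act = lambda obs: [random.random() for _ in range(19)]
```

Running `smoke_test(StubEnv(), random_act)` locally takes seconds, versus burning a submission to discover a shape mismatch.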

TensorforcePPOAgent

I created a new agent that uses Proximal Policy Optimization (PPO; Schulman et al., 2017) via the tensorforce package. Most top-performing agents in last year's Learning to Run competition used either the DDPG algorithm or the PPO algorithm, so try experimenting with both! The easiest experiments are to make the neural network architecture more complex or to tune the hyperparameters. You can also try different optimizers (Adam, RMSprop, etc.).
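For intuition about what you are tuning, the core of PPO is its clipped surrogate objective: the policy-probability ratio is clipped so that large policy updates stop being rewarded. The TensorforcePPOAgent delegates all of this to tensorforce; the per-sample sketch below is just the bare formula in plain Python:

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective from Schulman et al. (2017).

    ratio:     pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage: estimated advantage of the taken action
    epsilon:   clip range (0.2 is the paper's default, and a common
               hyperparameter to tune)

    Returns min(ratio * A, clip(ratio, 1-eps, 1+eps) * A), so moving the
    ratio far outside [1-eps, 1+eps] yields no extra objective value.
    """
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage, pushing the ratio above 1 + epsilon gains nothing; with a negative advantage, the clipped term caps how much the objective can be gamed by shrinking the ratio. This is what makes PPO updates stable enough to tune aggressively.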

What’s Next?

This post concludes the series of posts about the environment! Next, I will write about my analysis of the best agents from last year's competition. I also hope to add a better DDPG agent: the keras-rl package is intuitive but slow.

I plan to keep the new Leaderboard section so that people can see the general trend of the scores with a single chart.