Welcome to Pommerman! Pommerman is one of the NIPS 2018 Competition tracks, in which participants build agents that compete against other agents in a game of Bomberman.
In this post, we explain only the basics of Pommerman, leaving reinforcement learning for later posts.
Every game of Pommerman starts on an 11-by-11 grid. There are four agents, one at each corner. Every agent starts with one bomb and a blast strength of 3. Agents can increase these numbers, and gain the ability to kick bombs, by acquiring power-ups.
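To make the starting state concrete, here is a minimal Python sketch of it. The class, field, and power-up names are hypothetical (they are not the pommerman package's API). The agents are placed on the literal corner squares, although the official environment may use slightly different starting cells, and we assume each bomb or range power-up adds exactly one.

```python
from dataclasses import dataclass

BOARD_SIZE = 11  # every game is played on an 11-by-11 grid


@dataclass
class AgentState:
    """Hypothetical per-agent stats; field names are illustrative, not pommerman's."""
    position: tuple[int, int]
    ammo: int = 1             # every agent starts with one bomb
    blast_strength: int = 3   # starting blast strength
    can_kick: bool = False    # only unlocked by the Can Kick power-up


def initial_agents(size: int = BOARD_SIZE) -> list[AgentState]:
    """Put one agent on each corner square of a size-by-size board."""
    last = size - 1
    corners = [(0, 0), (0, last), (last, last), (last, 0)]
    return [AgentState(position=corner) for corner in corners]


def apply_power_up(agent: AgentState, kind: str) -> None:
    """Apply a power-up; 'extra_bomb' and 'increase_range' are hypothetical labels."""
    if kind == "extra_bomb":
        agent.ammo += 1            # assumed: +1 bomb per pickup
    elif kind == "increase_range":
        agent.blast_strength += 1  # assumed: +1 blast strength per pickup
    elif kind == "can_kick":
        agent.can_kick = True
    else:
        raise ValueError(f"unknown power-up: {kind}")
```

For example, `apply_power_up(initial_agents()[0], "can_kick")` lets the first agent kick bombs while leaving its ammo and blast strength unchanged.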
The game ends when only players from one team remain. If no team has won by the maximum timestep, or if both teams’ last agents are destroyed in the same turn, the game is a tie and is rerun, up to two more times. Upon three consecutive ties, the game is rerun with “collapsing walls” until a winner is decided. (We will go over collapsing walls in more detail later, but imagine a map that becomes smaller as time goes on.)
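The tie-breaking procedure can be sketched as a small loop. `play_game` is a hypothetical callback that plays one game (with or without collapsing walls) and returns the winning team, or `None` on a tie; nothing here is taken from the official match runner.

```python
from typing import Callable, Optional


def play_match(play_game: Callable[[bool], Optional[str]], max_ties: int = 3) -> str:
    """Rerun tied games; after `max_ties` consecutive ties, switch on collapsing walls.

    `play_game(collapsing_walls)` is a hypothetical callback that plays one game
    and returns the winning team's name, or None if the game was a tie.
    """
    for _ in range(max_ties):
        winner = play_game(False)  # normal rules
        if winner is not None:
            return winner
    # Three ties in a row: keep replaying with collapsing walls until someone wins.
    while True:
        winner = play_game(True)
        if winner is not None:
            return winner
```

Passing the game as a callback keeps the rerun policy separate from the game rules, so the same loop works for any simulator.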
There are three variants of Pommerman:
- Free for All (FFA): Four agents, one from each of four participants, compete against each other. The board is fully observable.
- Team: Two agents each from two participants compete against each other. The board is partially observable for each agent. Each agent must run completely separately and cannot share any information with its teammate.
- Team Radio: Same as Team, but each agent is allowed to send short “words” to its teammate to share information.
In the NIPS 2018 Competition, the Team version is used, where each participant submits two agents and plays against two agents from another participant. The agents can be different, employing different strategies.
For those who wonder why Team Radio was not chosen for NIPS 2018: the partial observability and multi-agent aspects added in Team already make the competition complex enough (for example, AlphaGo Zero cannot be used with POMDPs without modification), and learning to communicate within such a large environment would be too difficult.
Each square is a passage, a wall, a bomb, a power-up, an agent, or a fogged-out area.
Walls block the agent’s movement. There are two types of walls: wood walls and rigid walls. Wood walls can be broken by bombs and may have a power-up hidden behind them. There are 48 randomly placed wood walls in each map. Rigid walls cannot be broken.
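One way to model the square types above is a small enum. The names are hypothetical and simplified (the pommerman package has its own representation), and only walls are treated as blocking here, since that is all the text above specifies.

```python
from enum import Enum


class Tile(Enum):
    """Simplified square types (hypothetical names, not pommerman's own enum)."""
    PASSAGE = "passage"
    RIGID_WALL = "rigid"
    WOOD_WALL = "wood"
    BOMB = "bomb"
    POWER_UP = "power_up"
    AGENT = "agent"
    FOG = "fog"


def blocks_movement(tile: Tile) -> bool:
    """Walls block movement; other blockers are ignored in this sketch."""
    return tile in (Tile.RIGID_WALL, Tile.WOOD_WALL)


def breakable(tile: Tile) -> bool:
    """Only wood walls can be broken by bombs; rigid walls never break."""
    return tile is Tile.WOOD_WALL
```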
Bombs are placed by agents and explode after 10 timesteps. When a bomb explodes, it destroys any wood walls, agents, or power-ups in its range and triggers any other bombs in the range. Bombs can be kicked by agents that have the Can Kick power-up.
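The bomb rules above amount to a countdown plus a spread of flames that can set off other bombs. The sketch below uses a hypothetical character grid and assumes that flames travel up to `strength` squares in each direction, are stopped by rigid walls, and stop after destroying a wood wall; the official environment’s exact range convention may differ.

```python
from collections import deque

BOMB_LIFE = 10  # bombs explode 10 timesteps after being placed
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
Pos = tuple[int, int]


def tick(timers: dict[Pos, int]) -> list[Pos]:
    """Count every bomb down by one timestep and return the ones that go off."""
    for pos in timers:
        timers[pos] -= 1
    return [pos for pos, left in timers.items() if left <= 0]


def explode(grid: list[str], strengths: dict[Pos, int], start: Pos) -> set[Pos]:
    """Return every square covered by flames when the bomb at `start` explodes.

    grid: '#' = rigid wall, 'w' = wood wall, '.' = anything else (hypothetical encoding).
    strengths: blast strength of every bomb on the board, keyed by position.
    Assumes flames travel up to `strength` squares in each direction.
    """
    rows, cols = len(grid), len(grid[0])
    flames: set[Pos] = set()
    detonated = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        flames.add((r, c))
        for dr, dc in DIRECTIONS:
            for dist in range(1, strengths[(r, c)] + 1):
                nr, nc = r + dr * dist, c + dc * dist
                if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                    break  # off the board, or a rigid wall absorbs the blast
                flames.add((nr, nc))  # agents and power-ups here are destroyed
                if (nr, nc) in strengths and (nr, nc) not in detonated:
                    detonated.add((nr, nc))  # chain reaction: this bomb goes off too
                    queue.append((nr, nc))
                if grid[nr][nc] == "w":
                    break  # the wood wall is destroyed but stops the flame
    return flames
```

For instance, on the one-row board `"....w.."` a strength-2 bomb at column 0 reaches a strength-3 bomb at column 2, and the chained explosion burns columns 0 through 4, stopping at the wood wall.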
Power-ups are hidden behind wood walls and are revealed when the wall is destroyed. There are 20 power-ups hidden under 48 wood walls, so each wall has roughly a 42% (20/48) chance of hiding an item.
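The per-wall chance is simply 20 / 48 ≈ 0.417. Assuming the 20 items are assigned to wood walls uniformly at random (an assumption; the official map generator may place them differently), a quick simulation agrees with that ratio. `hide_power_ups` and `estimate_item_chance` are hypothetical helpers.

```python
import random
from typing import Optional

NUM_WOOD_WALLS = 48
NUM_POWER_UPS = 20


def hide_power_ups(walls: list, num_items: int = NUM_POWER_UPS,
                   rng: Optional[random.Random] = None) -> set:
    """Choose which wood walls hide a power-up, uniformly and without replacement."""
    if rng is None:
        rng = random.Random()
    return set(rng.sample(walls, num_items))


def estimate_item_chance(trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often one particular wall (wall 0) hides an item."""
    rng = random.Random(seed)
    walls = list(range(NUM_WOOD_WALLS))
    hits = sum(0 in hide_power_ups(walls, rng=rng) for _ in range(trials))
    return hits / trials


print(NUM_POWER_UPS / NUM_WOOD_WALLS)  # exact ratio, about 0.417
print(estimate_item_chance())          # simulated estimate, close to the exact ratio
```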
For more information, check Agent Observations.
In the Team variants, the environment is partially observable, hence the name Partially Observable Mmerman. The agent can see up to 4 units in each direction (Source). Everything outside this range is observed by the agent as fog.
To make sure that an agent isn’t killed by a bomb it cannot see, a bomb’s maximum blast strength is capped by the agent’s view size (Source).
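Below is a minimal sketch of the fog and of why capping blast strength works. It assumes the view is a square window of radius 4 around the agent and that flames travel `strength` squares along the bomb’s row and column, ignoring walls; `observe`, `can_hit`, `cap_blast_strength`, and the `?` fog marker are hypothetical. Under those assumptions, a bomb whose flames can reach the agent must share its row or column within 4 squares, so it always lies inside the view.

```python
VIEW_RANGE = 4  # the agent sees up to 4 squares in each direction
FOG = "?"       # hypothetical fog marker


def observe(board: list[str], agent: tuple[int, int],
            view_range: int = VIEW_RANGE) -> list[str]:
    """Copy `board`, replacing every square outside the agent's view with fog.

    Assumes the view is a square window (Chebyshev distance <= view_range).
    """
    ar, ac = agent
    return [
        "".join(
            ch if max(abs(r - ar), abs(c - ac)) <= view_range else FOG
            for c, ch in enumerate(row)
        )
        for r, row in enumerate(board)
    ]


def cap_blast_strength(strength: int, view_range: int = VIEW_RANGE) -> int:
    """Limit blast strength so flames never reach farther than the agent can see."""
    return min(strength, view_range)


def can_hit(agent: tuple[int, int], bomb: tuple[int, int], strength: int) -> bool:
    """Could the bomb's flames reach the agent on an open board (walls ignored)?"""
    (ar, ac), (br, bc) = agent, bomb
    return (ar == br and abs(ac - bc) <= strength) or (ac == bc and abs(ar - br) <= strength)
```

Without the cap, a strength-5 bomb five squares away in the same row could hit the agent from inside the fog; with the cap, its flames stop one square short.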
The pommerman package is not published on a package index such as PyPI or Anaconda Cloud, so it must be installed by cloning the repository.
```shell
git clone https://github.com/MultiAgentLearning/playground
cd playground
pip install -U .
```
The playground repository contains not only the pommerman package but also useful scripts and notebooks.
For the full installation guide, check Getting Started.
Resources for Pommerman are scattered across several places. These are all the places I have checked:
If there are any discrepancies, I highly recommend asking questions on Discord. The organizers and other participants are always very helpful!