Alpha Catan: AI from the ground up

Building an AI

For the better part of the last year, I’ve been working on a little project I called alpha-catan. I wanted to work on something that was just fun and interesting. Nothing related to business or anything that is even (remotely) useful. Just pure, unadulterated fun. The project I ended up with is an attempt to build an AI that plays Catan.

Now, Settlers of Catan is a pretty famous board game, and one I have personally despised for ages. Apart from the terrible experiences of fights during Catan games, I strongly believe there are two fundamental flaws in the game.

  1. Economy acceleration through luck: There is a strong economy acceleration effect. Good rolls give you resources, which give you more settlements… giving you more chances for good rolls. The skill is in where to put your settlements. This, however, creates a situation where in some games you become merely an observer, waiting for the winner to slowly play out the game.

  2. Trading: Many will argue the main game is the trading. From my experience, it is merely a tool for abusing new players. There is really no reason to trade, as the game is zero-sum: you would only accept a trade if it increased your win probability, and a trade cannot do that for both parties. The only counterargument is if coming 2nd vs 3rd matters (which, in party games in general, it doesn’t).

These flaws are also why I believe a relatively simple model can beat the majority of players. If the action that matters most is essentially the first one, then the game should be relatively simple to solve! Armed with the Hands-On Machine Learning book and my hatred of Catan, I started working on building the model and the controller that goes with it.

Objective: Beat the average player on colonist.io

Building the model

At the outset, I wanted to do most things from scratch, meaning I decided not to use a framework or any existing repos. I wanted to learn the basics of ML/RL, and I drew a lot of inspiration from the toy MNIST-in-numpy problem and Karpathy’s Pong writeup.

The model I wanted to build is a basic one: given the game state as input, predict a suitable action.

Inputs

  • Game state
    • Players’ resources
    • Players’ dev cards
    • Board state

Prediction:

  • Action to perform

To model multiple actions per turn, we keep predicting until the model chooses to pass. This assumes that each action is evaluated the same way regardless of whether it is part of a chain of actions or not (a sketch of this loop follows the example below).

Example: Turn

  • Action 1
  • Action 2
  • Action 3
  • until… Pass
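
To make that concrete, here is a minimal sketch of the turn loop with a tiny numpy policy network. The `encode_state`, `legal_action_mask`, and `apply` helpers, and the `PASS` index, are placeholders for my actual game wrapper rather than the real implementation.

```python
import numpy as np

PASS = 0  # hypothetical index of the "pass" action in the action space

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class PolicyNet:
    """Tiny two-layer policy network: state vector in, action probabilities out."""
    def __init__(self, n_state, n_actions, hidden=128):
        self.w1 = np.random.randn(n_state, hidden) / np.sqrt(n_state)
        self.w2 = np.random.randn(hidden, n_actions) / np.sqrt(hidden)

    def forward(self, state):
        h = np.maximum(0, state @ self.w1)   # ReLU hidden layer
        return softmax(h @ self.w2)          # one probability per action

def play_turn(model, game):
    """Keep sampling actions until the model picks 'pass'."""
    while True:
        state = game.encode_state()               # flatten resources, dev cards, board
        probs = model.forward(state)
        probs = probs * game.legal_action_mask()  # zero out illegal actions
        probs = probs / probs.sum()
        action = np.random.choice(len(probs), p=probs)
        if action == PASS:
            break
        game.apply(action)                        # mutate the game state and continue
```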

To start off, and not wanting to scale up training too much, I focused on a 2-player rendition of the game. It is simpler, letting me focus on training to win.

Metrics

The main metric I looked at is turns to win. This helps ensure that the players get generically better and better. The others are standard ML metrics: average loss (indicating how close the taken action is to the prediction) and average reward (how much reward each player collects). The expectation is that turns to win trends down, average loss trends down, and average reward tapers off to a steady level.
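
In code, these are just averages over a batch of finished self-play games; the attribute names below are placeholders for whatever the game wrapper records.

```python
import numpy as np

def batch_metrics(finished_games):
    """Summarise a batch of self-play games with the three training metrics."""
    turns_to_win = np.mean([g.turns for g in finished_games])
    avg_loss = np.mean([g.mean_loss for g in finished_games])          # action vs prediction
    avg_reward = np.mean([np.sum(g.rewards) for g in finished_games])  # total reward collected
    return turns_to_win, avg_loss, avg_reward
```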

Reward allocation and the law of large numbers

Reward allocation is a tricky business, and I struggled to find a good scheme. It’s essentially coming up with an intuition and then trying to build for it; for instance, the earliest version involved only rewarding winners. This is basically running experiments! The main difficulty is deciding which actions to reward. In the simple example below:

A → B → C → D → E → Win

… it’s unclear how much reward to allocate to each of the actions. The cleanest approach is to reward only the win and let the model learn which actions are important. However, the longer the chain of actions, the fuzzier the allocation gets. In theory, by the law of large numbers, it doesn’t matter (too much), but not having intermediate rewards drastically lowers training speed. On the flip side, overengineering the rewards injects too much human intuition and makes the model work via defined rules rather than learned rules.
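
One common way to spread a terminal reward back over the chain is exponential discounting, as in Karpathy’s Pong writeup. The sketch below shows the idea (an illustration of the general technique, not necessarily the exact allocation used here): with gamma close to 1, even the early action A still gets most of the credit.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Spread rewards backwards over the action chain.

    rewards: one entry per action, mostly zeros except a final +1 for a win.
    Each action receives its own reward plus a discounted share of everything after it.
    """
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

# A -> B -> C -> D -> E -> Win: only the final action is rewarded directly
print(discount_rewards([0, 0, 0, 0, 1.0]))
# [0.9606 0.9703 0.9801 0.99   1.    ]
```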

Rewarding winners

The thought was that as you reward the winner’s actions, the model keeps competing against itself until it becomes the best. Here are the metrics:

Rewarding winners, Penalizing losers

Other than that, we could also penalize losers. This should speed up learning, since both the winner’s and the loser’s actions now contribute a signal.
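
Side by side, the two schemes are just different terminal reward assignments; the +1/-1 values here are illustrative placeholders.

```python
def terminal_rewards(n_winner_actions, n_loser_actions, penalize_losers=False):
    """Give every winner action +1; optionally give every loser action -1."""
    winner_rewards = [1.0] * n_winner_actions
    loser_rewards = [-1.0 if penalize_losers else 0.0] * n_loser_actions
    return winner_rewards, loser_rewards
```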

Rewarding Settlements and Cities

Settlements and cities are pretty useful, so let’s reward them directly.

Rewarding points

A limitation of the settlements-and-cities approach is that it does not reward you for dev cards, longest road and so on.

Rewarding points could be a good middle ground.
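
One way to express that middle ground is to reward the change in victory points after each action, which covers settlements and cities as well as dev card points and longest road. A sketch, assuming we record the player’s victory points after every action:

```python
def point_rewards(point_history, won=False, win_bonus=1.0):
    """point_history[i] = the player's victory points after action i (index 0 = before the turn).

    Each action is rewarded by the points it gained: +1 for a new settlement,
    +1 for taking longest road, and so on. A terminal bonus is kept so that
    winning still matters more than hoarding points.
    """
    rewards = [after - before for before, after in zip(point_history, point_history[1:])]
    if won:
        rewards[-1] += win_bonus
    return rewards
```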

Training by evolution

An alternative to training on each game’s winner and loser is to train on the average winner over a batch of games. Although this could be slower (1/2 of the rewards), it should be more accurate about which side is actually better, giving a better gradient.
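
My reading of this in code: play a batch of games between the two sides, then reward every action of the side that won more often, rather than rewarding per game. The sketch below is an illustration of that idea, not the exact batching logic.

```python
def batch_reward_sign(batch_results):
    """batch_results: +1 for each game player A won, -1 for each game player B won.

    Returns the reward sign applied to all of A's actions in the batch
    (B gets the opposite sign).
    """
    avg = sum(batch_results) / len(batch_results)
    reward_a = 1.0 if avg > 0 else -1.0
    return reward_a, -reward_a
```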

Action space rebalancing

One thing I noticed while training is that the model overprioritised non-passing actions. A quick look at the action space made it clear why: because most action types expand into more than one concrete action, passing is rarely chosen at random. Rebalancing by a factor of the number of available actions looks like a good idea. The idea here is to make passing initially the most probable action and let its probability slowly fall over training.
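
A sketch of that rebalancing, assuming each concrete action (every road edge, every settlement spot, the single pass action, …) is tagged with an action type; the names are placeholders.

```python
import numpy as np

def rebalanced_probs(probs, action_type_of, n_types):
    """Down-weight action types that expand into many concrete actions.

    probs:          model output, one probability per concrete action.
    action_type_of: the action type index of each concrete action.
    Dividing by the number of concrete actions per type means 'pass',
    which has only one entry, starts out as the most likely pick.
    """
    counts = np.bincount(action_type_of, minlength=n_types)
    weights = 1.0 / counts[action_type_of]
    out = probs * weights
    return out / out.sum()
```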

Building the colonist.io controller

The colonist.io controller was much more difficult and time consuming than I thought, mostly because of the lack of APIs to get the state of the game. The difficulty increases because colonist.io uses an HTML canvas to display the game, which means it’s not possible to search for DOM elements. What I ended up having to build is an image matcher to get the state of the game. This involves mapping each of the little settlement and city icons and using cv2 to find where each one matches best. Here is the list of all the little images.

[image: the template icons used by the matcher]

There are essentially many timeouts added to wait out animations. The images do not always match 100%, which also makes it a bit unreliable.
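
The core of the matcher is cv2 template matching with a score threshold, roughly like the sketch below; the file names and the threshold value are illustrative.

```python
import cv2

def find_icon(board_img, icon_img, threshold=0.8):
    """Locate a small template icon (settlement, city, ...) on a board screenshot.

    Matches are scored with normalised cross-correlation; anything below the
    threshold is treated as 'not found', since matches are rarely a perfect 100%.
    """
    scores = cv2.matchTemplate(board_img, icon_img, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)
    if best_score < threshold:
        return None
    return best_loc  # top-left corner of the best match

# usage sketch: screenshot the canvas, then look for each template icon
# board = cv2.imread("canvas_screenshot.png")
# red_settlement = cv2.imread("icons/red_settlement.png")
# position = find_icon(board, red_settlement)
```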

The canvas itself is not the only piece of the information puzzle. The game messages also need to be parsed in order to provide additional information on what dev cards were bought or played, what was rolled and what resources were gained.
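
A sketch of the message parsing; the log format below is made up for illustration, and the real colonist.io messages are worded differently.

```python
import re

# Illustrative patterns only, not colonist.io's real message format.
ROLL_RE = re.compile(r"(?P<player>\w+) rolled (?P<d1>\d) \+ (?P<d2>\d)")
GAIN_RE = re.compile(r"(?P<player>\w+) got (?P<amount>\d+) (?P<resource>\w+)")

def parse_message(text):
    """Turn one game-log line into a structured event, or None if it is irrelevant."""
    if m := ROLL_RE.search(text):
        return {"type": "roll", "player": m["player"], "total": int(m["d1"]) + int(m["d2"])}
    if m := GAIN_RE.search(text):
        return {"type": "gain", "player": m["player"],
                "resource": m["resource"], "amount": int(m["amount"])}
    return None
```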

Combining the two

Using the state gathered by the controller, I pass it to our latest and greatest model to battle it out with the colonist.io AI.

Here’s a short video of this.

Over 100 games… we won…