What makes a position complex?

If you asked a grandmaster, what would they say? Is the position complex for humans, computers, or both? Can you tell whether a position is complex just by looking at it?

Today, I am introducing an open-source tool that predicts the complexity of a chess position for a human. The tool is called Elocator, and I hope you find it interesting.

So how does it work?

I chose to define complexity as the expected change in Win % after a move is made. Imagine a position where Stockfish gives white a +1 advantage. That implies a 59% win rate for white. Assuming Stockfish is perfect, a human can only play a move that is as good as or worse than Stockfish's choice (i.e., white cannot play a move that increases white's win rate). So we know that after the next move is played, white will have a 59% or lower chance of winning.
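To make the 59% figure concrete, here is a minimal sketch of the conversion from an engine evaluation to Win %. It assumes the formula Lichess publishes for its accuracy metric, which reproduces the 59% above; the constant and the function name `win_percent` are assumptions for illustration, not necessarily what Elocator uses internally.

```python
import math

# Coefficient from the Lichess Win % formula (an assumption here,
# not confirmed as the exact conversion used by Elocator).
LICHESS_K = 0.00368208


def win_percent(centipawns: float) -> float:
    """Convert an evaluation in centipawns to an expected Win % (0 to 100)."""
    return 50 + 50 * (2 / (1 + math.exp(-LICHESS_K * centipawns)) - 1)


# A +1.00 evaluation is 100 centipawns.
print(round(win_percent(100), 1))
```

An equal position (0 centipawns) maps to exactly 50%, and the curve flattens out as the advantage grows, so the difference between +8 and +9 matters far less than the difference between 0 and +1.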

Depending on the position, a grandmaster may find the best move, or the best move may be really difficult to find. Over a large enough dataset, we can draw correlations between the state of the board and how much we expect the Win % to go down after a move is made. As an example, over 20,000 moves, my data shows that a GM is expected to lose 1.4% win rate after a move in a position with a queen on the board, compared to 1.3% if there is no queen. That seems small, but it implies positions are about 8% more complex when there is a queen on the board (1.4 / 1.3 ≈ 1.08).
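The queen comparison is just a grouped average. Here is a rough sketch of how it could be computed from (FEN, Win % loss) pairs; the helper names and the sample rows are made up for illustration and are not the author's data or code.

```python
from collections import defaultdict


def has_queen(fen: str) -> bool:
    """True if either side has a queen.

    Only the first FEN field (piece placement) is checked, because the
    castling field can also contain the letters Q and q.
    """
    placement = fen.split()[0]
    return "Q" in placement or "q" in placement


def mean_loss_by_queen(rows):
    """rows: iterable of (fen, win_pct_loss). Returns {has_queen: mean loss}."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of losses, count]
    for fen, loss in rows:
        bucket = totals[has_queen(fen)]
        bucket[0] += loss
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical rows, purely illustrative.
sample = [
    ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 1.0),
    ("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", 2.0),
    ("8/5k2/8/8/8/8/2K1R3/8 w - - 0 1", 0.5),
]
print(mean_loss_by_queen(sample))
```

Dividing the two averages gives the relative complexity, which is the 1.4 / 1.3 ratio quoted above.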

I created a dataset of FENs mapped to the loss in Win % for the GM who made a move in that position (classical OTB games only). Underlying this tool is a neural network (AI, deep learning, yada yada) that has been trained on 100,000 chess moves made by grandmasters. The model has learned to predict the complexity of a position by learning the expected change in Win % after a move is made, as measured by Stockfish 16 at depth 20.
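As a sketch of what one training label might look like, the function below computes the loss in Win % for the side that moved, given engine evaluations before and after the move. Several things here are assumptions: the Lichess-style conversion formula, evaluations expressed in centipawns from white's point of view, and clamping negative losses to zero (a fixed-depth engine can occasionally rate the position higher after the move). The function names are hypothetical.

```python
import math


def win_percent(centipawns: float) -> float:
    """Lichess-style conversion from centipawns to Win % (assumed formula)."""
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)


def win_pct_loss(fen: str, cp_before: float, cp_after: float) -> float:
    """Loss in Win % for the player who moved in the position given by fen.

    cp_before and cp_after are evaluations from white's point of view,
    taken before and after the move. The second FEN field says who moved.
    """
    sign = 1 if fen.split()[1] == "w" else -1
    before = win_percent(sign * cp_before)
    after = win_percent(sign * cp_after)
    # Assumption: treat any apparent gain as zero loss.
    return max(0.0, before - after)


# White to move, evaluation drops from +1.00 to 0.00 after the move.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(round(win_pct_loss(start, 100, 0), 1))
```

Each row of the dataset would then pair a FEN with a number like this, and the network learns to predict that number from the position alone.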

The model is then used to score the complexity of a given position on a scale of 1 to 10. The model is not perfect, but it is a good starting point for understanding the complexity of a position. I look forward to making it better over time.

How accurate is the complexity score?

When building a model like this, it’s good practice to use some of the data to build the model and set the rest aside. This lets us evaluate the model on data that wasn’t used to build it.

With roughly 100k moves in my dataset, I decided to keep 20,000 off to the side for evaluation, and I’ve plotted the model performance below.
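For readers who have not done this before, a holdout split can be as simple as shuffling the rows and slicing off the evaluation set. This is a minimal sketch with a hypothetical `split_holdout` helper and a fixed seed for repeatability; it is not necessarily how the split was done for Elocator.

```python
import random


def split_holdout(rows, holdout_size, seed=0):
    """Shuffle rows and return (training_rows, holdout_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    return shuffled[holdout_size:], shuffled[:holdout_size]


# Stand-in for ~100k (FEN, Win % loss) rows; integers keep the example small.
rows = range(100_000)
train, holdout = split_holdout(rows, holdout_size=20_000)
print(len(train), len(holdout))
```

The important property is that no row appears in both sets, so the evaluation numbers reflect positions the model has never seen.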

Actual vs. Predicted by Complexity Score