Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess (Paper Explained)

#ai #chess #alphazero

Chess is a very old game and both its rules and theory have evolved over thousands of years in the collective effort of millions of humans. Therefore, it is almost impossible to predict the effect of even minor changes to the game rules, because this collective process cannot be easily replicated. This paper proposes to use AlphaZero’s ability to achieve superhuman performance in board games within one day of training to assess the effect of a series of small, but consequential rule changes. It analyzes the resulting strategies and sets the stage for broader applications of reinforcement learning to study rule-based systems.

OUTLINE:
0:00 – Intro & Overview
2:30 – Alternate Chess Rules
4:20 – Using AlphaZero to assess rule change outcomes
6:00 – How AlphaZero works
16:40 – Alternate Chess Rules continued
18:50 – Game outcome distributions
31:45 – e4 and Nf3 in classic vs no-castling chess
36:40 – Conclusions & comments

Paper:

My Video on AI Economist:

Abstract:
It is non-trivial to design engaging and balanced sets of game rules. Modern chess has evolved over centuries, but without a similar recourse to history, the consequences of rule changes to game dynamics are difficult to predict. AlphaZero provides an alternative in silico means of game balance assessment. It is a system that can learn near-optimal strategies for any rule set from scratch, without any human supervision, by continually learning from its own experience. In this study we use AlphaZero to creatively explore and design new chess variants. There is growing interest in chess variants like Fischer Random Chess, because of classical chess’s voluminous opening theory, the high percentage of draws in professional play, and the non-negligible number of games that end while both players are still in their home preparation. We compare nine other variants that involve atomic changes to the rules of chess. The changes allow for novel strategic and tactical patterns to emerge, while keeping the games close to the original. By learning near-optimal strategies for each variant with AlphaZero, we determine what games between strong human players might look like if these variants were adopted. Qualitatively, several variants are very dynamic. An analytic comparison shows that pieces are valued differently between variants, and that some variants are more decisive than classical chess. Our findings demonstrate the rich possibilities that lie beyond the rules of modern chess.

Authors: Nenad Tomašev, Ulrich Paquet, Demis Hassabis, Vladimir Kramnik

If you want to support me, the best thing to do is to share out the content 🙂

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar:
Patreon:
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

29 Comments

  1. Unfortunate to not see variants being evaluated that are decisive by design, e.g. the Armageddon rule where a draw = win for black.

  2. The Schmidhuber shade, legendary 😀

  3. I think you spent way too much time at the beginning of the video explaining how the search tree works.

  4. Regarding chess popularity, recently it has seen quite an explosion on the internet thanks to the PogChamps tournament, even considering that the game was played beyond human level way before neural networks were practical!

  5. How does the policy learn to suggest things to expand in the Monte Carlo tree search?
    Would the varying number of possible moves per turn mean that you need to pass in the board for each state and run the network to get a single prediction from the policy?

  6. It is a pity they didn't test the variant that virtually eliminates draws – where if you get your king across the halfway line without being in check, you win.

  7. Not really my thing but thank you for your thoughts regardless. I appreciate the content even when it is not exactly my area of interest

  8. Yannic you are the Netflix of the paperworld 🍿🍿🍿

  9. I don't think the idea that AlphaZero might be meaningfully worse (or better) at some of these variants is particularly plausible. These are just so fundamentally similar to normal chess that it's hard to imagine the algorithm being specific to one, outside of some really niche hypotheses.

  10. I have been wondering for quite a while whether RL can be used to assess game rules and balance, and now someone did it 🙂

  11. Devansh: Machine Learning Made Simple says:

    Lots of draws might be good. People say white has an advantage. This proves that it's mostly even. Maybe that's just the draw is a win condition.

  12. They didn't analyse Chess960, or they could probably have come up with an opening arrangement leading to the most non-draws and such.
    The sequence at 33:58, 1. e4 e5 2. Nf3 Nc6, is worth memorizing then, in case the opponent plays the same line or a similar one, now that we know AlphaZero prefers it. Or if playing against an engine. Nice!

  13. 6-word summary:
    Better Beta Testing with AlphaZero

  14. Sorry but … it's muuuch too slow. If we're here, we know about chess, min max algorithm and reinforcement learning.
    You can skip everything before 17:00

  15. I would argue from the result the chess game is more decisive (in another sense) than video games as it makes it easier in chess to tell if two players are of the same strength. Hence winning in chess is more likely due to skill and not luck. So in practice, we can be more decisive of which player is stronger, or if both are equal, from just a single / few rounds, instead of needing multiple rounds to take out the random factor.

  16. Idea: use A0 to generate random initial conditions with equal probability of winning. Even better: roughly equal probability of winning (diff smaller than epsilon) while minimizing chance for a draw.

    Random initial conditions = the positions and number of pieces differ.

  17. If AlphaZero is "bad" at a chess variant compared to its skill in classical chess, that is still probably much stronger than a human could achieve in that variant. Also, Counter-Strike is not solved by bots, at least in terms of strategy (of course an aimbot breaks the game).

  18. Idea: evaluate the W/D/L ratio for the best agents with slightly different skill (train time). Complaining about the W/D/L ratio for the same agent (as I understand it) is a bit weird. If chess is an assessment of skill, it would ideally converge to a draw. To use it as a good metric for skill, what matters is how often an only slightly better player wins, draws, or loses.

  19. An interesting paper, but I'd be more interested in an investigation into why AlphaZero's chess play seems so much more interesting and human-like. Computers have been able to beat world champions since the 90s – as a chess player, the thing about AlphaZero that was/is exciting is how it wins.

  20. I wish they had analyzed some popular variants that people actually play and that are known to reduce the number of draws dramatically, like 3-check (you can win by checkmate or by checking the enemy king 3 times) or king-of-the-hill (you can win by checkmate or by legally moving your king into one of the 4 central squares).

    Also, all of these variants can be combined with the setup from Chess960, which is interesting because it effectively eliminates the memorization of opening books (at least for human players; I guess computers can memorize 960 different books).

  21. This is how I explain the relation between decisiveness and the computation budget:

    In an extreme case each player is given almost no computation time. Being unable to think, they both make almost random moves. Thus white can't benefit from the little advantage of making the first move.

    The second case is to give both sides some computation time. Then the white can play strategically to benefit from the advantage of making the first move.

    The other extreme is to give both sides ample computation time. Then black can counterbalance the little disadvantage of being the second player by meticulous moves (of course the nature of the game allows him).

    It is interesting that too much computation time is, in effect, like no computation time.

    The interesting case I think is to give black more computation time: How much more computation time can counterbalance the strategic disadvantage of being the second player?

  22. Although AlphaZero plays chess at a superhuman level, it cannot SOLVE the game yet.

  23. I would like to see the average material difference by ply, by variant. This would indicate which games allow more positional advantage; i.e. since self-capture allows very dramatic self-capture combinations, I'd expect there to be larger material differences when for example W captures his own queen to set up a fork.

  24. The problem is not so much the draw as it is that white has an advantage to start with, and Torpedo looks like it is making the problem worse.

  25. I always admire people who say that AlphaZero (or similar software) can learn how to play in 1 day, because it simply doesn't, at least not on the ordinary computers we all have.
    We might want to consider how much computational power Google can use in those famous 24 hours… In the AlphaZero case it was 5000 custom TPUs in parallel, so… I wouldn't say that AlphaZero learns how to play in 24 hours without mentioning the context…
    So AlphaZero can learn in 1 day, but if you want to do it by yourself you need a few thousand years…
