Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess (Paper Explained)

#ai #chess #alphazero

Chess is a very old game, and both its rules and its theory have evolved over many centuries through the collective effort of millions of people. It is therefore almost impossible to predict the effect of even minor rule changes, because this collective process cannot easily be replicated. This paper proposes using AlphaZero’s ability to reach superhuman performance in board games within one day of training to assess the effect of a series of small but consequential rule changes. It analyzes the resulting strategies and sets the stage for broader applications of reinforcement learning to the study of rule-based systems.

OUTLINE:
0:00 – Intro & Overview
2:30 – Alternate Chess Rules
4:20 – Using AlphaZero to assess rule change outcomes
6:00 – How AlphaZero works
16:40 – Alternate Chess Rules continued
18:50 – Game outcome distributions
31:45 – e4 and Nf3 in classic vs no-castling chess
36:40 – Conclusions & comments

Abstract:
It is non-trivial to design engaging and balanced sets of game rules. Modern chess has evolved over centuries, but without a similar recourse to history, the consequences of rule changes to game dynamics are difficult to predict. AlphaZero provides an alternative in silico means of game balance assessment. It is a system that can learn near-optimal strategies for any rule set from scratch, without any human supervision, by continually learning from its own experience. In this study we use AlphaZero to creatively explore and design new chess variants. There is growing interest in chess variants like Fischer Random Chess, because of classical chess’s voluminous opening theory, the high percentage of draws in professional play, and the non-negligible number of games that end while both players are still in their home preparation. We compare nine other variants that involve atomic changes to the rules of chess. The changes allow for novel strategic and tactical patterns to emerge, while keeping the games close to the original. By learning near-optimal strategies for each variant with AlphaZero, we determine what games between strong human players might look like if these variants were adopted. Qualitatively, several variants are very dynamic. An analytic comparison shows that pieces are valued differently between variants, and that some variants are more decisive than classical chess. Our findings demonstrate the rich possibilities that lie beyond the rules of modern chess.
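The abstract's talk of draw rates, decisiveness, and (implicitly) White's first-move advantage can be made concrete from win/draw/loss counts. The Python sketch below is illustrative only: the `Outcomes` class is hypothetical, "decisiveness" is taken to mean the share of games that did not end in a draw (an assumed definition; the paper's exact metric may differ), and the sample counts are made up, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    """Game results counted from White's perspective (hypothetical container)."""
    white_wins: int
    draws: int
    black_wins: int

    @property
    def total(self) -> int:
        return self.white_wins + self.draws + self.black_wins

    def _check(self) -> None:
        # Rates are undefined when no games were played.
        if self.total == 0:
            raise ValueError("no games recorded")

    def draw_rate(self) -> float:
        self._check()
        return self.draws / self.total

    def decisiveness(self) -> float:
        # Share of games that ended with a winner (assumed definition).
        return 1.0 - self.draw_rate()

    def white_score(self) -> float:
        # White's expected score per game: win = 1, draw = 0.5, loss = 0.
        self._check()
        return (self.white_wins + 0.5 * self.draws) / self.total


# Hypothetical counts for illustration only, not results from the paper.
sample = Outcomes(white_wins=30, draws=60, black_wins=10)
print(sample.draw_rate(), sample.decisiveness(), sample.white_score())
```

Comparing `decisiveness()` across variants addresses how often games are won at all, while `white_score()` above 0.5 indicates how much the first-move advantage tilts the balance; a variant can improve one while worsening the other.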

Authors: Nenad Tomašev, Ulrich Paquet, Demis Hassabis, Vladimir Kramnik

Comments:

  1. This is how I explain the relation between decisiveness and the computation budget:

    In one extreme, each player is given almost no computation time. Unable to think, both make nearly random moves, so White cannot benefit from the small advantage of moving first.

    In the intermediate case, both sides get some computation time. White can then play strategically to exploit the first-move advantage.

    At the other extreme, both sides get ample computation time. Black can then offset the small disadvantage of moving second through precise play (provided the nature of the game allows it).

    Interestingly, too much computation time ends up having much the same effect as none.

    The interesting case, I think, is giving Black more computation time: how much extra time would offset the strategic disadvantage of moving second?

  2. I would like to see the average material difference by ply, for each variant. This would indicate which variants allow more positional advantage; e.g., since self-capture permits very dramatic combinations, I'd expect larger material differences when, for example, White captures its own queen to set up a fork.

  3. The problem is not so much the draws as the fact that White has an advantage from the start, and Torpedo seems to make that problem worse.

  4. I'm always puzzled by people who say that AlphaZero (or similar software) can learn to play in one day, because it simply can't, at least not on the ordinary computers we all have.
    We should consider how much computing power Google can use in those famous 24 hours... In AlphaZero's case it was 5,000 custom TPUs running in parallel, so I wouldn't say that AlphaZero learns to play in 24 hours without mentioning that context...
    So AlphaZero can learn in one day, but if you wanted to do it yourself you would need a few thousand years...

  5. It's unfortunate that no variants that are decisive by design were evaluated, e.g., the Armageddon rule, where a draw counts as a win for Black.
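The statistic proposed in comment 2 (average material difference by ply, per variant) could be computed roughly as follows. This is a minimal sketch under stated assumptions: each game is assumed to be available as a list of FEN positions, pieces use the conventional 1/3/3/5/9 values (the paper itself reports that piece values differ between variants, so per-variant values could be swapped in), and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Conventional piece values: an assumption, not the per-variant values
# the paper derives. The king is given 0 since it is never traded.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_difference(fen: str) -> int:
    """White material minus Black material, read from a FEN's board field."""
    board = fen.split()[0]
    diff = 0
    for ch in board:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (runs of empty squares) and '/' rank separators
        diff += value if ch.isupper() else -value
    return diff


def avg_abs_material_by_ply(games_by_variant):
    """Map {variant: [game, ...]} to {variant: {ply: mean |material difference|}}.

    Each game is a list of FEN strings where index i is the position after
    i plies (index 0 = starting position). Games that end early simply stop
    contributing to later plies.
    """
    result = {}
    for variant, games in games_by_variant.items():
        per_ply = defaultdict(list)
        for fens in games:
            for ply, fen in enumerate(fens):
                per_ply[ply].append(abs(material_difference(fen)))
        result[variant] = {ply: mean(vals) for ply, vals in sorted(per_ply.items())}
    return result


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
# Hypothetical position with White's queen missing, for illustration only.
NO_WHITE_QUEEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR b KQkq - 0 1"

print(avg_abs_material_by_ply({"self-capture": [[START, NO_WHITE_QUEEN]]}))
```

Taking the absolute value is a design choice: a signed average would let games where White is ahead cancel games where Black is ahead, hiding exactly the large material imbalances the comment expects from self-capture combinations.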
