Chess Territorial Disputes

Introduction

Chess, often dubbed “The Game of Kings”, is not considered an easy game. It has many basic rules, which often deter new players from learning it, unlike Go or Othello (Reversi), whose rules are simpler and fewer. Even once the rules are grasped, the concepts of the game and the ability to look ahead do not come easily to most. However, while many structures and strategies are subtle and take a long time to recognize and master, a few aspects can be seen with relative ease. For example, any rookie will recognize that the queen, having such generous freedom of movement, is generally more powerful than any other piece in the game. A slightly more complicated facet is that of board control. Judging when positional advantage outweighs material, or the other way around, is one of the more advanced skills a chess player can acquire, but some traits are easily observed, the most important of which is control of the center. Pieces in the center of the board can exercise their full strength, threatening the largest number of opposing pieces and keeping a wide area open should retreat be necessary. By contrast, when pushed to the sides, bishops and knights are often cornered and captured, or are otherwise rendered useless or forced to retreat to unhelpful positions. Thus, chess can be seen as a game for control of the center. He who manages to establish a hold there gains a tactical advantage over his opponent.
Of course, the middle squares start empty, and are equally distant from both players’ armies. It would seem only logical that they be the main site of activity, since at the beginning of the game the other side of the board is heavily defended by a great number of opposing pieces. Fighting for control of the middle involves both aggressive moves and calculated withdrawals, most of which we could assume to revolve around the four central squares.
It is therefore of some interest to see how chess moves are distributed across the board. Will they all be focused around the center? Which squares should be declared the “most popular”? Which squares do professional players avoid, and which ones attract more attention? Can we note any peculiar activity, or in general explain the various phenomena regarding this distribution? Can beginners learn anything merely by looking at the statistics of chess pros?
With these questions in mind, I set out to carry out a simple calculation: to find the relative frequencies at which pieces step onto the different squares. In other words, to discover which squares are stepped on more, and which less, in the average chess game. I chose to ignore all distinctions between pieces and simply count. The eventual result should be a chessboard with a number between zero and one attached to each square. This number is the fraction of all moves, taken over a large number of games, in which any piece stepped onto that square.
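To make the quantity concrete, here is a minimal Python sketch of the normalization step. The function name and the toy numbers are illustrative assumptions, not taken from the scripts used for this post.

```python
def relative_frequencies(counts):
    """Turn per-square visit counts into fractions of all recorded moves."""
    total = sum(counts.values())
    if total == 0:
        return {square: 0.0 for square in counts}
    return {square: n / total for square, n in counts.items()}


# Toy example with three squares (made-up numbers, not from the data set).
toy_counts = {"e4": 6, "d4": 3, "a1": 1}
toy_freqs = relative_frequencies(toy_counts)
print(toy_freqs)
```

By construction the fractions over all squares add up to one, which is why each relative table further down is labelled 1.0.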

Method

The first step involved is to acquire a large set of chess games. Since we are looking at a purely statistical property of chess, and considering the fact that it is quite doubtful that any single match can correctly represent the total distribution, a large amount of data is needed. The fantastic site http://www.chessgames.com/ serves perfectly for this purpose. It contains an archive of over 567,000 games from over 150 years, played by masters and grandmasters from all over the world. While information can be readily searched and filtered, I did not want any biases towards a certain playing style dictated by personality or year. Hence, the site’s random game option was used in order to get the moves.
ChessGames presents the matches in PGN format (Portable Game Notation), which is basically a text-based way to represent the moves. The site offers a choice of several Java viewers which help display these games; however, since we are only interested in the data, they are not needed. Luckily, each page contains a link to the raw PGN data. Thus, a small Python script was written using the handy urllib2. It continually accesses random games, finds the download link, and saves the moves to a file, where they will later be analyzed. Since it bears a striking similarity to the script I wrote for accessing Google’s search results, I was afraid that once again I would be identified as an “automated bot”. However, it seems that ChessGames is not as busy a site as Google, nor is it often the target of denial-of-service attacks; my script ran without fault.
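The original script is not reproduced here, but the loop it performs might look roughly like the sketch below. It uses urllib.request, the Python 3 counterpart of urllib2. The address of the random-game page and the pattern that locates the download link are deliberately left as parameters: they are placeholders that would have to be looked up on the site itself, and nothing here should be read as the site's actual layout.

```python
import re
import time
import urllib.request
from urllib.parse import urljoin


def fetch_url(url):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_games(random_game_url, pgn_link_pattern, how_many,
                  fetch=fetch_url, pause_seconds=1.0):
    """Visit a random-game page repeatedly and gather the raw PGN texts.

    random_game_url and pgn_link_pattern are hypothetical placeholders;
    the pattern must capture the download link in its first group.
    """
    games = []
    for _ in range(how_many):
        page = fetch(random_game_url)
        match = re.search(pgn_link_pattern, page)
        if match is not None:
            # The link may be relative, so resolve it against the page URL.
            games.append(fetch(urljoin(random_game_url, match.group(1))))
        time.sleep(pause_seconds)  # be gentle with the server
    return games
```

Passing the fetch function as an argument keeps the loop testable without touching the network, and the pause between requests is a courtesy that also makes it less likely to be mistaken for an attack.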
A collection of about 500 games seemed enough for this simple purpose; in the end, 509 games were downloaded. A second Python script was written to tally the moves from this large collection of games. For each game, the script loaded the data, parsed the PGN format, and counted how many times each square was visited by any piece. Castling was counted as two moves, one by the king and one by the rook, each increasing the count of its destination square by one. A distinction was made between the white and black sides, for while each possesses identical pieces, white always starts and thus tends to be more aggressive. I speculated that this might cause a difference in the distribution on the chessboard. Of course, the total number of moves for white and black pieces combined was also calculated. The results are as follows. Remember that white starts on rows 1 and 2 and black starts on rows 7 and 8.
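Before turning to the tables, here is a minimal sketch of the counting step, using only the standard library. It is not the script that produced the numbers below. It assumes each game starts from the normal position with white to move, relies on the fact that the destination square is the last square named in a move, and uses function names of my own choosing; a dedicated PGN library would be more robust.

```python
import re
from collections import Counter

# Squares that receive the king and the rook when castling.
CASTLING_SQUARES = {
    ("O-O", "white"): ("g1", "f1"),
    ("O-O", "black"): ("g8", "f8"),
    ("O-O-O", "white"): ("c1", "d1"),
    ("O-O-O", "black"): ("c8", "d8"),
}
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def movetext_tokens(pgn):
    """Strip headers, comments, variations and move numbers from one game."""
    text = re.sub(r"^\[.*\]\s*$", " ", pgn, flags=re.MULTILINE)  # tag pairs
    text = re.sub(r"\{[^}]*\}", " ", text)  # {comments}
    while re.search(r"\([^()]*\)", text):
        text = re.sub(r"\([^()]*\)", " ", text)  # (variations), innermost first
    text = re.sub(r"\$\d+", " ", text)  # numeric annotation glyphs
    text = re.sub(r"\d+\.+", " ", text)  # move numbers such as "12." or "12..."
    return [tok for tok in text.split() if tok not in RESULTS]


def count_visits(pgn):
    """Return {"white": Counter, "black": Counter} of destination squares."""
    visits = {"white": Counter(), "black": Counter()}
    side = "white"
    for token in movetext_tokens(pgn):
        move = token.rstrip("+#!?")
        if move in ("O-O", "O-O-O"):
            # Castling counts twice: once for the king, once for the rook.
            visits[side].update(CASTLING_SQUARES[(move, side)])
        else:
            squares = re.findall(r"[a-h][1-8]", move)
            if not squares:
                continue  # not a move; skip it without switching sides
            visits[side][squares[-1]] += 1  # the last square is the destination
        side = "black" if side == "white" else "white"
    return visits


sample = """[Event "Example"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 {a comment} 4. Ba4 Nf6 5. O-O Be7 1-0
"""
sample_visits = count_visits(sample)
print(sample_visits["white"])
print(sample_visits["black"])
```

Summing the counters over all games gives the absolute tables, and adding the white and black counters together gives the total. The tables below report the actual results from the 509 downloaded games.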

Absolute:

White : 20182
=====================
8 |   64   39   70   87   78   66   32   33
7 |   98  102  132  142  148  134  101   75
6 |   99  119  224  221  195  280  142  141
5 |  156  343  283  560  605  347  454  181
4 |  332  279  697  974  832  568  366  337
3 |  243  433  864  564  627 1029  462  336
2 |   79  197  313  586  582  267  318  136
1 |   88  226  379  487  422  704  544  160
    ---------------------------------------
      a    b    c    d    e    f    g    h

Black : 19882
=====================
8 |  102  221  353  449  454  756  599  185
7 |   67  247  303  691  714  317  377  138
6 |  351  421  854  648  682 1046  490  328
5 |  329  371  701  825  773  497  345  281
4 |  156  380  306  512  442  248  326  166
3 |  123  142  238  168  163  188  121   96
2 |  107  121  122  112  109  110   85   54
1 |   42   34   56   65   75   45   21   34
    ---------------------------------------
      a    b    c    d    e    f    g    h

Total : 40064
=====================
8 |  166  260  423  536  532  822  631  218
7 |  165  349  435  833  862  451  478  213
6 |  450  540 1078  869  877 1326  632  469
5 |  485  714  984 1385 1378  844  799  462
4 |  488  659 1003 1486 1274  816  692  503
3 |  366  575 1102  732  790 1217  583  432
2 |  186  318  435  698  691  377  403  190
1 |  130  260  435  552  497  749  565  194
    ---------------------------------------
      a    b    c    d    e    f    g    h

Relative:

White : 1.0
=====================
8 | 0.003 0.001 0.003 0.004 0.003 0.003 0.001 0.001
7 | 0.004 0.005 0.006 0.007 0.007 0.006 0.005 0.003
6 | 0.004 0.005 0.011 0.010 0.009 0.013 0.007 0.006
5 | 0.007 0.016 0.014 0.027 0.029 0.017 0.022 0.008
4 | 0.016 0.013 0.034 0.048 0.041 0.028 0.018 0.016
3 | 0.012 0.021 0.042 0.027 0.031 0.050 0.022 0.016
2 | 0.003 0.009 0.015 0.029 0.028 0.013 0.015 0.006
1 | 0.004 0.011 0.018 0.024 0.020 0.034 0.026 0.007
    ---------------------------------------
       a     b     c     d     e     f     g     h

Black : 1.0
=====================
8 | 0.005 0.011 0.017 0.022 0.022 0.038 0.030 0.009
7 | 0.003 0.012 0.015 0.034 0.035 0.015 0.018 0.006
6 | 0.017 0.021 0.042 0.032 0.034 0.052 0.024 0.016
5 | 0.016 0.018 0.035 0.041 0.038 0.024 0.017 0.014
4 | 0.007 0.019 0.015 0.025 0.022 0.012 0.016 0.008
3 | 0.006 0.007 0.011 0.008 0.008 0.009 0.006 0.004
2 | 0.005 0.006 0.006 0.005 0.005 0.005 0.004 0.002
1 | 0.002 0.001 0.002 0.003 0.003 0.002 0.001 0.001
    ---------------------------------------
       a     b     c     d     e     f     g     h

Total : 1.0
=====================
8 | 0.004 0.006 0.010 0.013 0.013 0.020 0.015 0.005
7 | 0.004 0.008 0.010 0.020 0.021 0.011 0.011 0.005
6 | 0.011 0.013 0.026 0.021 0.021 0.033 0.015 0.011
5 | 0.012 0.017 0.024 0.034 0.034 0.021 0.019 0.011
4 | 0.012 0.016 0.025 0.037 0.031 0.020 0.017 0.012
3 | 0.009 0.014 0.027 0.018 0.019 0.030 0.014 0.010
2 | 0.004 0.007 0.010 0.017 0.017 0.009 0.010 0.004
1 | 0.003 0.006 0.010 0.013 0.012 0.018 0.014 0.004
    ---------------------------------------
       a     b     c     d     e     f     g     h

And of course, we cannot resist drawing a color map with intensity proportional to the relative frequency (created with OpenGL):

[Color maps for White, Black, and Total; images not reproduced here]
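For readers who want to reproduce such a map without OpenGL, one simple alternative is to write a grayscale image in the binary PGM format, which needs nothing beyond the standard library. This is a substitute technique, not the method used for the maps in this post; the function name and the cell size are my own assumptions.

```python
def board_to_pgm(freqs, cell=32):
    """Render {square: frequency} as a binary PGM (grayscale) image."""
    peak = max(freqs.values(), default=0) or 1  # avoid dividing by zero
    size = 8 * cell
    pixels = bytearray()
    for rank in "87654321":  # rank 8 is drawn at the top, as in the tables
        row = bytearray()
        for file in "abcdefgh":
            # Brightness is proportional to the square's relative frequency.
            gray = round(255 * freqs.get(file + rank, 0) / peak)
            row.extend([gray] * cell)
        pixels.extend(row * cell)  # repeat the pixel row to make square cells
    header = f"P5\n{size} {size}\n255\n".encode("ascii")
    return header + bytes(pixels)


# Three values copied from the Total relative table; other squares stay black.
demo_image = board_to_pgm({"d4": 0.037, "e4": 0.031, "a1": 0.003}, cell=2)
```

Writing the returned bytes to a file opened in binary mode, for example one named total.pgm, gives an image that most viewers can open. The brightest cell is the most visited square.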


Discussion

What can we learn from looking at these statistics? The following will not be an exhaustive approach; rather, we will look at interesting and distinguishable points we can derive from these data.
First, our main hypothesis has been affirmed; the center is indeed the focal point in the game of chess. The corners are rarely visited, and neither are the squares a2, h2, a7, and h7. From my own experience I know that usually the only pieces to travel there are the king, when hard pressed, and the rook, most often to defend lonely pawns. Of course, there is a vast gorge separating the factual observation, “the center squares are the most popular”, from any immediate conclusions. However, seeing this distribution, one could try a statistical approach to playing chess: a player would move his pieces so that they threaten the tiles most visited by his opponent. Any real strategy will have to be vastly more complex than this, but that may be an interesting course to follow.
One immediately apparent observation is that the board and the game are not symmetrical. This derives from the fact that the king and queen are not equal pieces. One consequence is that the kings’ “knight tiles” (f3, f6) are visited by the appropriate color noticeably more often than the queens’ “knight tiles” (c3, c6): roughly 20% more, going by the absolute tables (1029 against 864 for white, 1046 against 854 for black). Another asymmetry is castling. Castling kingside usually means having better protection for the king. It is also easier to do, since there are fewer tiles on which the king can be checked, and only two pieces, the knight and the bishop, have to be moved from their initial positions beforehand. Castling queenside requires moving the queen away in addition to the knight and bishop, and tends to leave the king more exposed (with the advantage of moving the rook to a more offensive position). These, along with some other minor reasons, lead players to castle kingside more often. This is indeed the case, as can easily be seen in all the color maps: the right side of rows 1 and 8 is brighter than the left, and the king’s protective area is clearly recognizable.
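The size of the gap between the two knight tiles can be checked directly against the absolute tables. The short calculation below copies the four relevant counts from the White and Black tables; the variable names are mine.

```python
# Visit counts copied from the absolute White and Black tables.
white = {"f3": 1029, "c3": 864}
black = {"f6": 1046, "c6": 854}

white_excess = white["f3"] / white["c3"] - 1
black_excess = black["f6"] / black["c6"] - 1

print(f"white: f3 is visited {white_excess:.0%} more often than c3")
print(f"black: f6 is visited {black_excess:.0%} more often than c6")
```

This gives about 19% for white and about 22% for black.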
Is white more aggressive than black? It seems so, at first glance. By comparing black’s relative frequency to white’s relative frequency for each square, it can be seen that white’s front row (row 4) is 5% more trodden on than black’s front row (row 5). Whether or not this is statistically significant in relation to the size of the data set was not verified.
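The comparison can be redone from the absolute tables by summing each side's front row and dividing by that side's total number of moves. The sketch below copies row 4 of the White table and row 5 of the Black table; it says nothing about statistical significance.

```python
# Totals and rows copied from the absolute tables.
white_total = 20182
black_total = 19882
white_row4 = [332, 279, 697, 974, 832, 568, 366, 337]
black_row5 = [329, 371, 701, 825, 773, 497, 345, 281]

white_share = sum(white_row4) / white_total  # share of white moves landing on row 4
black_share = sum(black_row5) / black_total  # share of black moves landing on row 5
front_row_gap = white_share / black_share - 1

print(f"white row 4: {white_share:.3f}, black row 5: {black_share:.3f}")
print(f"white's front row is visited {front_row_gap:.1%} more often")
```

The shares come out near 0.217 and 0.207, a gap of a little under 5%.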

Conclusions

While we have only scratched the surface of the investigation, it seems that there are many interesting points and facts one can discover by analyzing where chess pieces tend to go. A more rigorous, thorough, and mathematical approach can probably reach more decisive and important conclusions; however, we now have the tools to approach the problem. Extending the research might include numerical calculations, and taking into consideration the types of pieces and when they go to different squares.

Unfortunately, the programs written are much too large to be uploaded inline on this website.

5 comments

  1. Wow, that was interesting. Great work.

    “Thus, a small Python script was written using the handy urllib2. It continually accesses random games, finds the download link, and saves the moves to a file, where they will later be analyzed”

    Does this simply mean you have generated a number between 1000000 to 15670000 and saved the data returned?
    I can see you using pywinauto or, god forbid, SendKeys…
