How Ken Regan’s method works

Disclaimer: This is my understanding of how his approach works, based on this talk given to FIDE. If you spot mistakes or have additional information, feel free to comment below. He has also published several papers on cheat detection, which I do not consider here.

There is a lot of talk about the Hans Niemann case, but I found surprisingly little information about how cheat detection actually works. Recently, Super GM Fabiano Caruana criticized Ken Regan's "algorithm" for creating too many false negatives. IM Kostya Kavutskiy described Ken Regan's method as "I don't understand how they [statistical models] work but it involves math". In fact, the way that cheat detection is characterized as an "algorithm" already hints at a false impression of how things actually work. While describing this as an algorithm is technically not wrong, the idea that some algorithm runs on a set of games and at the end outputs "cheater" or "not a cheater" is too simplistic.

Rather, this is all about anomaly detection. You compute statistics for games, and these statistics then have to be interpreted. For certain thresholds this can be done more or less automatically, but depending on the results, things might get fuzzy.

I will here provide a short introduction to how Ken Regan's cheat detection actually works. Note that this is based on his (public) slides – however, they are dated 2013, and I am sure he has updated and refined his approach since then. Nevertheless, I think the ideas presented there are worth looking at: they provide a good introduction to the topic, and the general idea is probably the same for current methods.

In general we need to define metrics for cheat detection and compute these metrics w.r.t. a number of games. Ideally, these metrics are able to distinguish games where cheating occurred from those where players played without any assistance.

Let's first have a look at what metrics he uses. In his slides, Ken Regan mentions:

  • move-matching (MM%)
  • average error per move (AE), scaled in units of PEPs
  • equal-top matching (TM%), usually 3–4% higher than MM%

There are no definitions given in the slides, but we can make an educated guess about what is meant.

Move matching probably refers just to comparing moves from a game with the corresponding top engine move in that position.

Average error per move (AE) probably refers to comparing the evaluation of the played move with that of the top engine move, computing the difference in e.g. centipawns, summing these differences over all moves of a game, and dividing by the number of moves played. This is limited, however, to positions with PEPs, i.e. "pawns in equal positions": we want to rule out very unequal positions. Say you are at +4.4 and play a move that is just +4.1 – this might technically be an error, but practically it is just simplifying to convert a winning position.

Equal-top matching probably refers to move matching, but not just against the top engine move – rather against the first few top moves of the engine within a defined evaluation interval.
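If these guesses are right, the metrics are straightforward to compute. Here is a minimal Python sketch; the data format, the `pep_limit` of 200 centipawns and the `top_window` of 10 centipawns are my own illustrative assumptions, not definitions from the slides:

```python
def game_metrics(moves, pep_limit=200, top_window=10):
    """Sketch of MM%, TM% and AE for one game.

    moves: list of dicts with keys
      'played'      - the move actually played, e.g. 'Nf3'
      'played_eval' - engine evaluation (centipawns) after the played move
      'engine'      - list of (move, eval) pairs, best move first
    """
    mm = tm = 0
    errors = []
    for m in moves:
        best_move, best_eval = m['engine'][0]
        if m['played'] == best_move:
            mm += 1  # move matches the top engine move
        # equal-top: played move is within top_window cp of the best move
        near_top = [mv for mv, ev in m['engine'] if best_eval - ev <= top_window]
        if m['played'] in near_top:
            tm += 1
        # AE only over roughly equal positions ("PEPs")
        if abs(best_eval) <= pep_limit:
            errors.append(best_eval - m['played_eval'])
    n = len(moves)
    ae = sum(errors) / len(errors) if errors else 0.0
    return {'MM%': 100 * mm / n, 'TM%': 100 * tm / n, 'AE': ae}

example = [
    {'played': 'Nf3', 'played_eval': 40, 'engine': [('Nf3', 40), ('d4', 35)]},
    {'played': 'd4',  'played_eval': 30, 'engine': [('e4', 40), ('d4', 38)]},
]
print(game_metrics(example))  # {'MM%': 50.0, 'TM%': 100.0, 'AE': 5.0}
```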

Let's consider an example with AE. Suppose we have a huge database of games and filter it w.r.t. a certain rating range, e.g. top players with an Elo of 2700+. We create a range of AE values and count how often each specific value occurred. For example:

  • an AE of 0.0 occurred in 1 game
  • an AE of 0.1 occurred in 11 games
  • an AE of …  occurred in … games
  • an AE of 1.5 occurred in 1253 games
  • an AE of 3.0  occurred in 10 games
  • an AE of 3.1  occurred in 0 games

These numbers are of course completely made up. If we plot these data, we will get a bell curve:

[Figure: histogram of AE values, forming a bell-shaped curve]

We can assume that these numbers follow a normal distribution, for two reasons. The unscientific one: if you have no clue how your data is distributed and there is no specific reason to consider something else, the normal distribution is always a good guess 🙂 The scientific one involves the central limit theorem, which I will not delve into here.

The Measurement

So how are we going to use this metric to distinguish between cheating and non-cheating games?

The crucial values that characterize a normal distribution are the mean µ and the standard deviation σ. We do not know their precise values, but we can estimate them from a large collection of games. Suppose there are 100 games of (non-cheating) players; then we compute the mean µ as:

$$\mu = \frac{1}{100} \sum_{i=1}^{100} AE_i$$

And the standard deviation as

$$\sigma = \sqrt{\frac{1}{100} \sum_{i=1}^{100} (AE_i - \mu)^2}$$
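As a small sketch, this estimation is a one-liner with Python's standard library (the AE values below are made up; note that `statistics.stdev` divides by n−1 rather than n, which for a large game collection makes virtually no difference):

```python
import statistics

# per-game AE values (centipawns) from a made-up reference set of honest games
ae_values = [48.0, 52.5, 50.0, 47.5, 55.0, 49.0, 51.0, 47.0]

mu = statistics.mean(ae_values)      # estimated mean
sigma = statistics.stdev(ae_values)  # sample standard deviation

print(mu, sigma)
```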

Given µ and σ we can plot the underlying normal distribution.

[Figure: the normal distribution with mean µ and standard deviation σ fitted to the AE data]

Note that having µ and σ, we can think of AE as a real-valued random variable, and for each real-valued AE we get a value for how probable its occurrence is. What is interesting here is that we can give a visual interpretation for various sums of probabilities. In particular, there is the famous 68–95–99.7 rule.

[Figure: the 68–95–99.7 rule – shaded areas under the normal curve at ±1σ, ±2σ and ±3σ]

For example, the purple area covers 68 percent of all possible values. The purple and red areas together cover 95 percent. Adding the yellow area, 99.7 percent of all values are covered.

Where do these three values come from? As you can see, the areas are split by moving left and right from the mean, and the amount we move is just a multiple of the standard deviation:

  • the left and right borders of the purple area are at -1*σ and +1*σ
  • further out, at the red area, we have -2*σ and +2*σ
  • and at the yellow borders we are at -3*σ and +3*σ

In other words, suppose we encounter a game with an AE value that is very far from the mean, say close to 3σ away. It then lies outside the range covering 99.7 percent of all values; converted into probabilities, that is a chance of about 1 in 370 (cf. standard normal probability tables).

If we step 4σ away from the mean, it’s already 1 in 15787 and so on.
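These "1 in N" numbers can be reproduced from the standard normal tail probability. A minimal sketch using only Python's standard library:

```python
import math

def two_sided_odds(k):
    """Chance of landing at least k standard deviations away from the mean
    (in either direction) under a normal distribution, expressed as 1 in N."""
    tail = math.erfc(k / math.sqrt(2))  # two-sided tail probability
    return 1 / tail

print(round(two_sided_odds(3)))  # 370
print(round(two_sided_odds(4)))  # 15787
```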

This is precisely where we need interpretation of how likely such an "odd" performance actually is. In his slides, Ken Regan uses the number of games played, measured in weeks of TWIC, and gives some estimates for multiples of the standard deviation that should raise eyebrows.

Note that we think here in multiples of the standard deviation. We could also convert the observed AE of a game into a standardized value by dividing its difference from the mean by the standard deviation:

$$z = \frac{AE_{\mathrm{game}} - \mu}{\sigma}$$

This is known as the z-score, and we can use z-scores to look for suspicious games and to compare games with one another.
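In code this is just the following (the reference values of 50 and 16 centipawns are made-up numbers for illustration):

```python
def z_score(ae_game, mu, sigma):
    # standardized distance of a game's AE from the reference mean
    return (ae_game - mu) / sigma

# with a (made-up) reference mean of 50 cp and standard deviation of 16 cp:
print(z_score(98.0, 50.0, 16.0))  # 3.0 -> roughly a 1-in-370 event
```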

Multiple Games

We can, however, go one step further. Suppose we have not just a single game, but say 64 games that we want to look at. We can ask: if we randomly selected 64 played games (randomly sampled 64 games), what kind of result would we expect to obtain? And how do these 64 games – say, the suspicious games of one player from a tournament – compare to such a random selection?

We can use z-testing to check if there is something fishy. For that we treat the 64 games as a random sample, and random samples can be characterized by a sampling distribution.

The mean of the sampling distribution is given by

$$\mu' = \mu$$

and the standard deviation of the sampling distribution (also known as the standard error) is given by

$$\sigma' = \frac{\sigma}{\sqrt{n}}$$

Consider a small example. Suppose that from our large database we have determined µ = 50 (centipawns) with a standard deviation of 16 centipawns. We now have a sample of 64 games with an observed mean of 40.

Then

$$\sigma' = \frac{16}{\sqrt{64}} = \frac{16}{8} = 2$$

Note that for a sample of just a single game, as in the case above, we have \sqrt{1} = 1, i.e. σ' = σ.

We obtain a z-score

$$z = \frac{40 - 50}{2} = -5$$

Again, you need to interpret this number. There are z-tables which give the probabilities for z-scores. Here we can conclude that the result is highly unlikely: if you sample 64 games – which is quite a large number – it is just very, very unlikely to end up so far away from the mean of the original distribution. In practice, however, it is unlikely that we get such obvious indicators. Nevertheless, it would be very interesting to compute a few statistics for, say, the games of Igors Rausis.
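The whole worked example fits in a few lines. A sketch, with the conversion of the z-score into a probability via the normal CDF added by me:

```python
import math

mu, sigma = 50.0, 16.0     # reference distribution from the large database
n, sample_mean = 64, 40.0  # the 64 games under scrutiny

standard_error = sigma / math.sqrt(n)    # 16 / 8 = 2
z = (sample_mean - mu) / standard_error  # (40 - 50) / 2 = -5

# one-sided probability of a z-score at least this extreme
p = 0.5 * math.erfc(abs(z) / math.sqrt(2))
print(z)  # -5.0
print(p)  # ~2.9e-07, i.e. roughly 1 in 3.5 million
```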

Next let’s consider some questions that are asked a lot:

  • instead of thinking through all this math stuff and statistics, can you not simply start up ChessBase, click some buttons without any clue what they actually do – and for which the manual explicitly says "this correlation isn't a sign of computer cheating" – publicly post that on YouTube, and not get completely roasted for doing so? Apparently, the answer is yes.
  • What if you cheat in a smart way, using the engine only a few times per game? If you do that often, your AE mean will change (even if only slightly) and eventually show up in the statistics.
  • What if you cheat in a smart way, using the engine only a few times per game, but then afterwards play some slightly incorrect moves such that your AE values do not change? Then the metric will not detect it, and it will not show up in the statistics. Note that this is why neither Ken Regan nor chessdotcom nor anyone else can ever be fully transparent about the metrics and techniques they actually apply – you could tune your cheating in such a way that the statistics show nothing. Note, however, that on the cheater's end it is also not easy to manipulate things this way, as you have to make sure that several metrics stay within normal parameters.

Lastly, the more data we have, the more difficult it becomes to cheat. For example, instead of computing AE over all moves, one could think of measures like:

  • average error per move, but only for moves played right after the player returned from a tab switch in the browser (where a cheating player presumably looked at an engine, while an honest player might just have adjusted the volume of music playing in another browser tab).
  • average error per move, but only for moves where the player has XX minutes/seconds left on the clock.
    Here we assume that the error rate increases significantly for an honest player under time pressure, whereas for a cheating player these are exactly the crucial moments where he cheats.

There are other possibilities as well. For example, we could try to construct a neural network that identifies "crucial" positions in a game – precisely those positions that are difficult for a human player to navigate. If we then measure the average error for moves played in such positions, this metric might be better at distinguishing cheaters from honest players.

You can read more about neural networks in my free book about Neural Networks for Chess.

As you can see, statistics cannot provide perfect proof; results are fuzzy. Some cases are crystal clear, e.g. if we see a performance – measured by a good metric such as average error per move – that is about as likely as winning the lottery. But in other cases it is not that simple.

More importantly, one must understand what a result means. If statistical methods indicate oddities, this does give a strong indication of cheating. But if statistical methods do not show any irregularities, this does not mean that someone did not cheat – it just means that the statistics do not provide evidence of cheating. These are very different things, but they are often confused in practice.

Lastly, if we want scientists like Ken Regan to develop better methods and metrics, then FIDE should change two things:

  • collect and provide more data for tournament games, especially time information, and
  • provide research grants so that a few PhD students can conduct research on chess cheat detection.

I am sure there are several research directions that are as yet unexplored.
