A Bridge rating system

See an implementation of the algorithm here: https://github.com/my-bridge/RatingCalculator

Advantages of a rating system

It could help in finding appropriate opponents or for a match-making algorithm.
It could help in finding appropriate team mates.
It is easy to calculate your own rating.
You don’t need much database space to save a player’s rating (basically you just need to save a couple of numbers and Boolean values, but not a list of every single board result).
It could help detect cheating.
One could possibly monetize or outsource a ranking list (see, for example, www.2700chess.com).
You could give away titles like GM or IM (see chess titles) for certain rating performances. Please use titles the non-Bridge-playing public can relate to. “Everyone” knows what a grandmaster is, almost no non-Bridge-player knows what a life master is.
People like to compare. It is fun to compare players’ strengths.

I really hope platforms like BBO, Realbridge, Swanbridge or Lovebridge will implement this. It is difficult to find a good game on BBO against unknown opponents. A rating-based match-making algorithm would certainly help.

Perhaps this system inspires someone to construct an even better rating – great!

What does memorylessness mean?

If A and B both have a rating of 2,700 (and the same k-factor) and they perform identically over the next session, they will have the same rating afterwards. If instead you just look at, say, the average performance over the last 1,000 boards, the new rating also depends on the old boards that get thrown out. So, for example, even with a 2,700 session you might lose points if the thrown-out boards were even better.
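The problem with a sliding window can be shown with a minimal Python sketch. The window size, the board values and the function name `windowed_average` are made up for illustration; none of them come from the RatingCalculator script.

```python
from collections import deque


def windowed_average(history, new_results, window):
    """Average IMPs lost/board over the last `window` boards after a new session."""
    boards = deque(history, maxlen=window)  # old boards fall out automatically
    boards.extend(new_results)
    return sum(boards) / len(boards)


# Two hypothetical players, both averaging 0.6 IMPs lost/board over 4 boards.
player_a = [0.2, 0.2, 1.0, 1.0]  # the oldest boards were very good
player_b = [1.0, 1.0, 0.2, 0.2]  # the oldest boards were poor
session = [0.6, 0.6]             # identical new session for both

avg_a = windowed_average(player_a, session, window=4)
avg_b = windowed_average(player_b, session, window=4)
```

Both players start from the same average and play the same session, yet A's windowed average gets worse and B's gets better, purely because different old boards drop out. A memoryless update depends only on the current rating, the k-factor and the new session.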

What is the interpretation of a rating?

This is a rating for IMP games (team, IMP pairs or IMP scoring in the main bridge club).

Currently the highest possible rating is 3,400. In that case you never make a single-dummy error.

If you have a rating of 2,600, it means you lose on average (3,400 – 2,600) / 1,000 = 0.8 IMPs/board.

If you have a rating of 2,500, you lose on average 0.9 IMPs/board and so on.
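The conversion between a rating and the average number of IMPs lost per board can be sketched in a few lines of Python. The constant and function names are illustrative and not taken from the linked script.

```python
PERFECT_RATING = 3400   # rating of a player who never loses an IMP
POINTS_PER_IMP = 1000   # 1,000 rating points correspond to 1 IMP/board


def expected_imps_lost(rating):
    """Average IMPs lost per board implied by a rating."""
    return (PERFECT_RATING - rating) / POINTS_PER_IMP


def rating_from_imps_lost(imps_lost_per_board):
    """Inverse conversion: the rating implied by an average loss per board."""
    return PERFECT_RATING - POINTS_PER_IMP * imps_lost_per_board
```

For example, `expected_imps_lost(2600)` gives 0.8 and `rating_from_imps_lost(0.9)` gives 2,500, matching the two examples above.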

3,400 was chosen so that the best players have a rating over 2,600, 2,700 is the elite level, and players with a rating of 2,400 or more are certainly extremely strong as well. I tried to make the numbers somewhat comparable to the famous Elo rating for chess players.

Depending on the settings of k-values, the starting rating etc., approximately 55 players have a rating above 2,600 and 13 above 2,700.

As of today, 35 chess players have a rating of 2,700 or more. So one could argue that the perfect rating should not be 3,400 but a slightly higher number.

0.1 IMPs/board difference – isn’t that “nothing”?

Make no mistake: a 0.1 IMPs/board difference in playing strength is a lot. I could certainly tell by looking at individual boards if a player is a 2,500 or a 2,600 player.

If you share my belief that card-playing ability and judgement are correlated, I think it is fair to say that if you have four players who are 0.1 IMPs/board better than their opponents, they will likely win the match by 0.8 IMPs/board or more (ok, that is just a guess, but it shows that 0.1 IMPs/board is not as tiny as it may sound to some).

If you look at 12 boards, a 0.1 IMPs/board difference means:

player A finds a 10% chance of making a vulnerable 4♥ contract which player B does not find (this missed chance costs 10% x 12 IMPs in the long run) OR
player B misses two 10% chances in vulnerable 3♠ contracts (each 10% chance you overlook costs about 6 x 10% IMPs) OR
player B misses three 10% chances in non-vulnerable 2♥ contracts (each 10% chance you overlook costs about 4 x 10% IMPs).
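The arithmetic behind these three scenarios can be checked with a few lines of Python. The IMP costs per missed chance (12, 6 and 4) are the approximate values quoted above; the variable names are only illustrative.

```python
BOARDS = 12
DIFFERENCE_PER_BOARD = 0.1  # IMPs/board

# (number of missed chances, probability of each chance, IMPs at stake)
scenarios = {
    "one 10% chance in a vulnerable game": (1, 0.10, 12),
    "two 10% chances in vulnerable part-scores": (2, 0.10, 6),
    "three 10% chances in non-vulnerable part-scores": (3, 0.10, 4),
}

# 0.1 IMPs/board over 12 boards is about 1.2 IMPs in total.
total_difference = DIFFERENCE_PER_BOARD * BOARDS

# Expected long-run cost of each scenario, in IMPs over the 12 boards.
expected_costs = {
    name: count * probability * imps
    for name, (count, probability, imps) in scenarios.items()
}
```

Each scenario costs about 1.2 IMPs over the 12 boards, which is exactly the 0.1 IMPs/board difference.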

Also, see the caveats below.

What is the idea behind an IMPs lost per board-based rating?

This is based on the following formula:

card-play performance = IMPs lost due to mistakes + IMPs lost due to randomness

(Note that you cannot gain IMPs.)

If you just compared the table result to the double-dummy (dd) optimal result (in this contract), the rating would hinge much more on the quality of the opponents. That’s why I think card-by-card analysis works better. Note that opportunities for psychological plays which are technically inferior (and costly) are fairly rare.

I believe the latter term (IMPs lost due to randomness) has a mean somewhere between 0.5 and 0.6 IMPs/board which you cannot avoid in the long run.

Furthermore, I believe that card-playing performance is strongly correlated with judgement in bidding and ability to find good leads.

So overall, playing strength can be measured by the following formula:

playing strength = card play + system + judgement + leads

This rating does not reflect one player’s ability to play a very good but involved system.

Why is IMPs lost per board chosen and not the percentage of accurate cards?

IMPs lost per board is more in line with the thinking process of a player in an IMP game. You try to avoid losing IMPs, not to maximize the number of perfect cards. For example, you want to make your contract as safely as possible and not risk it for an overtrick, as this minimizes the IMPs lost in the long run.

Looking at percentage of perfect cards is a good way to catch cheats though.

It is true that occasionally the best play is a psychological line rather than the best technical line. This is part of the error term (IMPs lost due to randomness).

How are the IMPs lost per board calculated?

Basically, the algorithm looks at every card you play as declarer or defender (the opening lead is excluded), and if you make a mistake, you lose IMPs against double-dummy perfect play.

For example, you play a vulnerable 4♠ contract which can be made after the opening lead. If declarer (South) now makes a mistake so that the contract is down 1, and then the defender sitting East makes a mistake so that 4♠ is makeable again, then South loses 12 IMPs, East loses 12 IMPs and West loses 0 IMPs. North (dummy) is not involved in the play at all.

If South (declarer) and East play “ping-pong”: South makes a mistake (down 1), East makes a mistake (makeable), South makes a mistake (down 1), East makes the final mistake and the contract is made, then South loses 12 IMPs, East loses 12 IMPs (and West loses 0 IMPs). North (dummy) is not involved in the play at all.

One could also base a rating on cumulative mistakes; the resulting rating lists would look very similar.
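The text does not spell out the exact accounting rule. One reading that reproduces both examples above is that a player is charged the largest single loss he causes on a board, while the cumulative variant adds up every mistake. The following Python sketch rests on that assumption; the function name and the input format are hypothetical, and the actual script may work differently.

```python
def imps_lost_per_player(mistakes, cumulative=False):
    """Charge IMPs to players for one board.

    `mistakes` is a list of (seat, imps_cost) pairs in play order.
    ASSUMPTION: in the default mode a player is charged only the largest
    single loss he caused; with cumulative=True all his mistakes are summed.
    """
    losses = {}
    for seat, cost in mistakes:
        previous = losses.get(seat, 0)
        losses[seat] = previous + cost if cumulative else max(previous, cost)
    return losses


# The "ping-pong" board: South and East each make two 12-IMP mistakes.
ping_pong = [("S", 12), ("E", 12), ("S", 12), ("E", 12)]

default_losses = imps_lost_per_player(ping_pong)
cumulative_losses = imps_lost_per_player(ping_pong, cumulative=True)
```

Under this assumption the default mode charges South and East 12 IMPs each, as in the example, while the cumulative mode charges 24 IMPs each. West does not appear in the result because he made no mistake.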

What is the formula behind the rating?

A new rating is calculated as follows:

rating_new = rating_old + k * (expected performance – performance in this batch)

For details see the python script here:

https://github.com/my-bridge/RatingCalculator

Batch: a number of boards. I chose 12 (a session contains 16 boards; on average you are involved in 12 of them).

For simplicity, I updated the rating on a monthly basis. For example, if a player played 1,200 boards in a month and lost 720 IMPs, I would assume he lost 0.6 IMPs per board in each of 100 batches in this month. The batches are processed one by one.

A platform which implements this rating could use the true values for each batch.

performance: avg. of IMPs lost/board in this batch

exp. performance: expected avg. of IMPs lost/board in this batch

(If you have a rating of 2,750, you would expect to lose (3,400 – 2,750) / 1,000 = 0.65 IMPs/board.)
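The update rule and the monthly simplification can be sketched in Python as follows. This is a minimal illustration of the formula above, not the code from the linked repository. The function names are made up, the k-factor is held constant during the month, and leftover boards that do not fill a whole batch are simply ignored; both are assumptions.

```python
PERFECT_RATING = 3400.0


def expected_loss(rating):
    """Expected average IMPs lost/board for a given rating."""
    return (PERFECT_RATING - rating) / 1000.0


def update_rating(rating, k, batch_loss):
    """One batch: rating_new = rating_old + k * (expected - actual)."""
    return rating + k * (expected_loss(rating) - batch_loss)


def update_month(rating, k, boards, imps_lost, batch_size=12):
    """Monthly simplification: every batch gets the month's average loss."""
    batches = boards // batch_size  # ASSUMPTION: incomplete batches are dropped
    if batches == 0:
        return rating
    average_loss = imps_lost / boards
    for _ in range(batches):  # batches are processed one by one
        rating = update_rating(rating, k, average_loss)
    return rating
```

For instance, a 2,750 player with k = 16 who loses 0.5 IMPs/board in a batch beats the expected 0.65 and gains about 2.4 points. With the monthly example from the text (1,200 boards, 720 IMPs lost, so 0.6 IMPs/board), a player starting at 2,600 with k = 16 would move most of the way toward 2,800, ending at roughly 2,760.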

Interpretation of the k factor

(see for example k factor in Elo ratings)

The higher k is, the faster the rating changes. It converges faster, but it also means that the number fluctuates more.

I chose 48 for new players (30 or fewer batches included in the rating),

16 for players with at least 31 batches whose rating was at some point above 2,500 (even if it is lower right now), and

24 for all other players.
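These three cases can be written as a small Python function. The function and parameter names are illustrative, and tracking the peak rating separately is an assumption about how "at some point above 2,500" would be stored.

```python
def k_factor(batches_played, peak_rating):
    """Choose the k-factor from the number of rated batches and the peak rating."""
    if batches_played <= 30:
        return 48  # new player: the rating should converge quickly
    if peak_rating > 2500:
        return 16  # established player who has been above 2,500 at some point
    return 24      # all other players
```

A platform would only need to store the current rating, the number of batches and one flag (or the peak rating) per player to apply this rule.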

In chess these numbers are 40, 10 and 20 respectively (and in other sports different k-factors are used, see for example the Wikipedia article about Elo ratings).

One has to experiment with these numbers.

Which data are included/excluded?

I looked at players who I think could be among the best in the world, among the best in Germany or who are otherwise of interest to me.

Excluded are data from pairs play (%, BAM etc.), total points and boards in which at least one of the players is a robot.

Who are the best players?

Disclaimer: No disrespect if you are not mentioned here.

I may not have collected (all or some) data for all players who are at the elite level. This list is somewhat Europe and North-America centric. Unless you have talked a lot to someone or partnered him or looked at a lot of games it is difficult (if not impossible) to assess the playing strength of a single player from results alone. Therefore, it is quite possible there are some very strong players who are not on my radar.

I may or may not have excluded one player/players who I am fairly confident is/are cheating. On the other hand, I am not saying anyone on this list is clean (but I hope so). For this reason, I won’t answer questions regarding individual ratings.

This is a proof of concept. Just take it as a list of honour. These are all players with a current rating above 2,600.

Rankings from Sep 2020 (CAS was already established and well known back then) to Aug 2023:

+----+------------+---------------+----+-----------------+----+-----+
| 1  | norby      | 2798.15928429 | 7  | 5.25            | 16 | 174 |
| 2  | dzeronimo  | 2749.15038532 | 9  | 0.0             | 16 | 345 |
| 3  | giova007   | 2739.67506273 | 4  | 7.6             | 16 | 218 |
| 4  | boye       | 2735.67875473 | 8  | 6.18867924528   | 16 | 272 |
| 5  | cillar     | 2729.87516833 | 10 | 4.28571428571   | 16 | 379 |
| 6  | beukertje  | 2727.85503511 | 8  | 9.26315789474   | 16 | 402 |
| 7  | johnhurd   | 2719.15580794 | 6  | 0.352941176471  | 16 | 162 |
| 8  | dagold     | 2718.55612918 | 9  | 6.57692307692   | 16 | 555 |
| 9  | espene     | 2718.16075191 | 4  | 4.57142857143   | 16 | 346 |
| 10 | tbak       | 2714.71730096 | 11 | 1.29411764706   | 16 | 84  |
| 11 | ilaria75   | 2713.04984118 | 6  | 3.0             | 16 | 412 |
| 12 | firechief  | 2712.92085187 | 6  | 2.72727272727   | 16 | 252 |
| 13 | kot_korba  | 2707.85992872 | 3  | 0.9375          | 16 | 163 |
| 14 | jjmeck     | 2693.61074163 | 2  | 1.79746835443   | 16 | 139 |
| 15 | redds      | 2688.20614308 | 6  | 1.2             | 16 | 345 |
| 16 | giacomopr  | 2687.96928112 | 4  | 3.66666666667   | 16 | 127 |
| 17 | tgbh3      | 2687.35806453 | 7  | 8.53125         | 16 | 185 |
| 18 | stevieg    | 2677.93883181 | 9  | 9.52941176471   | 16 | 186 |
| 19 | wstarkowsk | 2674.25255448 | 7  | 5.75555555556   | 16 | 401 |
| 20 | skrzat96   | 2667.62957865 | 11 | 5.82352941176   | 16 | 283 |
| 21 | game over  | 2659.65157376 | 5  | 3.26086956522   | 16 | 198 |
| 22 | m_difranco | 2658.7595095  | 2  | 1.02255639098   | 16 | 619 |
| 23 | septiembre | 2649.86225669 | 8  | 20.6666666667   | 16 | 205 |
| 24 | levin      | 2647.49578981 | 1  | 0.375           | 16 | 119 |
| 25 | pocken     | 2645.5941965  | 7  | 3.81818181818   | 16 | 235 |
| 26 | hatol      | 2640.39353016 | 5  | 1.66666666667   | 16 | 242 |
| 27 | joegrue    | 2637.19452532 | 2  | 2.0             | 16 | 383 |
| 28 | zia        | 2632.95884802 | 4  | 2.88888888889   | 16 | 226 |
| 29 | mickyb     | 2631.94235693 | 8  | 4.21052631579   | 16 | 250 |
| 30 | okvince    | 2631.38633926 | 5  | 0.625           | 16 | 202 |
| 31 | fred92130  | 2627.52440539 | 11 | 9.625           | 16 | 194 |
| 32 | ballebo-jr | 2624.60952419 | 7  | 3.11111111111   | 16 | 265 |
| 33 | ballebo    | 2624.17519454 | 8  | 7.46666666667   | 16 | 92  |
| 34 | ultimike   | 2623.97734504 | 7  | 5.76470588235   | 16 | 33  |
| 35 | j li       | 2622.65023088 | 3  | 1.125           | 16 | 59  |
| 36 | rogerclee  | 2622.42047472 | 3  | 2.66666666667   | 16 | 228 |
| 37 | paulinka18 | 2621.28362419 | 3  | 0.642857142857  | 16 | 257 |
| 38 | bf88       | 2619.88908754 | 7  | 0.7             | 16 | 100 |
| 39 | loukie     | 2618.21876953 | 7  | 11.8461538462   | 16 | 236 |
| 40 | begse      | 2616.54575246 | 9  | 9.9             | 16 | 299 |
| 41 | jan_jansma | 2615.1628124  | 5  | 3.15789473684   | 16 | 250 |
| 42 | lady007    | 2614.85173181 | 7  | 5.13333333333   | 16 | 315 |
| 43 | pierced    | 2614.56972635 | 2  | 1.48717948718   | 16 | 851 |
| 44 | lmilne     | 2614.52806589 | 3  | 0.0             | 16 | 47  |
| 45 | kerri      | 2613.87771585 | 2  | 2.4             | 16 | 68  |
| 46 | sandria    | 2613.42025126 | 3  | 5.25            | 16 | 276 |
| 47 | woozle     | 2611.81203845 | 1  | 1.04255319149   | 16 | 259 |
| 48 | drorp      | 2610.89872116 | 11 | 7.16279069767   | 16 | 259 |
| 49 | bbramley   | 2606.20742354 | 2  | 1.33333333333   | 16 | 158 |
| 50 | ronpa      | 2606.02119224 | 1  | 0.0769230769231 | 16 | 178 |
| 51 | kevsters   | 2603.87213535 | 1  | 0.571428571429  | 16 | 246 |
| 52 | iris100    | 2603.51260323 | 10 | 11.8181818182   | 16 | 40  |
| 53 | arobson    | 2602.96199382 | 9  | 5.0             | 16 | 87  |
| 54 | gpchagas   | 2602.45260887 | 3  | 2.4375          | 16 | 516 |
| 55 | yanivz     | 2602.22876508 | 6  | 2.21052631579   | 16 | 130 |
+----+------------+---------------+----+-----------------+----+-----+