Understanding the SWAN Games Rating System.

By far, the biggest misunderstanding surrounding the rating system can be summed up in one question:

Q. "How can my rating go down after I scored a <insert your good MP or IMP score here>?"

A. The simple answer is that the rating system is not designed to consider the duplicate score. (Duplicate scores are reflected in the statistics in your profile.) This brings about another question:

Q. "Well - if the system doesn't use the duplicate score, how can it work?"

A. The best way to understand how the rating system works is to think of yourself as competing against the cards. The reasoning behind this will become clearer as you continue reading this discussion. There are a number of reasons why we chose this approach, and why it produces a better rating system than one based on duplicate scores.

Some differences between live bridge and online bridge.

First, let's make some observations about duplicate scoring...

Duplicate scoring is a method which attempts to eliminate much of the luck involved with the cards. It is a very useful technique for evaluating a player's performance within a field. Duplication is ideal for scoring tournaments because:

  • Tournaments typically involve a (relatively) small number of boards...duplication is useful for removing the luck of the cards.
  • Tournaments typically involve a movement...the more opponents a pair plays against, the more their overall duplicate score represents an average "against the field".
  • Tournaments typically run in a timely manner...duplicate scores can be calculated and results posted in some order.

Playing online is a different environment than playing at a tournament, and the conditions at a tournament listed above do not really hold online. Consider:

  • When you play online, you are not limited to a small number of boards...you can play as many boards as you wish. After playing hundreds of boards, the overall luck of the cards you received will tend to even itself out.
  • When you play online, your opponents are not chosen by a movement. Your overall duplicate score may not represent a true average "against the field".
  • When you play online, it is convenient to be able to play boards at your leisure. Since boards are not played according to some schedule, duplicate scores and results may not always be available.
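To put a rough number on how the luck of the cards evens out: if board results are treated as independent, the spread of your average result shrinks with the square root of the number of boards played. The sketch below only illustrates that general statistical rule. The function name and the per-board spread of 400 points are assumptions chosen for the example, not values taken from the rating system.

```python
import math


def average_luck_spread(per_board_spread: float, boards: int) -> float:
    """Spread (standard error) of the per-board average after `boards` boards.

    Assumes board results are independent and share the same spread.
    """
    if boards <= 0:
        raise ValueError("boards must be positive")
    return per_board_spread / math.sqrt(boards)


# Hypothetical per-board spread of 400 raw points, for illustration only.
for boards in (24, 100, 400):
    print(boards, round(average_luck_spread(400.0, boards), 1))
```

Under these assumptions, the spread after 400 boards is half of what it is after 100 boards, which is why a long run of online boards needs less help from duplication than a short tournament session does.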

Designing a better rating system.

Our rating system was designed with the online environment in mind. When deciding the basis for rating changes, we decided against using the duplicate score. The conditions which make duplication useful for tournaments do not hold up as well online. Even more importantly, a rating system based on duplicate scoring would need to deal with the "strength of field" factor. This factor presents a serious complication from a statistical point of view. By not basing our rating system on duplicate scores, we were able to produce a simpler system - one which does not suffer from the countless (and easily manipulable) approximations required for factoring in the strength of field.

A nice side-effect of not using the duplicate score is that our rating system does not need to wait for a board to be retired to make rating adjustments. Our rating system computes a rating change after every board you finish. Being able to instantly update ratings avoids the many problems involved with a "delayed update" type system. Consider:

  • With a delayed update system, a player's rating is always being updated at a different time than when the score was made. This has an inherent problem: to accurately calculate rating changes requires knowing each player's rating at the time the score was generated. Since players don't play boards in any particular order, and since boards don't naturally retire in any particular order, it is practically impossible to avoid using estimated ratings as inputs for the rating change calculations.
  • Think about what happens as certain players naturally play more boards than others...With a delayed update system, these players will either need to wait longer and longer periods to see rating changes as other players "catch up" and retire these boards, or boards would need to be prematurely retired at regular intervals (and then you have to decide how to process data from the prematurely retired boards).

Why it's valid and how it works.

Argument: "If you aren't using the duplicate score, then you must be using rawscores. This is just wrong - it ignores the fact that proper play involves maximizing your duplicate score, not your rawscore."

This argument would be correct if we were just using plain rawscores for our rating calculations. But we are not. Before calculating rating changes, we scale the rawscore in a manner consistent with the type of board you are playing. Since the way we scale the rawscore is different for MP and IMP, we keep separate ratings for these. By scaling the rawscore, we are able to calculate rating changes which are consistent with the "maximizing strategy" of the underlying scoretype.

As a general overview, here's how the rating system calculates a rating change:

    1. When a board is completed, the NS score and the 4 players' current ratings are used as inputs to the rating calculation.

    2. The NS score is scaled according to the scoretype.

    3. The 4 player ratings are used to calculate an expected scaled score.

    4. The difference between the score from (2) and the score from (3) is used to generate a rating change. If (2) is greater than (3), then the NS players' ratings increase and the EW players' ratings decrease, and vice versa if (3) is greater than (2).

And that's all there is to it.
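The four steps above can be sketched in a few lines of Python. This is only an illustration of the shape of the calculation, not the actual formulas: the scaling curves, the way the four ratings are combined, the `spread` and `step` values, and every function name here are assumptions made up for the example.

```python
import math


def scale_score(ns_raw: int, scoretype: str) -> float:
    """Step 2: map the raw NS score into the range -1..1.

    Both curves are invented for illustration; the real scaling is not shown.
    """
    if scoretype == "MP":
        # Matchpoints care little about the size of a margin: compress hard.
        return math.tanh(ns_raw / 200.0)
    if scoretype == "IMP":
        # IMPs reward larger margins: compress more gently.
        return math.tanh(ns_raw / 600.0)
    raise ValueError(f"unknown scoretype: {scoretype}")


def expected_scaled_score(n: float, s: float, e: float, w: float,
                          spread: float = 400.0) -> float:
    """Step 3: expected scaled NS score from the four ratings (hypothetical)."""
    diff = (n + s) / 2.0 - (e + w) / 2.0
    return math.tanh(diff / spread)


def rating_change(ns_raw: int, scoretype: str,
                  n: float, s: float, e: float, w: float,
                  step: float = 10.0) -> float:
    """Steps 1-4: change for each NS player; each EW player gets the negative."""
    actual = scale_score(ns_raw, scoretype)
    expected = expected_scaled_score(n, s, e, w)
    return step * (actual - expected)


# Four equally rated players; NS hold poor cards on an IMP board.
careful = rating_change(-50, "IMP", 1500, 1500, 1500, 1500)   # e.g. down one
sloppy = rating_change(-150, "IMP", 1500, 1500, 1500, 1500)   # e.g. down three
print(careful, sloppy)
```

In this sketch both NS results lose rating points because the cards were bad, but the careful line loses fewer than the sloppy one. The expected score depends only on the four ratings, so no other table's result is needed to compute the change.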

Yes - it might seem funny when your rating goes down after earning a great duplicate score, but that is by design. The proper perspective is that you are competing against the cards - bad cards cause your rating to go down, good ones cause it to go up. If you play the bad cards better than everyone else, then your rating will go down less for these cards than theirs will, and if you play the good cards better than everyone else, then your rating will go up more for these cards than theirs will. As you continue to play, your rating will tend to reflect your skill level.

Because of the way the rating system is designed, fluctuations in a player's rating are a natural consequence. It would not be unusual, for example, for a player's rating to swing 20 points over a 5-board session. Also, when you are new (less than 500 experience), your rating changes are amplified to help your rating converge more quickly, making even greater swings possible. In general, a player does not have short term control of his rating. Remember, for a single board, your rating can still go down even if you play it perfectly. (However, it is important to remember too that it will go down by less than it would had you played it worse.) Overall, we estimate that it takes approximately 500 boards for a rating to converge, and that ratings are accurate to within +/- 50 points. This estimate assumes that the majority of the players you play with already have seasoned ratings. Since Swan is relatively new, many players do not have seasoned ratings, which means it can take even more boards for an accurate rating.
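As a rough illustration of how amplification widens the swings for new players, the sketch below applies a boost to each board's rating change while a player's experience is under 500. Only the 500 threshold comes from the text above. The boost factor of 2, the function name, and the five per-board changes are hypothetical.

```python
NEW_PLAYER_EXPERIENCE = 500  # threshold mentioned in the text


def amplified_change(base_change: float, experience: int,
                     boost: float = 2.0) -> float:
    """Scale a per-board rating change for new players.

    `boost` is a made-up factor; the real amplification is not shown here.
    """
    if experience < NEW_PLAYER_EXPERIENCE:
        return base_change * boost
    return base_change


# Hypothetical per-board changes over a 5-board session.
session = [6.0, -4.0, 7.0, 5.0, 6.0]
seasoned_swing = sum(amplified_change(c, 1200) for c in session)
new_player_swing = sum(amplified_change(c, 40) for c in session)
print(seasoned_swing, new_player_swing)
```

With these made-up numbers the seasoned player moves 20 points over the session and the new player moves 40, which shows why early ratings jump around more before they settle.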

A happy thought :-)

One final note: Since short term rating changes are meaningless, and since there is no statistical advantage to choosing certain partners or opponents, we feel our rating system reduces players' sensitivity to their ratings, making Swan Games a friendlier place to play online bridge.