
Technical Issues

The basic probability model underlying paired comparisons assumes that the comparisons (matches) are independent Bernoulli trials. This is a strong assumption, and may not be exactly correct. For example, if several key players are injured for a series of matches during the season, then the team may be functioning at a lower level until those players return, which induces dependence (correlation) in the results. The college season is short, which makes it difficult to estimate these effects; my feeling is that the average performance level over the season is still estimated well in spite of the occasional short-period dependencies.

There are various methods which might be used to estimate the ratings, given the outcomes of matches. One standard method, maximum likelihood estimation (MLE), has nice properties if there are lots of data. MLE corresponds to choosing the values for the unknown parameters (ratings) which maximize the probability of the observed results. Unfortunately, if a team wins either none or all of its games, then the ML estimate of its rating is undefined: the likelihood keeps increasing as the rating is pushed toward minus or plus infinity, so no finite maximum exists. One solution is to make use of Bayesian methods. This seems especially appropriate for team sports where the total number of games will be small (15 or so per season) and some teams may have perfect records (all wins or all losses). Furthermore, teams often have considerable continuity of players, coaches, etc. from one year to the next. For example, knowing that the UNC women's team had a very good record this year (not to mention last year, and the year before, ...) leads us to suspect that they will do well again next year.
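
The perfect-record problem can be seen in a small sketch. The exact form of the model is not spelled out here, so the code below assumes a logistic (Bradley-Terry style) win probability; the function names and the opponent ratings are made up purely for illustration.

```python
import math

def log_win_prob(r_a, r_b):
    """Log of P(A beats B) under an ASSUMED logistic model:
    P = 1 / (1 + exp(-(r_a - r_b)))."""
    return -math.log1p(math.exp(-(r_a - r_b)))

def log_likelihood(rating, opponent_ratings, wins):
    """Log-likelihood of one team's rating, holding the opponents'
    ratings fixed and treating games as independent Bernoulli trials.
    wins[i] is 1 if the team won game i, 0 if it lost."""
    total = 0.0
    for r_opp, w in zip(opponent_ratings, wins):
        if w:
            total += log_win_prob(rating, r_opp)
        else:
            # Losing to the opponent is the opponent winning.
            total += log_win_prob(r_opp, rating)
    return total

# Hypothetical opponents and an undefeated record.
opponents = [0.0, 0.5, -0.5, 1.0]
perfect_record = [1, 1, 1, 1]

# The log-likelihood keeps rising as the rating grows, so there is
# no finite maximizer for an undefeated team.
values = [log_likelihood(r, opponents, perfect_record)
          for r in (0.0, 2.0, 4.0, 8.0)]
```

Each value in `values` is larger than the one before it while staying below zero, which is the sense in which the ML estimate runs off to infinity.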

In the long run, I plan to use the posterior means as the ratings. In the short run, I am simply using the posterior mode, which is similar to the MLE but avoids the undefined ratings problem. For the 1995 season, having insufficient prior information, I have used a common prior for all teams, which is equivalent to assuming that the teams are sampled from a population of possible teams with a specified ratings distribution.
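
As a rough illustration of the posterior mode with a common prior, here is a minimal sketch. It is not the code used for the ratings: the logistic win probability, the zero-mean normal prior, its standard deviation `prior_sd`, and the plain gradient ascent are all assumptions chosen to keep the example short.

```python
import math

def posterior_mode(n_teams, games, prior_sd=1.0, steps=2000, lr=0.05):
    """Approximate posterior-mode ratings by gradient ascent.

    games is a list of (i, j, s): team i played team j and scored s
    (1 for a win by i, 0 for a loss). Every team gets the same
    ASSUMED Normal(0, prior_sd**2) prior.
    """
    ratings = [0.0] * n_teams
    for _ in range(steps):
        # Gradient of the log prior pulls each rating toward zero.
        grad = [-r / prior_sd ** 2 for r in ratings]
        for i, j, s in games:
            # Assumed logistic probability that team i beats team j.
            p = 1.0 / (1.0 + math.exp(-(ratings[i] - ratings[j])))
            grad[i] += s - p
            grad[j] -= s - p
        ratings = [r + lr * g for r, g in zip(ratings, grad)]
    return ratings

# Hypothetical three-team season in which team 0 is undefeated
# and team 2 is winless.
games = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (0, 1, 1)]
ratings = posterior_mode(3, games)
```

Team 0 and team 2 both get finite ratings here, because the prior term penalizes ratings far from zero and so stops the estimates from running off to infinity.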

Ties are at present scored as 1/2 a win and 1/2 a loss, which is admittedly a bit ad hoc, but seems to work just fine. There are generalizations of the basic model which incorporate a parameter for the probability of a tie, but in other realms with which I am familiar (e.g., chess) this doesn't seem to be very helpful.
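
One way to read the half-win, half-loss scoring is as a fractional score in the log-likelihood. The sketch below assumes a logistic win probability, which is my choice for illustration, and `game_log_likelihood` is a hypothetical helper name.

```python
import math

def game_log_likelihood(r_a, r_b, score):
    """Contribution of one game to the log-likelihood.

    score is team A's result: 1 for a win, 0 for a loss, and 0.5 for
    a tie (counted as half a win plus half a loss). The logistic win
    probability is an ASSUMED model form.
    """
    log_p_win = -math.log1p(math.exp(-(r_a - r_b)))   # log P(A beats B)
    log_p_loss = -math.log1p(math.exp(r_a - r_b))     # log P(B beats A)
    return score * log_p_win + (1.0 - score) * log_p_loss

# A tie is most plausible when the two ratings are equal, and the
# contribution is symmetric in which team is rated higher.
tie_equal = game_log_likelihood(0.0, 0.0, 0.5)
tie_a_higher = game_log_likelihood(1.0, 0.0, 0.5)
tie_b_higher = game_log_likelihood(0.0, 1.0, 0.5)
```

Under this scoring a tie pulls the two teams' ratings toward each other, which is roughly the behavior one would want without adding a separate tie parameter.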



Albyn Jones
jones@reed.edu
Wed Jun 19 15:59:37 PDT 1996