How the Daniel Ratings are Calculated

To show how these ratings are calculated, consider the following example.

The following game results occurred in the 2001 football season:

Fresno State defeated Wisconsin.
Wisconsin defeated Virginia.
Virginia defeated Clemson.
Clemson defeated Georgia Tech.

Let's say we want to calculate Wisconsin's total rating. To do that, we want to compare Wisconsin to every other team in Division I-A. To show how this comparison works, consider some examples.

Wisconsin defeated Virginia. Obviously, this is good for Wisconsin's rating. Let's stipulate that, by virtue of its victory over Virginia, Wisconsin gains one rating point.

Wisconsin did not play Clemson. However, Wisconsin did defeat Virginia, who in turn defeated Clemson. Thus, Wisconsin does have an indirect "victory" over Clemson. This, too, is good for Wisconsin's rating, but not as good as if Wisconsin had defeated Clemson directly. Therefore, we only give Wisconsin half a rating point by virtue of its indirect "victory" over Clemson.

Wisconsin did not play Georgia Tech, either. However, it did beat Virginia, who beat Clemson, who in turn beat Georgia Tech. So, we can say that Wisconsin has an indirect "victory" over Georgia Tech, although this "victory" is even more indirect than the victory over Clemson. Thus, it gets only half the rating points it got for its "victory" over Clemson, or a fourth of a rating point.

As you can see, for each extra game needed to get an indirect "victory," the rating points a team gets from that victory decrease by half.

A team not only gains rating points for defeating other teams directly or indirectly, but it also loses rating points for losing to other teams directly and indirectly. For example, Wisconsin was defeated by Fresno State. By virtue of this loss, Wisconsin loses a rating point.

To calculate a team's total rating, sum all the rating points it gets for its "victories" over all the other teams (using the shortest possible path), and subtract the points it loses for its "losses" to all the other teams.
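As a rough illustration of that sum, the short Python sketch below adds up Wisconsin's points using only the four games listed above. The names wins, losses, and points are made up for this example, and a real rating would cover every other team in Division I-A, not just these four.

```python
# Number of games in the shortest chain linking Wisconsin to each team,
# counted by hand from the four example games above.
wins = {"Virginia": 1, "Clemson": 2, "Georgia Tech": 3}
losses = {"Fresno State": 1}

def points(games):
    """One point for a direct result, halved for each extra game in the chain."""
    return 0.5 ** (games - 1)

gained = sum(points(n) for n in wins.values())   # 1 + 1/2 + 1/4
lost = sum(points(n) for n in losses.values())   # 1
wisconsin_rating = gained - lost
print(wisconsin_rating)
```

With only these games counted, Wisconsin gains 1.75 points and loses 1, for a total of 0.75.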

Note that the teams being rated must all be connected by games before these ratings have any meaning.

A More Mathematical Explanation

The following is a mathematical explanation. Please excuse any mathematical incorrectness in the following description; my purpose here is to explain things in a more concise and detailed way, not to be mathematically rigorous.

Let capital letters like A, B, and C denote college football teams.

If A has defeated B at any point in the season, we say that A has outperformed B, and we denote it like this: A>B. This is an inequality of teams. We can cascade the inequalities like this: A>B>C>D, which means A>B and B>C and C>D. Such a cascade of inequalities, starting with A and ending with D, is called a transitive path from A to D.

Now, we postulate a transitive property for this inequality of teams: if A>B, and B>C, then A>C. Thus, even if A and C do not play each other, we can say that A has transitively outperformed C. Because of this property, A>B if and only if there is a transitive path from A to B.

However, when rating teams, we cannot simply say that A has transitively outperformed B; we have to say how much A has outperformed B. In other words, we have to quantify the inequality.

Suppose A>B. Then there must be a transitive path from A to B. We define a function, denoted L(A,B), to be the length of the shortest transitive path from A to B. (The length of a transitive path is the number of inequalities in it.) If A has not transitively outperformed B, then L(A,B) is undefined.
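To make the definition of L(A,B) concrete, here is a minimal Python sketch that finds it with a breadth-first search. This is only one way to compute the function and is not necessarily how any particular script does it; the names shortest_path_length and defeated are hypothetical, and None stands in for "undefined".

```python
from collections import deque

def shortest_path_length(defeated, a, b):
    """Return L(a, b), or None when a has not transitively outperformed b."""
    # Breadth-first search reaches each team by the fewest games first,
    # so the first time we see b we have the shortest transitive path.
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        team, length = queue.popleft()
        for loser in defeated.get(team, ()):
            if loser == b:
                return length + 1
            if loser not in seen:
                seen.add(loser)
                queue.append((loser, length + 1))
    return None

# defeated[X] lists the teams X beat (the four example games).
defeated = {
    "Fresno State": ["Wisconsin"],
    "Wisconsin": ["Virginia"],
    "Virginia": ["Clemson"],
    "Clemson": ["Georgia Tech"],
}

print(shortest_path_length(defeated, "Wisconsin", "Georgia Tech"))
print(shortest_path_length(defeated, "Georgia Tech", "Wisconsin"))
```

The first call follows Wisconsin>Virginia>Clemson>Georgia Tech, a path with three inequalities; the second finds no path at all.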

Now, define a Transitive Power Function:

                /  2^(1-L(A,B)),   if A>B
      T(A,B) =  |
                \  0,              if not A>B

This quantifies how much A outperformed B. Note that T(A,B) decreases exponentially with L(A,B).
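The function is short enough to write out directly. In this Python sketch, transitive_power is a hypothetical name, and it takes the path length L(A,B) as its argument, with None meaning that A>B does not hold.

```python
def transitive_power(length):
    """T(A,B), given length = L(A,B), or None when A>B does not hold."""
    if length is None:
        return 0.0
    return 2.0 ** (1 - length)

# Each extra game in the path halves the value.
for length in (1, 2, 3, 4, None):
    print(length, transitive_power(length))
```

The printed values are 1, 1/2, 1/4, and 1/8 for path lengths 1 through 4, and 0 when there is no path.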

Now that the inequalities have been quantified, the following formula determines a team's rating (denoted R(A)):

      R(A)  =  Sum  [ T(A,B) - T(B,A) ],
               for B in Division I-A

Although the summation is taken over teams in Division I-A (the reason for this is given here), and the rating function is defined only for Division I-A teams, the shortest transitive path may go through Division I-AA.
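As a sketch of the rating formula, the Python below applies it to the five teams from the opening example, treating them as the whole set being rated. The table path_length is built by hand for this one chain of games, and the names chain, path_length, T, and R are chosen for illustration only.

```python
# The example games form one chain, listed from winner to loser.
chain = ["Fresno State", "Wisconsin", "Virginia", "Clemson", "Georgia Tech"]

# Hand-built table of L(A,B): in a single chain, the shortest transitive path
# from an earlier team to a later one is the number of games between them.
path_length = {
    (a, b): j - i
    for i, a in enumerate(chain)
    for j, b in enumerate(chain)
    if i < j
}

def T(a, b):
    """Transitive power function; 0 when there is no path from a to b."""
    length = path_length.get((a, b))
    return 0.0 if length is None else 2.0 ** (1 - length)

def R(a, teams):
    """Rating of a: sum of T(a,b) - T(b,a) over every team b."""
    return sum(T(a, b) - T(b, a) for b in teams)

ratings = {team: R(team, chain) for team in chain}
for team, rating in ratings.items():
    print(team, rating)
```

Because every T(A,B) that is added to one team's rating is subtracted from another's, the five ratings in this small example add up to zero.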

Discussion of the Python Script that Calculates the Ratings

Currently, I calculate the ratings using a Python script.

It is rather trivial to read in input and tabulate final scores, so I don't describe those steps in detail here. See the script if you're curious.

The most difficult part of the calculation is determining the shortest transitive path between teams. The script does that with a recursive depth-first search. It creates a 2-D dictionary, stpl[A][B], that stores the current shortest transitive path length from A to B. For example, stpl["Penn State"]["Virginia Tech"] stores the shortest path length from Penn State to Virginia Tech.

Each entry in this 2-D dictionary is initialized with a large positive number, 9999, indicating that no transitive path has been found yet. The script then chooses a school, say Penn State. For each team that Penn State defeated, it stores a value of 1 in the dictionary (e.g., stpl["Penn State"]["Purdue"]=1), and then checks all of that team's victories. For each of those victories, it stores a value of 2 in the dictionary (e.g., stpl["Penn State"]["Michigan"]=2), then checks all of the third team's victories, and so on.

Now, if, somewhere in the recursive search, the script encounters a team to which a transitive path has already been found, it checks whether the current path to that team is shorter than the one found previously. If it is, the script overwrites the path length stored in the table with the current one, and recursively checks that team's victories. If the current path is not shorter, that is a stopping condition; there is no need to further consider that team or any of the teams it has defeated, since an equal or shorter path to it has already been found.
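The search described above can be sketched in a few lines of Python. This is a reconstruction from the description, not the actual script: the function names build_stpl and visit, the constant UNKNOWN, and the layout of the defeated dictionary are all assumptions, and the list of teams is assumed to include every team that appears in any game.

```python
UNKNOWN = 9999  # sentinel from the text: no transitive path found yet

def build_stpl(defeated, teams):
    """Fill stpl[A][B] with the shortest transitive path length from A to B."""
    stpl = {a: {b: UNKNOWN for b in teams} for a in teams}

    def visit(start, team, length):
        # length is the path length from start to each team that `team` beat.
        for loser in defeated.get(team, ()):
            if length >= stpl[start][loser]:
                continue  # stopping condition: equal or shorter path known
            stpl[start][loser] = length       # record the shorter path
            visit(start, loser, length + 1)   # then check that team's victories

    for start in teams:
        visit(start, start, 1)
    return stpl

# The four example games: defeated[X] lists the teams X beat.
defeated = {
    "Fresno State": ["Wisconsin"],
    "Wisconsin": ["Virginia"],
    "Virginia": ["Clemson"],
    "Clemson": ["Georgia Tech"],
}
teams = ["Fresno State", "Wisconsin", "Virginia", "Clemson", "Georgia Tech"]

stpl = build_stpl(defeated, teams)
print(stpl["Wisconsin"]["Georgia Tech"])   # 3
print(stpl["Georgia Tech"]["Wisconsin"])   # 9999, no path found
```

The search always terminates, even when results form a cycle, because it only continues past a team after strictly shortening the stored path length to that team.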

Once the 2-D dictionary is complete for every team, the transitive power function is applied and total ratings are calculated in a straightforward manner.

The script to do this is surprisingly small, only about 300 lines, and 117 of those just list the teams in Division I-A.






Copyright (C) 2001-2002, Carl Banks. All rights reserved.