Flaws in the Daniel Ratings

The Daniel Ratings have a couple of flaws that make the results unacceptable for official use (other than choosing BCS teams). There is a serious flaw and a not-so-serious flaw: the not-so-serious flaw is a dependence on sparsity, and the serious flaw is sensitivity to schedule structure.

Dependence on Sparsity

Imagine applying this rating system to Major League Baseball. Major League Baseball teams play 162 games and play every team in their league at least six times. (We'll disregard interleague games for the moment.) In most cases, every team would have a rating of zero, because typically every team would have at least one victory over, and one loss to, every other team. Or, if one team sweeps a season series, it would be the only team with a positive rating.
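A minimal sketch makes the baseball argument concrete. The scoring rule here is an assumption for illustration, not necessarily the exact Daniel Ratings formula: a team earns 100 points for an opponent it reaches by a direct win, half that for a shortest chain of two wins, and so on; for every pair of teams the two directions are compared and the difference is added to the first team's rating. The function names `shortest_win_paths`, `credit`, and `ratings` are hypothetical.

```python
from collections import deque
from itertools import permutations


def shortest_win_paths(wins, start):
    # BFS over the win graph: shortest chain of victories from start to each team.
    dist, queue = {start: 0}, deque([start])
    while queue:
        team = queue.popleft()
        for loser in wins.get(team, ()):
            if loser not in dist:
                dist[loser] = dist[team] + 1
                queue.append(loser)
    return dist


def credit(length):
    # ASSUMED rule: 100 for a direct win, halved per extra step, 0 for no path.
    return 0 if length is None else 100 / 2 ** (length - 1)


def ratings(teams, games):
    # games: list of (winner, loser) pairs; repeated games are allowed.
    wins = {}
    for winner, loser in games:
        wins.setdefault(winner, set()).add(loser)
    dist = {t: shortest_win_paths(wins, t) for t in teams}
    # Compare a's shortest path to b with b's shortest path to a, for every b.
    return {a: sum(credit(dist[a].get(b)) - credit(dist[b].get(a))
                   for b in teams if b != a)
            for a in teams}


teams = ["A", "B", "C", "D"]
# Dense schedule: every team beats every other team at least once.
split = list(permutations(teams, 2))
print(ratings(teams, split))  # {'A': 0.0, 'B': 0.0, 'C': 0.0, 'D': 0.0}

# Same schedule, except A sweeps its series with B (B never beats A).
sweep = [g for g in split if g != ("B", "A")]
print(ratings(teams, sweep))  # {'A': 50.0, 'B': -50.0, 'C': 0.0, 'D': 0.0}
```

With every pair of teams trading wins, each comparison cancels; a single sweep leaves one team positive and its victim negative.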

It is obvious that the Daniel Ratings become meaningless if there are too many games. In other words, this system depends on sparse data.

Of course, in college football, teams only play a dozen or so games, so the requisite sparsity of the data is there. Nevertheless, I wonder if there could be some residual effect. For example, most teams play a conference schedule, where most teams play each other. Does this distort the results any?

Having said that, I feel that college football would have to add a lot of games before the results become meaningless, so I classify this as a not-so-serious flaw.

Sensitivity to Schedule Structure

Differences in the schedule structure between two teams can lead to an advantage for one of the teams.
Number of Games
An obvious flaw in the system is the fact that teams play different numbers of games. Suppose two teams finish the season undefeated, but one played only 11 games, whereas the other played 13. The second team got 1300 points for its undefeated schedule, while the first got only 1100.

In many cases such as this, the solution is to just divide by the number of games played. In this case, however, that won't work. The Daniel Ratings are a point-to-point comparison between teams; every team is compared to every other team, and these team comparisons are summed to give the total rating. Thus, there is no "per game" rating, and so dividing by the number of games is not appropriate.

Conference vs. Nonconference
A less obvious flaw can be seen by considering Florida State and Notre Dame. Suppose they finish the season undefeated, each playing 11 games. Even though they play the same number of games, their schedules appear quite different to the computer. The reason for this is that Florida State plays in a conference, the ACC, while Notre Dame is independent.

In the ACC, every team plays every team. So if, say, Georgia Tech beats everyone else in the conference, Florida State would not get any half-credit rating points from Georgia Tech's victories, because the Daniel Ratings consider only the shortest transitive path. The shortest path from Florida State to every ACC team has length 1: the direct victory. So Florida State could never benefit from other ACC teams beating each other.
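Using an assumed scoring rule (100 points for a direct win, halved for each extra step, with each pair compared in both directions; not necessarily the exact Daniel Ratings formula) and hypothetical helper names, a nine-team round robin shows the effect. Team names other than FSU and GT are placeholders. Florida State beats everyone, and reversing every other result in the conference leaves its rating unchanged.

```python
from collections import deque
from itertools import combinations


def shortest_win_paths(wins, start):
    # BFS over the win graph: shortest chain of victories from start to each team.
    dist, queue = {start: 0}, deque([start])
    while queue:
        team = queue.popleft()
        for loser in wins.get(team, ()):
            if loser not in dist:
                dist[loser] = dist[team] + 1
                queue.append(loser)
    return dist


def credit(length):
    # ASSUMED rule: 100 for a direct win, halved per extra step, 0 for no path.
    return 0 if length is None else 100 / 2 ** (length - 1)


def ratings(teams, games):
    # games: list of (winner, loser) pairs.
    wins = {}
    for winner, loser in games:
        wins.setdefault(winner, set()).add(loser)
    dist = {t: shortest_win_paths(wins, t) for t in teams}
    return {a: sum(credit(dist[a].get(b)) - credit(dist[b].get(a))
                   for b in teams if b != a)
            for a in teams}


others = ["GT", "T2", "T3", "T4", "T5", "T6", "T7", "T8"]
acc = ["FSU"] + others
fsu_wins = [("FSU", t) for t in others]

# Scenario 1: the earlier team in the list always wins, so GT beats all but FSU.
order_a = list(combinations(others, 2))
# Scenario 2: every one of those results is reversed.
order_b = [(b, a) for a, b in combinations(others, 2)]

print(ratings(acc, fsu_wins + order_a)["FSU"])  # 800.0
print(ratings(acc, fsu_wins + order_b)["FSU"])  # 800.0
```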

On the other hand, suppose Georgia Tech is also on Notre Dame's schedule, and Notre Dame beats them. Notre Dame will get all the half-credit points from Georgia Tech's victories in the ACC. In fact, because Notre Dame is an independent, many of the teams on its schedule won't play each other. Thus, almost any win by one of Notre Dame's opponents would earn Notre Dame half credit.

By virtue of its schedule, Notre Dame has much more opportunity than Florida State to gain half-credit points from its opponents' victories.
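Under an assumed scoring rule (100 points for a direct win, halved for each extra step, each pair compared in both directions; not necessarily the exact Daniel Ratings formula) and hypothetical helper names, a small toy schedule with placeholder teams A1 to A3 and N1 to N3 gives both Florida State and Notre Dame a 4-0 record. Florida State has already beaten Georgia Tech's victims directly, so it collects only direct-win points, while Notre Dame also collects half credit for each team Georgia Tech beat.

```python
from collections import deque


def shortest_win_paths(wins, start):
    # BFS over the win graph: shortest chain of victories from start to each team.
    dist, queue = {start: 0}, deque([start])
    while queue:
        team = queue.popleft()
        for loser in wins.get(team, ()):
            if loser not in dist:
                dist[loser] = dist[team] + 1
                queue.append(loser)
    return dist


def credit(length):
    # ASSUMED rule: 100 for a direct win, halved per extra step, 0 for no path.
    return 0 if length is None else 100 / 2 ** (length - 1)


def ratings(teams, games):
    # games: list of (winner, loser) pairs.
    wins = {}
    for winner, loser in games:
        wins.setdefault(winner, set()).add(loser)
    dist = {t: shortest_win_paths(wins, t) for t in teams}
    return {a: sum(credit(dist[a].get(b)) - credit(dist[b].get(a))
                   for b in teams if b != a)
            for a in teams}


teams = ["FSU", "GT", "A1", "A2", "A3", "ND", "N1", "N2", "N3"]
games = [
    # Round-robin conference: FSU beats everyone, GT beats the rest.
    ("FSU", "GT"), ("FSU", "A1"), ("FSU", "A2"), ("FSU", "A3"),
    ("GT", "A1"), ("GT", "A2"), ("GT", "A3"),
    ("A1", "A2"), ("A1", "A3"), ("A2", "A3"),
    # Notre Dame beats GT plus three teams that never play each other.
    ("ND", "GT"), ("ND", "N1"), ("ND", "N2"), ("ND", "N3"),
]
r = ratings(teams, games)
print(r["FSU"], r["ND"])  # 400.0 550.0
```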

In fairness, Notre Dame would also have more opportunity to be ranked lower if it loses.

Cliques
A third flaw, related to the previous one, is that college football teams tend to form cliques: groups of teams that play relatively few games against teams outside the group. (A conference is one kind of clique, but there are larger cliques and smaller cliques, too.)

When teams form cliques, the flow of information is restricted. There are not many games between teams of different cliques, so it is hard to compare teams from different cliques. This is a problem for any rating system. However, it is an especially hard problem for the Daniel Ratings, because of their dependence on short paths. If one clique is better than another, teams from the bad clique still have short paths to the other teams in the bad clique, whereas teams from the good clique have long paths to those teams. If the bad clique has more teams, its teams might end up outranking teams from the good clique.
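A toy example shows how this can happen, under an assumed scoring rule (100 points for a direct win, halved for each extra step, each pair compared in both directions; not necessarily the exact Daniel Ratings formula) with hypothetical helpers and placeholder team names. A three-team good clique and a six-team bad clique each play a round robin, and the only game between them is won by the weakest good team over the strongest bad team.

```python
from collections import deque
from itertools import combinations


def shortest_win_paths(wins, start):
    # BFS over the win graph: shortest chain of victories from start to each team.
    dist, queue = {start: 0}, deque([start])
    while queue:
        team = queue.popleft()
        for loser in wins.get(team, ()):
            if loser not in dist:
                dist[loser] = dist[team] + 1
                queue.append(loser)
    return dist


def credit(length):
    # ASSUMED rule: 100 for a direct win, halved per extra step, 0 for no path.
    return 0 if length is None else 100 / 2 ** (length - 1)


def ratings(teams, games):
    # games: list of (winner, loser) pairs.
    wins = {}
    for winner, loser in games:
        wins.setdefault(winner, set()).add(loser)
    dist = {t: shortest_win_paths(wins, t) for t in teams}
    return {a: sum(credit(dist[a].get(b)) - credit(dist[b].get(a))
                   for b in teams if b != a)
            for a in teams}


good = ["G1", "G2", "G3"]
bad = ["B1", "B2", "B3", "B4", "B5", "B6"]
# Each clique plays a round robin in which the earlier-listed team always wins.
games = list(combinations(good, 2)) + list(combinations(bad, 2))
# One game between the cliques: the weakest good team beats the best bad team.
games.append(("G3", "B1"))

r = ratings(good + bad, games)
for team in ["G1", "B1", "B2", "G2", "G3"]:
    print(team, r[team])  # G1 375.0, B1 300.0, B2 200.0, G2 175.0, G3 150.0
```

The two best teams in the bad clique outrank G2 and G3, even though G3 beat B1 head to head; the larger clique simply supplies more short paths.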

Division I-A is a good example of a large clique. NCAA rules specify that, in order to play in a bowl, a team must have six wins over Division I-A opponents (excepting one I-AA team every four years or something like that). Because of this, I-A teams do not often play I-AA teams, and as a result, Division I-AA teams have a certain shortness-of-path advantage over I-A teams. This caused some I-AA teams to appear very high in the ratings before I decided to rank only I-A teams.

Possible Solutions

Here are some ideas I have for solutions.

It seems that the fatal flaw in the Daniel Ratings system is that it throws out a lot of information by accounting only for the Shortest Transitive Path. So the obvious remedy is to consider all paths.

The result of this line of thinking is perhaps a little obvious. If we consider all paths, the system degenerates into an RPI-like system. That is, a team's rating would be some number times its winning percentage, plus some number times its opponents' winning percentage, and so on. It would differ slightly from the RPI because it goes deeper than opponents' opponents, and the weights would be different.
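A rough sketch of the all-paths idea, under assumptions not in the text: every chain of wins of length k (strictly, walks, since cycles are not excluded) adds a weight that halves with each step, chains of losses count against a team the same way, and the sum is cut off at a fixed depth. The first term is wins minus losses, the second depends on the records of the teams you beat and lost to, and so on, which is the RPI-like shape described above. The function name `all_path_scores` and its `depth` and `decay` parameters are hypothetical.

```python
def all_path_scores(teams, games, depth=3, decay=0.5):
    """Weighted count of every win chain minus every loss chain, up to `depth`."""
    n = len(teams)
    idx = {t: i for i, t in enumerate(teams)}
    beat = [[0] * n for _ in range(n)]  # beat[i][j] = times team i beat team j
    for winner, loser in games:
        beat[idx[winner]][idx[loser]] += 1

    def chains(matrix):
        # After k rounds, level[i] = number of length-k chains starting at team i.
        level, total, weight = [1] * n, [0.0] * n, 1.0
        for _ in range(depth):
            level = [sum(matrix[i][j] * level[j] for j in range(n))
                     for i in range(n)]
            total = [s + weight * c for s, c in zip(total, level)]
            weight *= decay
        return total

    lost = [[beat[j][i] for j in range(n)] for i in range(n)]  # transpose
    wins, losses = chains(beat), chains(lost)
    return {t: wins[idx[t]] - losses[idx[t]] for t in teams}


print(all_path_scores(["A", "B", "C"], [("A", "B"), ("B", "C")]))
# {'A': 1.5, 'B': 0.0, 'C': -1.5}
```

Counting chains instead of keeping only the shortest one means a conference team's results pass credit along just as an independent's do; the trade-off is that the rating becomes a weighted sum of records at increasing depth.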

A second idea is to consider only shortest paths, as before, but to compensate for the hiding effect caused by selecting the shortest path. For example, in the Florida State case with the ACC, we could detect the fact that Florida State can't benefit from other ACC games, and compensate for it.

The basic idea is to determine the total number of possible shortest transitive paths a team could have (possible meaning that we assume the right teams win), and then give the team rating points based on the percentage of those paths that actually are transitive paths.
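The text leaves open exactly which paths count as possible. One reading, assumed here rather than taken from the author: a possible path from a team is a shortest path in the schedule graph (who played whom), and it is realized if every game along it was won by the earlier team. Each path is weighted with an assumed 100/50/25 rule, and the rating is the realized share. The function `coverage` is hypothetical, and the team names are placeholders. In the toy schedules below, a plain shortest-path rating with those weights (comparing each pair in both directions) would give Florida State 400 points and Notre Dame 550, even though both are 4-0.

```python
from collections import deque


def credit(length):
    # ASSUMED weight for a path of this length: 100, 50, 25, ...
    return 100 / 2 ** (length - 1)


def coverage(games, team):
    """Hypothetical: weighted share of possible shortest paths that are win chains."""
    schedule, beat = {}, set()
    for winner, loser in games:
        schedule.setdefault(winner, set()).add(loser)
        schedule.setdefault(loser, set()).add(winner)
        beat.add((winner, loser))

    # BFS over the schedule graph, counting shortest paths to each team.
    dist, possible = {team: 0}, {team: 1}
    queue = deque([team])
    while queue:
        u = queue.popleft()
        for v in schedule.get(u, ()):
            if v not in dist:
                dist[v], possible[v] = dist[u] + 1, possible[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                possible[v] += possible[u]

    # Of those shortest paths, count the ones where every game was won in order.
    realized = {team: 1}
    for v in sorted(dist, key=dist.get):
        if v != team:
            realized[v] = sum(realized[u] for u in schedule[v]
                              if dist[u] == dist[v] - 1 and (u, v) in beat)

    others = [v for v in dist if v != team]
    earned = sum(credit(dist[v]) * realized[v] for v in others)
    available = sum(credit(dist[v]) * possible[v] for v in others)
    return earned / available if available else 0.0


games = [
    # Toy conference: FSU beats all four opponents in a round robin.
    ("FSU", "C1"), ("FSU", "C2"), ("FSU", "C3"), ("FSU", "C4"),
    ("C1", "C2"), ("C1", "C3"), ("C1", "C4"),
    ("C2", "C3"), ("C2", "C4"), ("C3", "C4"),
    # Separate toy schedule: ND beats GT and three others; GT beats X1 to X3.
    ("ND", "GT"), ("ND", "N1"), ("ND", "N2"), ("ND", "N3"),
    ("GT", "X1"), ("GT", "X2"), ("GT", "X3"),
]
print(coverage(games, "FSU"), coverage(games, "ND"))  # 1.0 1.0
print(coverage(games, "C2"))  # 0.5
```

Both undefeated teams now score a full 1.0 regardless of how their opponents' schedules are arranged, while a 2-2 conference team gets 0.5. The two toy schedules are disconnected from each other; in a connected schedule, possible paths through games a team's opponents lost would also lower its share.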

One Other Thought

Almost everything written in this section is a result of my intuition about the ratings. With more careful study, I might discover cancellation effects that make the flaws in the system less severe than I thought. Or I might uncover unforeseen problems with my proposed solutions.

Copyright (C) 2001-2004, Carl Banks. All rights reserved.