Flaws in the Daniel Ratings
The Daniel Ratings have a couple of flaws that make the results
unacceptable for official use (other than choosing BCS teams). There
is a serious flaw and a not-so-serious flaw. The not-so-serious flaw
is a dependence on sparsity; the serious flaw is sensitivity to
schedule structure.
Dependence on Sparsity
Imagine this rating system applied to Major League Baseball.
Major League Baseball teams play 162 games, and play every team in
their league at least six times. (We'll disregard the interleague
games for the moment.) In most cases, every team would have a rating
of zero, because typically every team would have at least one victory
over and one loss to every other team. Or, if one team sweeps a
season series, it would be the only team with a positive rating.
It is obvious that the Daniel Ratings become meaningless if there are
too many games. In other words, this system depends on sparse
data.
Of course, in college football, teams only play a dozen or so games,
so the requisite sparsity of the data is there. Nevertheless, I
wonder if there could be some residual effect. For example, most
teams play a conference schedule, where most teams play each other.
Does this distort the results any?
Having said that, I feel that college football would have to add a lot
of games before the results start to become meaningless, so I classify
this as a not-so-serious flaw.
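The baseball thought experiment can be sketched in a few lines of Python. This is only a toy model, not the actual Daniel Ratings code. It assumes that a direct win is worth 100 points, that the credit halves with each extra link in the chain, and that each team-to-team comparison is the credit for the shortest win chain minus the credit for the shortest loss chain. The function names and the four-team league are made up for illustration.

```python
from collections import deque

def shortest_win_path(wins, src, dst):
    """Length of the shortest chain 'src beat ... beat dst', or None if no chain exists."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        team, dist = queue.popleft()
        for loser in wins.get(team, ()):
            if loser == dst:
                return dist + 1
            if loser not in seen:
                seen.add(loser)
                queue.append((loser, dist + 1))
    return None

def credit(length):
    # Assumed scale: 100 for a direct win, halved for every extra link.
    return 0.0 if length is None else 100.0 / 2 ** (length - 1)

def rating(wins, team):
    # Assumed comparison: win-chain credit minus loss-chain credit, summed over all other teams.
    teams = set(wins) | {t for losers in wins.values() for t in losers}
    return sum(
        credit(shortest_win_path(wins, team, other))
        - credit(shortest_win_path(wins, other, team))
        for other in teams
        if other != team
    )

league = ["A", "B", "C", "D"]

# Dense schedule: every team has at least one win over every other team.
dense = {t: {o for o in league if o != t} for t in league}
dense_ratings = {t: rating(dense, t) for t in league}

# Same schedule, except that A sweeps its season series with B.
swept = {t: set(losers) for t, losers in dense.items()}
swept["B"].discard("A")
swept_ratings = {t: rating(swept, t) for t in league}

print(dense_ratings)
print(swept_ratings)
```

Under these assumptions every rating in the dense league cancels to zero, and after the sweep only team A ends up with a positive rating.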
Sensitivity to Schedule Structure
Differences in the schedule structure between two teams can lead to an
advantage for one of the teams.
- Number of Games
- An obvious flaw in the system is the fact that teams play
different numbers of games. Suppose two teams finish the season
undefeated, but one played only 11 games, whereas the other played
13. The second team got 1300 points for its undefeated schedule, while
the first got only 1100.
In many cases such as this, the solution is to just divide by the
number of games played. In this case, however, that won't work. The
Daniel Ratings are a point-to-point comparison between teams; every
team is compared to every other team, and these team comparisons are
summed to give the total rating. Thus, there is no "per game" rating,
and so dividing by the number of games is not appropriate.
- Conference vs. Nonconference
- A less obvious flaw can be seen by considering Florida State and
Notre Dame. Suppose they finish the season undefeated, each playing 11
games. Even though they play the same number of games, their schedules
appear quite different to the computer. The reason for this is that
Florida State plays in a conference, the ACC, while Notre Dame is
independent.
In the ACC, every team plays every other team. So, if, say, Georgia
Tech beats everyone else in the conference, Florida State would not
get any half-credit rating points by virtue of Georgia Tech's
victories, because the Daniel Ratings only consider the shortest
transitive path. The shortest path from Florida State to every ACC
team has length 1, the direct victory. So Florida State could never
benefit from other ACC teams beating each other.
On the other hand, suppose Georgia Tech is also on Notre Dame's
schedule, and Notre Dame beats them. Notre Dame will get all the
half-credit points from Georgia Tech's victories in the ACC. In fact,
because Notre Dame is an independent, many of the teams on their
schedule won't play each other. Thus, almost any win by Notre Dame's
opponents would result in half a rating point for Notre Dame.
By virtue of their schedule, Notre Dame has much more opportunity to
gain half-points from their opponents' victories than Florida State
does. In fairness, Notre Dame would also have more opportunity to be
ranked lower if they lose.
- Cliques
- A third flaw, related to the previous one, is that college
football teams tend to form cliques: groups of teams that schedule
relatively few games against teams outside the group. (A conference
is one kind of clique, but there are larger cliques and smaller
cliques, too.)
When teams form a clique, it restricts the flow of information. There
are not so many games between teams of different cliques, so it is
hard to compare teams from different cliques. This is a problem for
any ratings system. However, it is an especially hard problem for the
Daniel Ratings, because of their dependence on shortness of path. If
one clique is better than another clique, then the teams from the bad
clique still have a short path to the other teams from the bad clique,
whereas teams from the good clique have a long path to those teams.
If the bad clique has more teams, they might end up outranking teams
from the good clique.
Division I-A is a good example of a large clique. NCAA rules specify
that, in order for a team to play in a bowl, that team must have six
wins over Division I-A opponents (excepting one I-AA team every four
years or something like that). Because of this, I-A teams do not
often play I-AA teams, and as such, Division I-AA teams have a certain
shortness of path advantage over I-A teams. This caused some I-AA
teams to appear very high in the ratings before I decided to rank only
I-A.
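The conference-versus-independent effect described above can be illustrated with a small toy example. It is only a sketch under assumptions: a direct win is taken to be worth 100 points, the credit is assumed to halve with each extra link in the shortest win chain, and only the winning side of each comparison is counted, since both teams are undefeated here. The results, the three-game schedules, and the opponents named "Other A" and "Other B" are invented for illustration.

```python
from collections import deque

def win_distances(wins, team):
    """Breadth-first search over 'beat' edges: shortest win-chain length to each reachable team."""
    dist = {team: 0}
    queue = deque([team])
    while queue:
        current = queue.popleft()
        for loser in wins.get(current, ()):
            if loser not in dist:
                dist[loser] = dist[current] + 1
                queue.append(loser)
    del dist[team]  # a team is not compared with itself
    return dist

def win_credit(wins, team):
    # Assumed scale: 100 for a direct win, halved for every extra link.
    return sum(100.0 / 2 ** (d - 1) for d in win_distances(wins, team).values())

# Made-up results: a four-team round-robin "conference" plus an independent.
wins = {
    "Florida State": {"Georgia Tech", "Virginia", "North Carolina"},
    "Georgia Tech": {"Virginia", "North Carolina"},
    "Virginia": {"North Carolina"},
    "Notre Dame": {"Georgia Tech", "Other A", "Other B"},
}

fsu_credit = win_credit(wins, "Florida State")
nd_credit = win_credit(wins, "Notre Dame")
print(fsu_credit, nd_credit)
```

Both teams are 3-0 in this toy schedule, but Florida State already has a direct win over every team Georgia Tech beat, so it collects nothing from Georgia Tech's victories. Notre Dame reaches those same teams only through Georgia Tech and picks up a half-credit for each.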
Possible Solutions
Here are some ideas I have for solutions.
It seems that the fatal flaw in the Daniel Ratings system is that much
information is thrown out when the system decides to only account for
the Shortest Transitive Path. So, the obvious remedy is to consider
all paths.
The result of this line of thinking is perhaps a little obvious. If we
consider all paths, the system degenerates into an RPI-like system.
That is, it would be some number times your winning percentage, plus
some number times your opponents' winning percentage, etc. It would be
slightly different from the RPI because it goes deeper than opponents'
opponents, and the numbers would be different.
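To make the shape of such an RPI-like rating concrete, here is a minimal sketch. The weights are placeholders chosen purely for illustration, the function names are hypothetical, and the code makes no attempt to reproduce either the real RPI formula or the numbers an all-paths version of the Daniel Ratings would actually produce.

```python
def win_pct(games, team):
    """Winning percentage of a team; games is a list of (winner, loser) pairs."""
    played = [g for g in games if team in g]
    if not played:
        return 0.0
    return sum(1 for winner, _ in played if winner == team) / len(played)

def opponents(games, team):
    """One entry per game played, so repeat opponents are counted more than once."""
    return [l if w == team else w for w, l in games if team in (w, l)]

def rpi_like(games, team, weights=(0.5, 0.3, 0.2)):
    # weights[0] applies to the team's own winning percentage, weights[1] to
    # its opponents', weights[2] to its opponents' opponents', and so on.
    # Supplying a longer tuple of weights makes the sum go deeper.
    level = [team]
    total = 0.0
    for weight in weights:
        total += weight * sum(win_pct(games, t) for t in level) / len(level)
        level = [o for t in level for o in opponents(games, t)]
        if not level:
            break
    return total

# Made-up results: A beat B, B beat C, A beat C.
games = [("A", "B"), ("B", "C"), ("A", "C")]
score_a = rpi_like(games, "A")
score_c = rpi_like(games, "C")
print(score_a, score_c)
```

Note that this simple version counts a team's own games when computing its opponents' records, which is one of the details a real formula would have to decide on.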
A second idea is to consider only shortest paths, as before, but to
compensate for the hiding effect caused by selecting the shortest
path. For example, in the Florida State case with the ACC, we could
detect the fact that Florida State can't benefit from other ACC games,
and compensate for it.
The basic idea is to determine the total number of possible
shortest transitive paths a team could have (possible meaning that we
assume that the right teams win), and then to give the team rating
points based on the percentage of those paths that actually are
transitive paths.
One Other Thought
Almost everything written in this section is a result of my intuition
about the ratings. With more careful study, I might discover
cancellation effects that make the flaws in the system not as severe
as I thought. Or I might uncover unforeseen problems with my ideas
for solutions.