Why a Won/Lost Record is the Wrong Way to Judge a College Football Team


Mike Nemeth

(from Lies, Damned Lies and Statistics, coming in August 2018)

The College Football Playoff rankings may seem a trivial diversion as the weather transitions from the stifling heat of summer to the punishing cold of winter, but they change the course of history and affect the lives of millions of Americans. They are serious business. The chosen few (four teams) receive financial, recruiting, and reputational advantages that improve their performance for years after their appearance in the playoff. The record books indelibly reflect the achievement of the chosen. And every child’s memories of the playoff last a lifetime.

A committee of twelve men annually undertakes the solemn task of selecting the four teams that compete for the national championship. There are no published, hard-and-fast rules that these men are compelled to follow. Nor is there any transparency for the fans who live and die with the committee’s votes, because those votes are not made public.

What we know after four years of interpreting committee rankings is that, just like the voting polls it was meant to replace, the committee establishes a pecking order based on won/lost records in its first release of rankings and then follows a mechanical process of demoting losers and backfilling with winners until the final rankings are issued. The committee is loath to demote a team that won, or promote a team that lost, based on how well it played and how good its opponent was. That wouldn’t seem logical to people obsessed with won/lost records.

Occasionally the committee does tweak its rankings so that the order will still look logical after possible future outcomes. In the rankings released on November 21, 2017, the committee moved Miami from No. 3 to No. 2 and demoted Clemson from No. 2 to No. 3 despite the fact that Miami played poorly in an ugly win over Virginia. This move seemed innocuous to the casual observer since the two teams would later meet in the ACC championship game, but it was actually a calculated chess move designed to ensure that the committee could justify keeping Miami in the playoff field even if it lost to Clemson. And that justification was necessary so that the committee could avoid the ignominy of voting a two-loss team into the playoff. Conspiracy theorists might argue that the move was designed to eliminate Alabama in the event of a loss to either Auburn or Georgia. SEC apologists would argue it was designed to keep the playoff field from containing two SEC teams.

Controversies about the committee rankings arise because the committee follows no published rules and is misled by won/lost records. In fact, everyone who has grown up inside the game shares the same misguided devotion to the won/lost record. People who’ve played the game have been conditioned to view their won/lost record as the ultimate evidence of their worth as athletes.

As the CFP committee selects teams for the playoff, the temptation to favor teams with better won/lost records, better head-to-head results, more good wins, fewer bad losses, or conference championships is irresistible. This year Miami was ranked No. 2 simply because it hadn’t been beaten until Pitt burst its balloon. Wisconsin was ranked No. 5 for the same reason. A more critical assessment would show that Miami played mediocre football all season while Wisconsin defeated a collection of inferior teams. Thus, there is a case to be made that the almighty won/lost record is the one stat that must be eliminated from any objective assessment of how good a football team really is.

“Win” and “lose” are like pass or fail grades on a test in school; they are binary summaries of complex events, and much information is lost in the summation. Take, for example, a situation in which four students score 99, 71, 69, and 29 on the same math test. If the test is graded on a pass/fail basis with a score of 70 being the minimum requirement to pass, two students fail the test and two students pass. As a result, the passing scores of 99 and 71 become equivalent and the failing scores of 69 and 29 become equivalent, while the two-point gap between the 71 and the 69 is made to look as wide as the Grand Canyon.

Clearly these four students exhibit widely divergent degrees of mathematical competence (a quantitative measure), but that is not apparent with two “passes” and two “fails” (binary measures). “W” and “L” expose the same amount of information as “pass” and “fail,” and they also conceal the same amount of information. If we want to know how well these four students know math, we wouldn’t limit our investigation to whether they passed or failed the test. We would examine the actual test scores to obtain an accurate assessment. The same is true when comparing wins and losses for football teams.
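The four-student example can be sketched in a few lines of Python. This is only an illustration of how much information the pass/fail summary discards; the scores and the passing mark of 70 come from the example above, while the function and variable names are assumptions made for the sketch.

```python
# Scores and passing mark taken from the four-student example.
PASS_MARK = 70
scores = [99, 71, 69, 29]


def to_pass_fail(score, pass_mark=PASS_MARK):
    """Collapse a numerical score into a binary pass/fail label."""
    return "pass" if score >= pass_mark else "fail"


labels = [to_pass_fail(s) for s in scores]

# Four distinct levels of performance shrink to two labels.
distinct_scores = len(set(scores))
distinct_labels = len(set(labels))

# Point gaps that the labels hide (99 vs. 71) or exaggerate (71 vs. 69).
gap_between_passes = scores[0] - scores[1]
gap_across_the_line = scores[1] - scores[2]

for score, label in zip(scores, labels):
    print(score, label)
```

The 28-point gap between the two passing students disappears entirely, while the 2-point gap across the passing line becomes the only distinction the labels record.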

Now imagine that each of these students took a slightly different test on the same day. That is what college football teams do every Saturday. We wouldn’t consider all passing grades to be equally valuable in ranking the students. We would immediately want to know the relative difficulty of each test and the precise numerical score earned by each student. Then we could compare results and make a judgment as to which students were better educated than others.

Imagine another scenario in which there are two slightly different tests of equal difficulty. Each test is taken by two students and the rule is that the student with the higher grade on each test is credited with a “pass” and the student with the lower grade on each test is punished with a “fail.” This circumstance happens to college football teams every Saturday in the fall. On the first test the students receive grades of 90 and 80, and the 90 grade is given a pass while the 80 grade is given a fail. On the second test, less well-educated students score 75 and 55. The student with the grade of 75 gets a pass and the student with the grade of 55 gets a fail. Reasonable people would not argue that the student with a grade of 75 was better than the student with the 80 simply because that student had won an arbitrary contest with a student who only managed a grade of 55. Yet that’s exactly what football experts argue when they value the won/lost record over a scientific assessment (grading) of teams’ playing performances (numerical test grades).
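The two-test scenario can be sketched the same way. This is a rough illustration, not the grading method described in the book: the grades (90, 80, 75, 55) come from the example above, and the student labels A through D and the helper `head_to_head` are hypothetical names introduced for the sketch.

```python
# Two tests of equal difficulty, two students each (grades from the example).
# The student labels A-D are hypothetical.
tests = {
    "test_1": {"A": 90, "B": 80},
    "test_2": {"C": 75, "D": 55},
}


def head_to_head(results):
    """Give 'pass' to the higher grade on a test and 'fail' to the lower."""
    winner = max(results, key=results.get)
    return {name: ("pass" if name == winner else "fail") for name in results}


records = {}
grades = {}
for results in tests.values():
    records.update(head_to_head(results))
    grades.update(results)

# Ranking by the numerical grade each student actually earned.
by_grade = sorted(grades, key=grades.get, reverse=True)

# Ranking by pass/fail record alone (ties keep their original order).
by_record = sorted(grades, key=lambda name: records[name] == "pass", reverse=True)

print("by grade: ", by_grade)
print("by record:", by_record)
```

Ranked by record, student C (75, pass) lands ahead of student B (80, fail), even though B earned the higher grade on a test of equal difficulty.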

The indisputable fact is that a team can play poorly and still win a football game (pass a test) if it plays marginally better than an opponent that played worse. We’ve all witnessed games in which a team won “ugly,” games in which neither team could “get anything going.” As binary statistics, won/lost records treat all wins equally; thus, the team that wins “ugly” receives no less credit than the team that dazzles us with its scintillating performance. On its won/lost record, Oklahoma received no less credit for squeaking past weak Baylor and Kansas State teams than it did for outgunning ranked TCU and Oklahoma State teams.

Conversely, a team can play well and yet lose if its opponent plays marginally better. In its won/lost record, Ohio State received no more credit for its good play in a loss to an elite Oklahoma team than it did when it was demolished by unranked Iowa. The embarrassing outcome added just one “L” to the loss column for the Buckeyes and the committee was left to make a subjective judgment about the two games.

Unlike school tests, college football has no regulated dividing line between a winning grade and a losing grade. There is no established threshold to cross to earn a win. A team merely has to outplay its opponent. That means that winning teams in the rankings can play worse than losing teams in the rankings but receive more credit simply because they outplayed a specific opponent of unknown difficulty. On cupcake Saturday, November 18, 2017, Michigan played better football (received a better test score) in its loss to No. 5 Wisconsin than Miami did in its win over unranked Virginia, but Miami got promoted and Michigan was dumped from the rankings.

Rather than illuminate and enlighten, won/lost records obfuscate, conceal information, and deceive fans and experts alike. Records obscure qualitative measures in the same way that pass/fail grades concealed the difference between the passing 75 grade and the failing 80 grade in the example above.

Winning doesn’t automatically mean that a team is “good”; it simply means the winning team played relatively better than the losing team, on game day, under a certain set of circumstances. Therefore, winning and losing are relative measures and not definitive or decisive events. If the circumstances were changed or the teams played a second time, the outcome could well be different.

Won/lost records are merely the sum of pass/fail results for some number of tests without any qualifying information about the difficulty of the tests or the grades achieved on the tests. A student isn’t well educated simply because she passes tests; a student passes the tests because she is well educated. The same can be said of college football teams: A team isn’t good because it wins. A team wins because it is good.

In the analytics world, winning and losing are always outputs, never inputs. Wins occur when a team earns a better grade than its opponent in a specific game. But a winning grade must be compared against all other grades, winning and losing, to see where a team ranks relative to its competitors.

If the committee’s job is to select the four best teams for the playoff (the teams that are best at playing the game, not the teams with the shiniest records), then it needs a method to judge playing performance while ignoring won/lost records. In my book Lies, Damned Lies and Statistics, to be released by Morgan James Publishing on August 14, 2018 (in time for the next football season), I define how a numerical grade can be calculated to represent a team’s playing performance in any football game. I call the method the Relative Performance Grading system, or RPG for short. The numerical grades produced by this system, which are analogous to numerical grades on school tests, could replace won/lost records, which are analogous to pass/fail grades on school tests, to produce accurate rankings of college football teams. That would mean that the best, most deserving teams would be selected for the playoff. And it would mean that every child would treasure the right enduring memories.

In the interim, you can find this season’s rankings at http://nemosnumbers.com/football-rankings/. There’s also a weekly commentary/explanation in my blog at the same site. You’ll find that the committee’s comparison of resumes (a euphemism for won/lost records) is far different from numerical grading.