Deceptive Defenses

August 29, 2009

At its core, playing defense in football is simply a matter of preventing the other team from moving the ball, right? I mean, if a team can consistently stop its opponents from advancing down the field, chances are it won’t be giving up many points.

Seems intuitive. Being the ever-inquisitive sports fan that I am, though, Homerism decided to test this assumption out. I crunched some numbers from 2007 and 2008, comparing every D-I team's yards allowed per play (YPP) with both points allowed per play (PPP) and points allowed per game (PPG). Sorry in advance for the coming geekage.

(As a side note, I tend to look at points allowed per play as a better indicator of defensive strength than points allowed per game, as PPP somewhat “washes out” the impact of factors such as offensive and defensive tempo.)

As expected, YPP demonstrated strong relationships with both PPP and PPG for both years. In 2008, YPP and PPP had a correlation of 0.91, while YPP and PPG had a correlation of 0.89. In 2007, both measures of correlation were 0.89.
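For anyone who wants to reproduce this kind of check, a Pearson correlation like these can be computed in a few lines. The data behind the post isn't included here, so the team numbers below are made up purely for illustration:

```python
import numpy as np

# Hypothetical per-team defensive stats (the real study covered all of D-I).
ypp = np.array([4.6, 5.1, 5.8, 4.9, 6.2])      # yards allowed per play
ppg = np.array([18.3, 23.5, 30.1, 21.0, 33.8])  # points allowed per game

# Pearson correlation between the two measures
r = np.corrcoef(ypp, ppg)[0, 1]
print(round(r, 2))
```

With real full-season data for every team, the same call produces the 0.89-to-0.91 figures cited above.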

(Click 2007 and 2008 to access the relevant data from each year.)

A regression analysis of the data yielded similar sets of equations that could be used to estimate PPP and PPG using YPP.

  • 2007 PPP = (0.115)*YPP – 0.239
  • 2007 PPG = (9.182)*YPP – 21.971
  • 2008 PPP = (0.131)*YPP – 0.319
  • 2008 PPG = (9.390)*YPP – 24.458

For example, Wake Forest allowed 4.6 yards per play in 2008, which yields expected values of 18.74 for PPG and 0.28 for PPP.
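Plugging the numbers in is straightforward. Here are the two 2008 equations as simple functions, using the coefficients listed above, checked against the Wake Forest example:

```python
# The 2008 regression lines from the post, as plain functions.
def predicted_ppg_2008(ypp):
    """Expected points allowed per game, given yards allowed per play."""
    return 9.390 * ypp - 24.458

def predicted_ppp_2008(ypp):
    """Expected points allowed per play, given yards allowed per play."""
    return 0.131 * ypp - 0.319

# Wake Forest allowed 4.6 yards per play in 2008.
print(round(predicted_ppg_2008(4.6), 2))  # 18.74
print(round(predicted_ppp_2008(4.6), 2))  # 0.28
```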

All of these equations demonstrated a coefficient of determination of roughly 0.80. As such, they can be used to explain about 80 percent of the variation in PPG or PPP in both years. Not bad.

In reality, the Demon Deacons gave up 18.3 PPG and 0.3 PPP in 2008, which are pretty close to what the models predicted. But what about the teams whose actual performances diverged significantly from what the models projected? We can group these teams into two opposing categories: “overperformers,” who actually allowed fewer points than expected, and “underperformers,” who gave up more than they should have.

Remember that each formula projects what should have happened in the season it was fit to. The projected points measures for each season depend on yards per play for that season only, and thus offer little in the way of predictive value for the following season, since YPP changes from year to year. In other words, the model can explain past performance and identify outliers relatively well, but it says little about what a team will do in the future.

However, it would be reasonable to expect "regression to the mean" in terms of conforming to the model in the following year. As such, if a team showed a substantial gap between its actual performance and its predicted performance, that gap would be expected to shrink the following year.

For instance, consider Cincinnati. The Bearcats’ spread between actual and predicted PPP in 2007 measured 0.083, which is equivalent to 2.30 standard deviations. The following season, that spread would be expected to shrink, which, in fact, it did. The Bearcats’ actual and predicted PPP in 2008 differed by 0.003, or 0.06 standard deviations.
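Expressing a spread in standard deviations just means dividing the residual (actual minus predicted) by the standard deviation of all teams' residuals for that season. The post doesn't give that standard deviation directly, but the Cincinnati figures imply roughly 0.083 / 2.30 ≈ 0.036 for 2007 PPP, which is what the sketch below assumes:

```python
def residual_z(actual, predicted, residual_sd):
    """How many standard deviations a team's actual result sits from the model."""
    return (actual - predicted) / residual_sd

# Cincinnati 2007: a spread of 0.083 PPP against an implied residual SD of ~0.036.
# This lands at about 2.3 standard deviations, matching the post within rounding.
print(round(residual_z(0.083, 0.0, 0.036), 2))
```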

What factors contribute to underperforming and overperforming on defense? The truth is that I have no idea.