# QB Performance Frequency Distributions

Ok, so a couple of weeks ago I posted about how often great QBs have bad performances. Today, let’s take a more detailed look at things.  The overall question is, are quarterbacks equally likely to outperform or underperform their long-term average QB rating in any individual game?  Stated differently, are the individual performance distributions symmetrical?  Are they Normal?

Let’s start at the top.  Here’s the graph for Peyton Manning.  Note that the X-Axis labels show the UPPER Bound of each bar.  So the “60” label means that bar corresponds to games where the player’s rating was between 50 and 60.  Also, this is all games with at least 10 pass attempts.  Remember that these are NOT weighted numbers, so they’ll be different from the career measures for each player.  This helps to minimize the skew effects of games with a lot of pass attempts (garbage time yards in a blowout loss for example) as well as increase the weight of great games with relatively few attempts (when a team has a big lead early perhaps).  Realistically, we just want to know what level of performance we’re likely to get in the NEXT SINGLE game.

That looks pretty Normal.  The Mean is 97.45 (note that this is NOT his long-term average, since it’s not weighted by attempts).  The Median is 95.6.  Obviously, those are crazy-good numbers.  That’s why he’s a HOFer.

Big-picture, if players generally follow a Normal distribution, then we can tell a lot about Nick Foles from relatively fewer games.  So Peyton’s chart is really encouraging.

But here’s Drew Brees:

Not nearly as neat as Peyton’s.  The Mean is 95.17, the median is 92.4, and there’s some clear skew to the distribution.  Going back to the last post (linked above), notice that Brees has had more games with ratings between 60-80 than he has games with ratings between 80-100.  Overall, his performance, though still amazing, is less predictable than Manning’s.  The standard deviation of Brees’ game log is 29 (rounded) while Manning’s is 27.  Illustrated differently, we can look at the range covering the middle 50% of performances:

Manning:  79.9-112.2

Brees:  72.8-116.7
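For anyone who wants to reproduce these numbers from a game log, here’s a minimal sketch of the unweighted summary. The ratings in it are made up for illustration (not any real player’s games), and I’m assuming a sample standard deviation and the default quartile method in Python’s `statistics` module, which may differ slightly from whatever a spreadsheet uses.

```python
import statistics


def summarize_ratings(ratings):
    """Unweighted summary of single-game passer ratings (one value per game)."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    return {
        "mean": statistics.mean(ratings),
        "median": statistics.median(ratings),
        "stdev": statistics.stdev(ratings),  # sample standard deviation (assumption)
        "middle_50": (q1, q3),               # range covering the middle 50% of games
    }


# Hypothetical game log, for illustration only.
example_ratings = [60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0]
summary = summarize_ratings(example_ratings)
print(summary)
```

A mean sitting above the median, as it does for both Manning and Brees, is a quick hint of right skew.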

Again, both great, but Brees is less predictable.  We could raise some interesting strategic questions here as far as which one you want in which situation, but I’ll save that for another day.  For now, just imagine tying it back to our “David Equation” (that’s what I’m calling it now).  Brees might not be as good, but his higher-variance play might be preferable for an underdog team, while you’d rather have Manning if you’re the favorite.

Now let’s take a step down and look at some non-future-HOFers.  First up, Sam Bradford:

Much uglier, as expected.  The mean is 80.48, the median is 81.2.  The standard deviation is actually much lower than either Manning or Brees, at just 21. His middle-50% range is:

66.3 – 91.3.

That’s what a bad QB looks like.  Now we should probably caveat all of this by saying it’s a bit unfair to evaluate the QBs in a vacuum, with no regard to the talent level they’re working with.  That’s the case with just about every NFL evaluation, it’s just the nature of the game.  Still, Bradford hasn’t been good enough, and I’m skeptical he ever will be…

How about Eli? (You knew I couldn’t leave him out)

Mean of 82, Median of 81.65.  Standard deviation of 27.3.  Hmmm…those numbers look vaguely familiar.  What were Sam Bradford’s again?  (mean of 80.48, median of 81.2, stdev of 21).

Interesting.  Future HOFer Eli Manning’s average and median performances are almost identical to those of secret-bust Sam Bradford (secret because nobody seems willing to say it outright.  I will: Bradford is a bust, unequivocally.  Closing in on 2000 pass attempts, he has a completion percentage below 60 and a Rating below 80).

What about Eli’s middle-50 range?  63.6 – 100.7.

That’s just about the definition of mediocre (and maybe even a bit worse, we’ll see later).

Before I move on, let me repeat one thing.

Eli Manning’s MEDIAN performance is a rating of 81.65.  So HALF of his starts are WORSE than that.

Going back to the initial question I posed:  Are individual QB performance distributions symmetrical?  Almost, though we have to grade it as inconclusive since I only looked at a few QBs.  So we might be able to use that to infer some info about Foles.  Clearly, though, they’re not Normal… There’s a lot more we can do with this type of data, but I’m going to have to wait for another day to start on it.

Also, as is usually the case, I think I stumbled onto something more interesting (the middle-50 ranges).  So rather than go through each QB and post a chart of their distributions, I’m going to end this post now and start making a table of every starting QB’s Middle-50 range.  Then we can start talking about which is “best” in a given situation, with some real data to go from.

Before I go, here’s Nick Foles.  He has just 11 qualifying games, so small sample is an understatement, but it’s fun to look anyway.

Average is 97.2, median is 96.6.  Standard deviation is 35.8.

And here’s Mike Vick.

Average is 80.9, Median is 83.6.  That’s 2 points BETTER than Eli’s median performance…. Vick’s standard deviation is 28.6.

Happy Thanksgiving

# Tracking QB Rating over time

Limited commentary with this post, because it’s somewhat self-explanatory.  I decided to go back and calculate an on-going QB Rating for several prominent players to see how each would have looked, statistically, after every appearance made.  Obviously, this relates to Nick Foles.  Basically, I cannot think of a single QB who has had a start to his career anywhere near as good as Foles has (statistically) and then NOT gone on to have a good career.  However, I figured it would be instructive to look at some successful QBs and see how they progressed over time.  Below is the chart.

Before we get there, though, a PSA: The NFL QB Rating calculation, while useful, is very convoluted.  Almost nobody knows it or has even seen it, so for your education, here it is (from Wikipedia):

Fun stuff, eh?
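In case the formula image doesn’t load, here’s the same standard NFL passer rating calculation written out as a short Python sketch. The function and argument names are mine; the four components and the 2.375 cap are the standard formula.

```python
def passer_rating(attempts, completions, yards, touchdowns, interceptions):
    """NFL passer rating for one stat line (a single game or career totals)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(value):
        # Each component is limited to the range 0 to 2.375.
        return max(0.0, min(value, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)        # completion percentage
    b = clamp((yards / attempts - 3) * 0.25)             # yards per attempt
    c = clamp((touchdowns / attempts) * 20)              # touchdown rate
    d = clamp(2.375 - (interceptions / attempts) * 25)   # interception rate
    return (a + b + c + d) / 6 * 100


# A stat line that maxes out all four components gives the familiar 158.3.
print(round(passer_rating(20, 20, 300, 3, 0), 1))
```

The caps are why a “perfect” rating tops out at 158.3 instead of growing without limit.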

Anyway, back to the subject.  Here’s the graph, I suggest you click to enlarge:

I only included the first 100 appearances (NOT starts) for each player.  Note that this is an ongoing calculation, so it becomes less susceptible to change with every appearance.  I’m hesitant to draw any conclusions from such a small sample size, but including other QBs (I originally included Vick, Brady, Brees, and Josh Freeman) makes the graph too hard to read.
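For the curious, the “ongoing” rating is just the passer rating of a player’s cumulative totals after each appearance. Here’s a rough sketch of that idea; the stat lines are invented, and the compact rating helper is repeated only so the snippet runs on its own.

```python
from itertools import accumulate


def rating_from_totals(att, comp, yds, td, ints):
    """Standard NFL passer rating from cumulative totals; None if no attempts yet."""
    if att == 0:
        return None
    clamp = lambda v: max(0.0, min(v, 2.375))
    parts = (
        clamp((comp / att - 0.3) * 5),
        clamp((yds / att - 3) * 0.25),
        clamp(td / att * 20),
        clamp(2.375 - ints / att * 25),
    )
    return sum(parts) / 6 * 100


def running_ratings(games):
    """games: (att, comp, yds, td, int) tuples in chronological order."""
    # Add each game's stat line onto the running career totals.
    totals = accumulate(games, lambda acc, g: tuple(x + y for x, y in zip(acc, g)))
    return [rating_from_totals(*t) for t in totals]


# Hypothetical first three appearances, for illustration only.
example_games = [(20, 10, 100, 0, 1), (30, 20, 250, 2, 0), (25, 15, 180, 1, 1)]
career_line = running_ratings(example_games)
print([round(r, 1) for r in career_line])
```

Because each new game is added to an ever-larger pile of attempts, the line moves less and less with every appearance.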

In any case, I’m going to build the sample from here to see if we can divine a good idea of when, regarding Foles, we can exhale.  For now, just take note of how “poorly” each of the included QBs played to start their careers.  And because you knew I would, I included Foles below, along with Brady and Brees.  Notice that none of these guys had as high a QB rating after 14 appearances as Foles does.  Moreover, among these guys, Foles tracks most closely with…Tom Brady.   I know attempts would be a more helpful measure, but I don’t have data for that, so…this:

Update: Should have noted this originally, but the chart doesn’t account for the general offensive inflation the league has seen since guys like Peyton started their careers. Regardless, the adjustment would only involve shifting Foles’ line down slightly, and wouldn’t significantly change how things look.  The trend, obviously, would remain the same.

# How often do good QBs have bad games?

Another game, another great statistical performance from Nick Foles.  This one was certainly luckier than some of his previous gems, but that’s hardly a good reason for writing it off completely.  While thinking through his performance, I decided to look at the topic referenced in the title to this post.  Before that, some quick notes on Foles:

– With each game, Foles’ sample size grows, and the odds of him being a “fluke” decline.  As promised, here’s his updated QB Rating by game chart:

– Eagles fans might not fully appreciate this, but having a QB whose “bad” games don’t involve a lot of turnovers is not a bad place to be.  McNabb never turned the ball over, but perhaps Vick has given you some perspective.  Most “bad” QB games are a lot uglier than we’ve seen from Foles.

– Going back to our strategic equation (E = R ((60 – T) / 60) + C), and applying it to yesterday’s game, we can see that Foles’ relatively underwhelming performance might have been the result of very rational decision-making.  When Seneca Wallace left the game, leaving Scott Tolzien to lead the Packers, the Relative Strength swung largely towards the Eagles.  As a result, they became the strong favorite, meaning they should be playing a LOW-variance game.  For Foles, this meant avoiding turnovers at all costs.  I don’t know whether this was actually the case or not (probably not), but Foles would have been completely justified in not throwing to WRs unless they were VERY open.  After grabbing a lead, it became somewhat clear that the best chance for the Packers to win lay in the Eagles turning the ball over.  If Foles/Kelly made a conscious decision to go with low-risk plays, then the result would look underwhelming but, in fact, be the right strategic call.

– Of course, Foles DID throw one ball into coverage, but it was on a deep throw.  While I have not confirmed on the replay, my initial take on the play was somewhat different from most others.  In my view, he did NOT throw “into double-coverage”.  He under-threw D-Jax, the result of which gave a second defender time to get into the play.  That seems like a slight difference, but it’s important.  There are two aspects to the play:

1) the decision to throw the ball (mental)

2) the actual throw (physical)

I think Foles got the first half 100% correct; he just messed up the execution.  In general, a slightly underthrown deep ball is a lot more defensible than a decision to throw into actual double-coverage.

Ok, enough about Foles.  In thinking through the odds/fluke/sample-size piece, I decided it would be interesting to provide some context to the QB expectations discussion.  I did a little of this when I talked about McNabb’s HOF credentials, but today I’ll do it in more detail.

The overall point is that most fans, I think, overrate the consistency of “great” QBs and set their expectations for the position too high.  Great QBs have bad games, and as we’ll see (I hope), they happen more often than you’d think.

## Setting it up

First we need to define our parameters.  I’m going to use QB Rating, with all relevant caveats acknowledged.  For our categories, I’m using the following:

Great: 105+

Good: 95-104

Decent: 85-94

Poor: 75-84

Below are charts for a selection of QBs.  I only included starts with at least 10 pass attempts.  Sorry for the blurriness, click for a clearer picture.

Take a good look, cause there’s a lot of good information in there.  Most important, of course, is the frequency of “bad” games (rating of worse than 75).

– Tom Brady has a “bad” game roughly 25% of the time.

– Drew Brees has a “bad” game nearly 1/3 of the time.

– Even Peyton Manning doesn’t crack a rating of 75 in almost 20% of his starts.

That doesn’t even include the incidence of “poor” games either.  When you add that in, you can start to see the point I’m trying to make.  Even the best QBs in the league have what fans would consider a bad game fairly often.  Now they do, obviously, provide a high rate of “great” games as well, that’s what makes them “great quarterbacks”.
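If you want to run the same bucketing on any QB’s game log, a sketch like this would do it. The cutoffs are the ones listed above, with “bad” meaning anything under 75; how to treat fractional ratings such as 94.6 is my assumption (I use “at least” thresholds), and the sample ratings are invented.

```python
from collections import Counter

CATEGORIES = ("great", "good", "decent", "poor", "bad")


def categorize(rating):
    """Map one single-game passer rating to the post's categories."""
    if rating >= 105:
        return "great"
    if rating >= 95:
        return "good"
    if rating >= 85:
        return "decent"
    if rating >= 75:
        return "poor"
    return "bad"


def category_shares(ratings):
    """Fraction of games that fall in each category."""
    counts = Counter(categorize(r) for r in ratings)
    return {name: counts[name] / len(ratings) for name in CATEGORIES}


# Hypothetical game ratings (starts with at least 10 attempts), illustration only.
example_ratings = [110.0, 98.2, 88.0, 79.5, 60.1, 45.0, 120.3, 70.0]
shares = category_shares(example_ratings)
print(shares)
```

Adding the “poor” and “bad” shares together gives the combined number quoted for Eli below.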

A couple more notes:

– Tony Romo compares favorably to both Drew Brees and Tom Brady (commence vomiting now)

– FutureHallOfFamer Eli Manning’s starts have resulted in either “bad” or “poor” performances more than half the time (54%).  Let that sink in….if you randomly picked a game from Eli’s career, you’re more likely to get a poor/bad game than a decent/good/great game.  It’s not too late to kill his HOF candidacy…

So next time Nick Foles has a bad game (like that’s ever going to happen again), remember this post.  Even the greatest QBs have bad games, and it happens more often than you’d think.

# Nick Foles: Information assimilation

I was hoping to make this post about the equation I drew up earlier, particularly how to think about assimilating new information into the model.  Given this week’s Eagles scenario, though, I think it’ll be helpful to apply similar thinking to a real world example first, then go back to the model.

The “real world” example is, of course, Nick Foles.

His last start was terrible.  He looked completely overmatched and showed none of the strengths we had previously seen him use (pocket presence and accuracy in particular).  That’s not really up for debate.  However, there’s a BIG difference between knowing that information and using that information.  The issue is, what does this performance tell us about Nick Foles’ skill/ability in general?

To answer that with any confidence, we have to frame it correctly.  That means using ALL of our information, not just last game.  For example, here’s a chart that shows Nick Foles’ passer rating by game.  I’ve only included games in which he had at least 10 pass attempts.

What do we see?

– Well, most shocking to me is that the Cowboys game wasn’t actually Foles’ worst game (by passer rating).

– The sample size (look at the X-axis labels) is still very small.  He’s only seen significant time in 10 games.  That means, regardless of what you think his performance to date says, you shouldn’t have that much confidence in it.

When put in context, last week looks like an extreme outlier.  This is why it’s important to view everything together, rather than focus on one particular event.

Insert Passer Rating disclaimer here.

So we’ve got a general idea of the larger picture.  Now let’s take a slightly different approach.  First, let’s use what we KNEW about Foles BEFORE the Cowboys game to see just how likely the Cowboys result was.  Caution: Extremely over-simplified statistical analysis here.  It’s illustrative, not definitive.  There are definitely some more robust tools we can apply and some data adjustments we can make to increase confidence, but that’s a post for another day.  Apologies in advance to the statisticians out there.

Prior to that game, Foles’ career Passer Rating was 87.13.  Let’s assume, for a moment, that 87.13 is Foles’ “true” ability.  It probably isn’t (small sample), but IF IT IS, we want to know how likely the Cowboys game result was.  To do that, we’ll need not just his rating, but the standard deviation as well, which for Foles, was 27.03.

Given those two pieces of information, and assuming a Normal distribution (not necessarily a safe assumption) of potential outcomes, we can calculate what the odds were of Foles performing that badly.  Now, this doesn’t account for defensive strength, but I’m trying to keep things relatively simple.

In a Normal distribution, roughly 68% of the data will fall within one standard deviation of the mean.

Here, the mean is 87.13, so we would expect, if that’s Foles’ “true” ability, that 68% of his starts would result in a Passer Rating between 60 and 114 (rounded).

Now, Foles ended the Cowboys game with a Rating of 46.2, which is roughly 1.5 standard deviations (27.03) below the mean (87.13).  That gives a Z-Score of about -1.5.

Cutting to the chase, that tells us that, given our assumptions, the odds of Foles performing that poorly (or worse) against the Cowboys were just 6.5%.
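Here’s the arithmetic behind that number as a small sketch, using only the figures above (87.13, 27.03, 46.2) and the same Normal assumption. The helper name is mine; the Normal CDF is built from the standard library’s error function.

```python
import math


def normal_cdf(z):
    """P(Z <= z) for a standard Normal variable."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mean_rating = 87.13    # Foles' career rating before the Cowboys game
stdev_rating = 27.03   # standard deviation of his game ratings
cowboys_rating = 46.2  # rating in the Cowboys game

z_score = (cowboys_rating - mean_rating) / stdev_rating
prob_that_bad = normal_cdf(z_score)

# One-standard-deviation band that should hold roughly 68% of starts.
band_low = mean_rating - stdev_rating
band_high = mean_rating + stdev_rating

print(f"z = {z_score:.2f}, P(rating <= {cowboys_rating}) = {prob_that_bad:.3f}")
print(f"68% band: {band_low:.0f} to {band_high:.0f}")
```

Note this is a one-sided probability: the chance of a game at least that bad, not of a game that far from average in either direction.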

We can’t stop there, of course.  Just getting the result isn’t enough.  We then have to go back and view the observed outcome in light of its probability and our assumptions.  Basically, there are two ways of viewing this:

– Foles’ performance was a result of random chance.  Given what we knew, he had a 6.5% chance of playing that poorly, and it just happened to hit.

OR

– Our initial assumption was wrong.  Foles’ “true” Rating is worse than 87, which means the likelihood of him playing that poorly was actually higher (potentially much higher) than 6.5%.  This one is attractive, because it lets us increase the odds of occurrence for the event we witnessed.
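To put rough numbers on the second view, here’s a sketch of how the odds of a 46.2-or-worse game change if Foles’ “true” rating is lower than 87.13. The alternative ratings (80, 75, 70) are hypothetical values I picked for illustration, and I’m holding the standard deviation fixed at 27.03, which is itself an assumption.

```python
from statistics import NormalDist

OBSERVED = 46.2  # Cowboys game rating
STDEV = 27.03    # held fixed across scenarios (assumption)


def prob_at_or_below(true_rating, observed=OBSERVED, stdev=STDEV):
    """Chance of a game rated `observed` or worse if `true_rating` is the real mean."""
    return NormalDist(mu=true_rating, sigma=stdev).cdf(observed)


# 87.13 is the pre-game career rating; the others are hypothetical alternatives.
scenarios = {rating: prob_at_or_below(rating) for rating in (87.13, 80.0, 75.0, 70.0)}
for rating, prob in scenarios.items():
    print(f"true rating {rating:>6.2f}: {prob:.1%}")
```

The lower the assumed “true” rating, the less surprising the Cowboys game becomes, which is exactly why the second explanation is tempting.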

I’m not yet ready to answer that question, but this is the real crux of the post, and the overall point I am trying to make.  If you actually want to KNOW what’s going on, you have to examine all of the information, and try to reconcile it.  Most fans, of course, have no interest in doing this.  They’ll trust their “gut”, which usually results in them using the most recent event and discarding almost everything else.  Think about the euphoria after the Bucs game.  If you go back to the chart at the top of the post, it’s clear that was an outlier as well, though it was part of the overall uptrend and not as serious as the Cowboys game.

A big part of objective analysis is accepting that there’s usually a real chance that you’re wrong.  In this example, we can view this from both sides.  Foles supporters, while pointing to the “bad luck” explanation, HAVE to accept the fact that the second explanation, “bad Foles”, may be true.  Similarly, Foles detractors can point to the “bad Foles” explanation and use the Cowboys game as proof that the Foles supporters were wrong.  However, they too must recognize the potential validity of the alternative explanation.  It’s possible (6.5% in our basic analysis), that it was simply a bad game.

Reconciling those two sides is mandatory for anyone trying to learn the truth.  As I tried to explain in the short disclaimer above, the example I used was extremely simplistic.  It doesn’t account for things like quality of competition, potential for improvement, etc.   There’s also the Normal distribution assumption and the small sample issue.  There are things we can do to address a lot of those problems, I just didn’t have time to do it all for today’s post.

Going back to the larger topic of Information Assimilation, hopefully you can see how this type of analysis can be applied to the R value in our equation.  And for anyone still skeptical as to the applicability of that model, I give you this:

Look at the score by quarter and tell me E = R ((60 – T) / 60) + C isn’t important.

# Momentum: Yes it’s real…but that’s no excuse

Before I get to today’s topic, Momentum, I wanted to note that yesterday’s post, The Hot Seat Index, has now been updated.  There was a flaw in the win change column of the table that was helpfully pointed out by a commenter.  It’s now fixed.  The results aren’t dramatically different, but scroll down to see the update if you want.

Now…

Everybody who watches the NFL (or any sport for that matter) is familiar with the term “momentum”.  It’s used very often by commentators and announcers to describe the ebb and flow of the game.  More importantly, it’s dismissed and derided by the “analytic” community.  On the surface, it’s a clear front in the battle between “old school” and “new school” fans/analysts/etc…  Now I want to weigh in.

As you may have predicted, at a high level, I agree with the “new school”.  However, I think a lot of the members of that side of the discussion, whether through inattention or ignorance, aren’t characterizing Momentum correctly.

Momentum absolutely exists.

I have no doubts about that.  So why do I still agree with the analytic community over momentum’s relative worthlessness?

Well first, let me define momentum exactly as I see it.  When we discuss Momentum, we’re essentially saying that the events of the game have unfolded such that a player’s expected performance distribution fundamentally changes.  Some combination of pressure, confidence, attitude, etc…, supposedly diminishes the expected performance of the players.

I’m willing to admit that it’s possible for a player to underperform his true ability based on one or more of these factors.  However, why is the converse not possible?  Given a host of different stimuli, we can find people who react to said stimuli in contrary ways.  Just as not “having the momentum” can diminish performance in some players, it seems logical that it can also INCREASE performance among other players.  For example, can you think of any athletes that seem to play BETTER when they are losing by a lot or when the other team seems to “have” the momentum?  If so, we have a problem.  In order to assert Momentum, you’d have to accurately balance the players who play worse against those who play better.

More importantly, in football, there are a LOT of players on the field at once.  Even if we knew how one player would react, it wouldn’t tell us much about the game unless we knew how the OTHER 21 players on the field reacted.

That’s a long way of saying that, even if Momentum is real, it is NOT knowable.  We can’t even agree on what factors play a role, let alone measure them.  If a factor is not measurable, or even theoretically knowable, is it of any actual value?

Let’s equate it to Luck.  Clearly, luck is real and plays a very real role in the outcome of NFL games.  However, luck, as I’m thinking of it, is also UNKNOWABLE.  We don’t know how the ball will bounce once it’s fumbled.  We don’t know if a sudden wind gust will blow a field goal off course.  So we have a similar knowledge of luck and momentum.

However, commentators appear to think we KNOW things about momentum that we can’t possibly know.  Let’s play a game.  I’m going to list a common phrase, then we’ll do a little variable replacement.

– “Team A really needs to score to shift the momentum”

Hear it all the time.  As we’ve just discussed, we have just as much knowledge about momentum as we do about luck, so what happens when we use that equivalence to rewrite the sentence above?

– “Team A really needs to score to shift the luck”

Sounds ridiculous, right?  Like, completely outrageous and anyone who said it on the air would be ridiculed mercilessly.

So why are we so tolerant when commentators use “momentum”?  As I tried to explain, we don’t really know any more about the effects, conditional requirements, or significance of momentum than we do about luck, so isn’t it ridiculous to assert it as a goal?

– “Team B really has the momentum on their side now!”

We have no idea what that means!  We CAN’T know what that means, because momentum is made up of an extremely large number of unquantifiable variables, the effects of which are unclear even if we DID know how to measure them.  It’s as ridiculous as saying:

– “Team B really has the luck on their side now!”

It doesn’t make any sense, and it’s worthless as an explanatory phrase.

Now, this doesn’t mean that “momentum” should be stricken from the vocabulary of every announcer.  Just as there are very appropriate uses of “luck” while describing the game, there are potentially valid uses of “momentum”.

The problem occurs when announcers and fans start using MOMENTUM as a justification for play-calling or as a goal itself.  For example, going for it on fourth down because you “had” the momentum.  It’s likely that going for it was the right move (see previous posts), but that justification is ridiculous.   There’s just absolutely no way of knowing if the particular circumstances at any point in time qualify as “Momentum” or if that actually means that your players are more likely to perform or the other team is any less likely to perform.

I don’t think most announcers think about these things when they use the phrase.  It’s, unfortunately, a descriptive crutch.  It’s just another way of saying “Team A has made a lot of good plays in close succession”.   That in itself isn’t a huge offense, but it’s frequently used to justify some assertion that Team A is then MORE LIKELY to be successful on plays until the momentum changes again.

That of course is ridiculous, and why most “analytic” minded fans and commentators are so dismissive of the concept in general.  If it ended at a description of what HAS happened rather than support for what WILL happen, it wouldn’t be nearly as annoying.

So…the final point:

Momentum is very real.  However, it’s not quantifiable or knowable, and therefore is completely useless to us in terms of advancing our understanding of the NFL or sports in general.

# The Hot Seat Index: Predicting NFL Coaching Terminations

Time for another guest post from Jared.  You may remember him from:

– The Fourth Down Decision Series and Cheat Sheet (Part 1 of 4)

– When does it make sense to return a kick?

Well today, he’s ready to unveil at least the first iteration of his Hot Seat Index, and attempt to predict which coaches will be fired at or by the end of the season.  You can follow him @jaredscohen and at kebertxela.blogspot.com.  Without further ado:

————

So the Eagles had another especially depressing loss to the Giants this week. And with such a depressing loss, it got me thinking of another depressing outcome, watching your team stink to the point where they fire their coach.

So, this was something I wanted to wait on until much later in the season, but with Greg Schiano working as hard as he can to get fired as soon as possible, I needed to put it out a bit early.

One of the things I hate hearing about as the NFL season progresses is all the discussion around the coaching ‘hot seat’. It’s not that I mind the speculation, but all of the discussion centers on conjecture and the occasional ‘anonymous source’ (the exception being Schiano where the sources have noticeably opted to forgo anonymity!)

As I was thinking about it, I really wanted to see if we could make this a bit more objective and data driven. So with that as the goal, I gathered some data, did some analysis, and am now ready to introduce the NFL Coaching Hot Seat Index!

The Coaching Hot Seat Index is a model that, at any point in the season, will give you the approximate odds of an NFL coach getting fired after the season!

It’s based on a collection of data on all NFL coaching seasons since 1980. I broke the data down and for each coaching season, identified whether a coach was fired or kept their job (with some adjustments for retirements etc.)

With over 900 observations (and ~200 firings!), I started looking for factors which were significant in predicting whether coaches got fired or not. I tried lots of things (number of wins, playoff appearances, having the last name ‘Kotite’), all to see what was really significant in predicting when a coach will get fired.

I ended up with two primary factors, which were by far the most significant in predicting a coach getting kicked to the curb.

– Total point differential (points scored-points given up)
– Change in team wins from prior season (current year wins-prior year wins)

Those shouldn’t come as a huge surprise. (Note: I’m disappointed because I couldn’t examine another factor I wanted to see, the difference between Expected Wins at the Season Start and Actual Wins at season’s end…I wanted to use gambling lines to get at it but didn’t have nearly enough data to look at it. I still think it would be a more compelling variable)

But anyway, with those two factors, and using (or, abusing) a technique called logistic regression, I arrived at an equation to give us the odds a coach will get fired. Logistic regression is basically something you can use to help predict the likelihood of a binary outcome (a coach either gets fired, or he doesn’t) based on some variables (in this case, his team’s point differential and win change from prior season).

In the end, the result is the percentage chance (out of 100%) a coach will be fired after the season. The higher the odds, the more likely the coach is going to be filing some unemployment papers right around the Pro Bowl.
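To give a feel for what the equation looks like, here is a rough sketch of the logistic form with the two factors. The coefficients below are HYPOTHETICAL placeholders chosen only so the example runs; they are not the fitted values from the model, which aren’t listed in this post.

```python
import math

# HYPOTHETICAL coefficients, for illustration only (not the fitted model).
INTERCEPT = -1.5
COEF_POINT_DIFF = -0.01  # a better point differential lowers the odds
COEF_WIN_CHANGE = -0.2   # gaining wins over last season lowers the odds


def firing_probability(point_diff, win_change):
    """Logistic model: turn the two factors into a probability between 0 and 1."""
    log_odds = INTERCEPT + COEF_POINT_DIFF * point_diff + COEF_WIN_CHANGE * win_change
    return 1 / (1 + math.exp(-log_odds))


# A team outscored by 150 that lost 4 more games than last year,
# versus a team that outscored opponents by 150 and gained 4 wins.
struggling = firing_probability(point_diff=-150, win_change=-4)
thriving = firing_probability(point_diff=150, win_change=4)
print(f"struggling coach: {struggling:.0%}, thriving coach: {thriving:.0%}")
```

The logistic curve is what keeps the output between 0% and 100% no matter how extreme the inputs get.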

But I couldn’t just create a model and throw it out there without testing it a little. So I created a first draft of it, using only data from 1980-2011, and used that model to ‘predict’ the 2012 season based on its point differential and change in team wins (which, obviously, I already knew). When that passed the sanity check, I updated the model for 2012 and used it on teams from this year.

Those results are included below, with coaches sorted by their ‘odds to be fired’

Not a bad first result. Romeo Crennel, Mike Mularkey, Andy Reid all got canned, and those were the coaches with the highest likelihood. The model obviously isn’t perfect, as plenty of coaches can still get fired, but at least there’s a structure and some logic here (we can also rank the coaches from safest to least safe!!!)

So this is interesting, and now we can apply it to the current season, and see what 2013 coaches are most likely to be fired!!!

Now, I had to make a couple of assumptions, because this model is based on a full season of performance, which of course we won’t have. But we do have a good sample and some reasonable projections of performance we can use as proxies.

For point differential, we can use a team’s current point differential and prorate it out for 16 games. This doesn’t account for changes in either a team’s performance or strength of schedule, but it’s not unreasonable.

For win difference from last year, we can take an estimate of the team’s full season (which I’ve borrowed from Football Outsiders Playoff Odds report, which runs simulations to calculate average team wins) and then just check the difference from 2012 wins.
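Both proxies are simple arithmetic. Here’s a minimal sketch with made-up inputs; the function names and the sample team are mine, not part of the model.

```python
def prorated_point_diff(current_diff, games_played, season_games=16):
    """Scale a partial-season point differential up to a full season."""
    return current_diff * season_games / games_played


def projected_win_change(projected_wins, prior_year_wins):
    """Estimated full-season wins minus last season's wins."""
    return projected_wins - prior_year_wins


# Hypothetical team: outscored by 70 through 7 games, projected for 5.5 wins
# (e.g. from a simulation average) after winning 8 the year before.
full_season_diff = prorated_point_diff(current_diff=-70, games_played=7)
win_change = projected_win_change(projected_wins=5.5, prior_year_wins=8)
print(full_season_diff, win_change)
```

Those two numbers are then the inputs to the firing-odds equation.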

So, at this point in the season (Week 7, because Week 8 is going on as I write this), which coaches have the highest odds of getting fired?

Well – it’s no surprise to see the Jaguars on the list, but we may have to make an exception for first year coaches as they typically get more than one season to right the ship. We’ll give Gus Bradley a pass (although no one can dispute that their performance is historically bad…it’s no wonder the odds are higher than anything seen in 2012)

Tom Coughlin checks in at number 2, although we’ll unfortunately need to update that after the Eagles managed to lose to them this week. Their performance has been really bad as well, and any coach without Super Bowl rings would likely be on their way out. But Coughlin may overcome the odds with all the goodwill he’s built over the years (or maybe he changes his mind and “retires”)

Next comes Schiano, where I think we can all agree the odds calculation actually UNDERRATES his odds of being let go. This guy might want to think about booking some tee times in late November if he continues at his current pace.

But the next coaches on the hot seat, Gary Kubiak, Leslie Frazier, and Mike Shanahan, should also probably get their acts together if they want to stay employed.

From an Eagles perspective, Chip Kelly doesn’t seem to be in too much danger (although as I said, we probably should eliminate all first year coaches as a general rule). Of course, this assumes Kelly behaves competently, unlike the absolute sh*t show we just saw against the Giants (which my brother is probably already dissecting)

Of course, from a FORMER Eagles perspective, it looks like Andy Reid has done quite alright for himself in the move to Kansas City.

I’m just saying.

# Expected Points: Providing Context

This is perhaps only indirectly related to the strategic post I’ve been looking to expand upon, but it’s important nonetheless, and, in my mind, more immediately relevant.  The topic, as the title suggests, is Expected Points, the concept developed by Brian Burke at AdvancedNFLStats.com.

I’ll spare you the full explanation because I think most readers here are aware of it, but basically it assigns a point value to each down, distance, and field-position combination to provide a measure of how valuable each situation is.  For example, 1st and 10 at the 50 yard line is worth MORE than 1st and 10 at a team’s own 20 yard line.  Simple enough.  However, many people (myself included) have been a bit too cavalier in using the Expected Points concept to evaluate in-game strategy.

Today, I want to clear that up.  There is a major limitation to the Expected Points concept; that is, it’s an AVERAGE, and it doesn’t account for the relative strengths and weaknesses of each team.  Logically, a certain field position is worth more to the Broncos than it is to the Buccaneers, right?  Similarly, a particular field position is worth LESS against the Chiefs defense than it is against the Eagles.  Unfortunately, that’s not accounted for, limiting the usefulness of the Expected Points analysis.

I’m not saying it’s worthless, far from it in fact; I’m just saying that we need to remember that the EP analysis for a given situation reflects average teams, and therefore must be adjusted when applying it to real-world situations.

Allow me to demonstrate (as you knew I would).

Here is the chart of EP value for 1st downs.

The problem, as I mentioned above, is adjusting for relative strength.  To explain, we need to pick 2 teams.  For the sake of clarity, and consistency, we’ll use Denver and Jacksonville (FO’s best and worst teams by DVOA).

Let’s just look at Denver’s offense.  To adjust, we need to know how the Broncos offense compares to average.  Luckily enough, Football Outsiders provides us with a measure of just that.  Note that, for now, I’m knowingly glossing over the fact that FO’s DVOA might not be the best measure here.

Anyway, Denver, according to FO, ranks 40.9% better than league average.  Well that makes things easy, right?  All we have to do is increase the average EP value of each yard line and re-graph.  Well here’s that graph:

I’m guessing you all see the problem, but if not, hang in there, I’ll get to that in a second.  We ALSO have to account for Jacksonville’s relative strength/weakness on defense.   Just as I did above, I can just use FO’s rating (17.9%) to adjust again, right?

Well here it is:

Perfect….now we have the adjusted EP value for the Broncos versus the Jaguars.  We can do the same calculations as before, using these values, to determine the “optimal” play-call (where optimal means maximizing expected points).

Only…there’s still that problem I mentioned above, which by now EVERYONE has noticed.  Looking at the graph above, it’s pretty clear that we’ve made a mistake in our analysis.  Touchdowns are only worth 6 points, which means no field position can be worth more than that (violated above).  Moreover, since, regardless of relative strength, scoring a TD can never be 100% assured, we shouldn’t even see a value of 6 anywhere on the chart.
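A quick sketch of the arithmetic shows where the adjusted chart has to break. It assumes the two FO figures quoted above are applied as plain multipliers to the average EP values, which is one reading of the adjustment and not something FO or Burke prescribes. The variable names are just for illustration.

```python
# Sketch only: assumes both DVOA figures are applied as plain multipliers.
TD_POINTS = 6.0            # ceiling used in this post (a TD is worth 6)
DEN_OFFENSE_DVOA = 0.409   # Denver offense vs. league average (FO)
JAX_DEFENSE_DVOA = 0.179   # Jacksonville defense figure quoted above (FO)

combined_multiplier = (1 + DEN_OFFENSE_DVOA) * (1 + JAX_DEFENSE_DVOA)

# Any league-average EP above this value gets pushed past 6 by the scaling.
breaking_point = TD_POINTS / combined_multiplier

print(f"combined multiplier: {combined_multiplier:.3f}")
print(f"average EP that scales to 6: {breaking_point:.2f}")
```

Under that assumption, every yard line where the average EP is above roughly 3.6 ends up "worth" more than a touchdown, which is exactly the violation described above.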

Basically, as the mismatch becomes more and more favorable to the offense, the line should approach a limit of 6, but never reach it.  Unfortunately for us, that complicates our plans.  How do we account for this?

I’m not sure, but I do have one potential out.

We can ignore everything I did above (ugh) and go back to using the average values for EP.  Rather than accounting for the relative strength here, we can instead adjust the expected success rates to account for relative strength.  That raises its own issues, but it seems to be more intuitive.

HOWEVER, that only addresses the problem when we’re using a combination of EP and Success Rates to game out the Optimal Value of a certain situation.  It does not address the issue when we are only using EP (which is how many analysts are using it.)
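For the EP-only case, here is one way to picture the "approach 6 but never reach it" shape. To be clear, this is a hypothetical functional form chosen for illustration, not an established method: instead of scaling EP directly, it shrinks the remaining gap to 6. It only handles average EP values between 0 and 6 (real EP charts go negative near a team's own goal line), and the multiplier is an assumed input.

```python
TD_POINTS = 6.0

def saturating_ep(average_ep: float, multiplier: float) -> float:
    """Hypothetical adjustment: shrink the gap to 6 instead of scaling EP.

    Sketch only; defined for 0 <= average_ep < 6 and multiplier > 0.
    """
    if not 0.0 <= average_ep < TD_POINTS:
        raise ValueError("sketch only handles 0 <= EP < 6")
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    gap = 1.0 - average_ep / TD_POINTS   # share of a TD still "missing"
    return TD_POINTS * (1.0 - gap ** multiplier)

# A multiplier above 1 (favorable mismatch) raises EP but never reaches 6.
for ep in (1.0, 3.0, 5.0):
    print(ep, round(saturating_ep(ep, 1.661), 2))
```

A multiplier of exactly 1 returns the average value unchanged, and larger multipliers push the curve toward 6 without touching it. Whether this particular curve is the right one is an open question.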

That brings me back to the main point: be careful when using Expected Points to justify in-game strategic decisions.  It can be done (and EP remains the best model for quantifying such situations), but you have to remember that the actual results will be distorted by the relative strength of the teams.

If that reminds you of E = R ((60 – T) / 60) + C, then I’ve done my job.  I apologize for the slap-dash way I’m addressing the overall concept, but it is what it is…I don’t have the luxury of taking the time to plan these things in advance.

I’ll try to return to this soon, hopefully using a real situation from Sunday to game out the options using our adjusted success rates.

Next up, though, we need to talk about Bayes and reconciling New Information (what happens during the game) with what we already “knew” (R before the game starts).

Eventually, I do believe it will be possible to create an algorithm that essentially tells you what the “optimal” decision is in every Go/Kick/Punt situation.  We’re kind of there already, on an average basis.  The key, of course, is to adjust for the teams involved in order to allow actual reliance and use (not that NFL coaches would ever admit that a computer can make better decisions than they can).  At the very least, it’ll provide a valuable guide with which to grade coaches.

This should probably be a permanent end note, but all comments are encouraged.  As I said, I’m trying to develop a useful model here, so suggestions are always welcome.

# Not All Points Are Created Equal: Part 2

I want to start looking at this in smaller chunks, which will hopefully be a little clearer.  First, some overall major takeaways:

– E = R ((60 – T) / 60) + C, the formula, for reference.

– Large underdogs should be extremely aggressive early in games, when R (relative strength) is at its largest.

– Underdogs should attempt to use as much of the clock as possible.  This is more “conventional” and something I didn’t talk about last week, but it’s a logical extension of what I was talking about.  If you have two very mismatched teams, and make them play 100 games, it’s almost certain that the “better” team will win more than it will lose.  The larger the sample, the more likely it is to reflect the actual “relative strength”.  By using up the clock, the underdog is limiting the sample size (“# of plays”) from which the relative strength advantage can play out.  Using our formula, by bleeding the clock, underdogs are attacking the R value indirectly, using T, instead of going at R itself (scoring points).

– During the game, strategic decisions should incorporate an objective view of how the rest of the game is likely to play out.  For large underdogs, this means they should expect to be outscored, and therefore need to be aggressive in scoring points.
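The sample-size point in the clock-bleeding takeaway can be sketched with a toy model. The assumptions here are loud ones: each possession is an independent coin flip that the underdog "wins" 40% of the time (a made-up number), and whoever wins more possessions wins the game. Real football is obviously messier.

```python
from math import comb

def upset_probability(possessions: int, underdog_edge: float = 0.4) -> float:
    """Chance the underdog 'wins' more than half of the possessions.

    Toy model: every possession is an independent coin flip that the
    underdog wins with probability `underdog_edge` (an assumed number).
    """
    needed = possessions // 2 + 1
    return sum(
        comb(possessions, k)
        * underdog_edge ** k
        * (1 - underdog_edge) ** (possessions - k)
        for k in range(needed, possessions + 1)
    )

# Fewer possessions (a shorter "sample") means a better shot at the upset.
for n in (5, 11, 21):
    print(n, round(upset_probability(n), 3))
```

The upset chance falls as the number of possessions grows, which is the whole argument for shortening the game when you are the weaker team.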

Favorites Strategy

I didn’t discuss how this affects the strategy of the Favorite.  In the simplest reading, it can be assumed that the Favorite should be more “conservative”.  Going back to our Broncos vs. Jaguars example, 3 points is a lot more valuable to Denver than it is to Jacksonville (hence “not all points are equal”).  Therefore, given the same FG opportunity as the one we gave Jacksonville (expected points for FG and going for it are equal, purely a risk/reward play), Denver should elect the LOW risk option (the FG).

That’s because, as I explained above, at any time T, the favorite can expect to outplay the underdog over the rest of the game, i.e. the R value is advantageous.  As time goes on, this becomes less of a factor (as T approaches 60, the time remaining shrinks and ultimately takes the R half of the equation to 0).
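To make the fading R term concrete, here is a small sketch that evaluates R ((60 – T) / 60) at the end of each quarter, using the 27-point spread from the Broncos vs. Jaguars example as R. The function name is illustrative.

```python
def remaining_strength(r: float, minutes_elapsed: float) -> float:
    """The R ((60 - T) / 60) term of the formula, with T = minutes elapsed."""
    return r * (60 - minutes_elapsed) / 60

# Kickoff, end of each quarter, final whistle.
for t in (0, 15, 30, 45, 60):
    print(t, remaining_strength(27, t))
```

The favorite's built-in edge drops by 6.75 points per quarter and is gone at the final whistle, leaving only C.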

In general, I agree with this.  Large favorites should be content to take whatever points they can get, early in the game.

However, there is a slight wrinkle, one that will appeal to the more aggressive fans.  Let’s go back to our graphic for a moment.  Here is the range of outcomes at the start of the game:

As you can see/remember, if we assume a “random walk” from there, Denver should expect to win a very large percentage of the time.  There is a case to be made for being aggressive, though, and hopefully you can see it.

It goes back to when I explained that you can actually “win” the game before the game is over.  Assume the same EP-Neutral opportunity above, but this time imagine that gaining 7 points is enough to shift the range of outcomes (yellow shade) entirely above the X-Axis.  Would you go for it or kick?  Probably go for it, right?  After all, if you have a chance to “win the game”, with relatively low risk (still have a heavy advantage if you don’t convert), you should do it.

Obviously, I have to note that this is a purely theoretical situation.  During the game, it’s not possible to know EXACTLY where the range of expected outcomes lies.  Therefore, we can’t be sure of where the line between 100% win and 99% win is, even if some of us see that final 1% as extremely valuable.

Still, it implies that there are some situations, even if they are hard to identify, where the Favorite should also be aggressive. In general, though, it should take the lower risk strategic options, because it does NOT want to significantly shift R (outside of the specific scenario I just outlined).

Random Walk

I don’t think I made a big enough point of this model in the post last week.  There are two ways to view the game, ex-ante, and I think one of them is much better than the other.

1) This is the normal model.  Teams start on even ground (Score tied 0 – 0) and we “expect” the course of play to naturally favor one team (the favorite) over the other (the underdog).  During actual play, we project that the difference in skill will gradually manifest itself in the score, and ultimately mean victory for the Favorite.  That’s the usual way of thinking about it.

What I’ve done is to flip that around a bit.

2) Teams start on UNEVEN ground (R value), and from there we expect a random series of events to occur, though they will be within a range of possible outcomes.  This certainly isn’t the “natural” way of thinking about things, but it appeals to me for one very big reason.  Can you guess what that is?

I like it because it forces us to accept and recognize the large role of luck and chance in the outcome of the game.  Future human events are inherently unpredictable, right?  So how do we reconcile that with the first option I outlined above (the normal model)?  Isn’t it explicitly forcing us to predict that which is, by its very nature, unpredictable?

The result of this is that we get ridiculous explanations for unexpected outcomes of games.  For example, take the Giants-Patriots SB (Helmet Catch).  The Patriots were heavy favorites, and yet lost a close game.  Why?

– Is it because Eli Manning is just REALLY clutch?

– Is it because the Patriots “choked”?

– Is it because the Giants have more “heart”?  or “wanted it more”?

Of course not, those are all ridiculous explanations, and yet they’re a natural outgrowth of the way we normally think of games (option 1).

Now let’s look at the “Random Walk / Ex-Ante Relative Strength” model (the name needs work).  Here’s the picture again, just imagine a Patriots logo instead of the Broncos and a Giants logo instead of the Jaguars.

Suddenly there’s no explanation needed for the outcome of the game.  Just look at the picture; you can see there’s a section of the yellow shaded area below the X-Axis.  If we assume that at kickoff (60 minutes remaining), all future game events will take a random path through the yellow area, then it’s obvious that SOME of those infinite paths will end up in the area below the X-Axis.  It just so happens that THIS PARTICULAR run was among those.

Now there’s also obviously some unpredictability in deriving a value for R.  It’s very difficult to know just how good each team is and how they match up against each other.  However, I’d argue that all of the necessary information for getting an accurate R value is theoretically knowable.  Compare that to the Normal Model.  It requires us to predict future events, which is NOT POSSIBLE, even in theory.

The upshot of the “Random Walk” is that it forces people to confront a lack of “control”.  It basically boils the game down to a lottery.  That sucks some of the fun out of it, but that doesn’t mean it’s a less accurate model of analysis.
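The "some paths end up below the X-Axis" idea can be sketched with a small simulation. The assumptions are mine and purely illustrative: the margin starts at R (27 here, borrowed from the Broncos vs. Jaguars picture), moves by zero-mean Gaussian noise each minute, and the noise over a full game has a standard deviation of 13.5 points. None of those numbers come from the graphic.

```python
import random

def share_of_upsets(r=27.0, sd_full_game=13.5, paths=5000, seed=1):
    """Fraction of simulated games where the favorite finishes below 0.

    Assumptions (for illustration only): the margin starts at r and takes
    60 zero-mean Gaussian steps whose combined standard deviation over the
    whole game is `sd_full_game` points.
    """
    rng = random.Random(seed)
    sd_minute = sd_full_game / 60 ** 0.5
    upsets = 0
    for _ in range(paths):
        margin = r
        for minute in range(60):
            margin += rng.gauss(0.0, sd_minute)
        if margin < 0:
            upsets += 1
    return upsets / paths

print(share_of_upsets())
```

Even with a huge head start, a small share of the random paths finishes below zero. No clutch gene required.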

Similar to last time, I’m going to cut this off prematurely for the sake of time and clarity.  Hopefully you’re still with me.

# Not All Points Are Created Equal: Theoretical Support for Aggressive Strategy

Note:  This post is very long (1900 words) and involves some abstract strategic theory.  It is by no means a finished product, so I apologize if things aren’t very clear just yet.  Hopefully a few of you will read this and see where I’m going, in which case I’d love your help on explaining it better.  I have more to say about this, but I had to cut myself off somewhere.

Back in July, I wrote a post entitled “Not All Yards Are Created Equal”, which explained how teams’ incentives and strategy should shift according to down and distance.  Today, I want to look at another area, with a thesis that will sound similar:

Not All Points Are Created Equal

Basically, points are not a static object; their “value” is not constant.  Of course, a TD is worth 6 points regardless of when you score it, but the VALUE of that TD changes.  The value of points, in essence, is a function of the relative strength of each team, the time remaining in the game, and the current conditions (Score/Field Position) of the game.

As those variables change, so too will the actual value of each point.  To make things easier, I’ve put those variables into an equation.  Note that this equation is not meant to be a “rule” or even be of any specific use.  It’s just to allow us to easily visualize what the relative consequences of variable changes will be to the overall result.

Expected Result = Relative Strength (1 – Time Elapsed / 60) + Current Position + Unknowns

OR

E = R ((60 – T) / 60) + C

Here, Expected Result is obviously the end result of the game.  Time Elapsed is similarly self-explanatory.  Current Position is a combination of the score and field position; here it may be helpful to think of AdvancedNFLStats.com’s Live Win Probability at each point during the game.  I’m going to ignore the Unknown factor because….well because it’s unknown.  We can’t quantify it; it’s just meant to serve as a reminder that a significant part of the outcome will be determined by chance.

Lastly, and most importantly (for my purposes today), is Relative Strength.  This factor accounts for the discrepancy in skill between the two teams.  Naturally, it’s difficult to quantify, which may be why NFL Coaches seem to be ignoring it in their in-game strategy, which brings me to my next major point:

NFL Coaches are ignoring a significant strategic factor in their in-game strategy, namely, Relative Strength.

Let’s look at Relative Strength at a high level, then drill a little deeper for practicality.  Using a timely example, this week’s Broncos vs. Jaguars game, we can easily see the importance of Relative Strength in in-game strategy.  For example, if the score at the end of the 1st quarter is Jax – 3, Denver – 0, who do you think will win?

Still Denver, right?  My guess is you’re also pretty confident about that.  So despite Jacksonville having a lead we still expect them to lose.  Why?  Because the Relative Strength is tilted so heavily in Denver’s favor that we expect them to outperform Jacksonville by a lot more than 3 points over the remaining 3 quarters.

Hopefully now you’re all with me.  Let’s go a little deeper, dipping our toes into Bayesian waters…

Relative Strength

The Relative Strength variable really consists of two components.  The first, and easiest to understand, is the ex-ante positioning of the teams.  For simplicity’s sake, we can use the Spread as a proxy.  There’s probably a better measure (Vegas isn’t trying to predict the outcomes), but, for you efficient market fans, it’s a pretty good representation of what we “know” about the relative strength of the participants before the game starts.

Going back to the Eagles/Broncos game, I believe the value was 11 points, in favor of Denver.  So, at that point, given all the information we knew about both the Broncos and Eagles, we (the market) expected that over 60 minutes of play, the Broncos would outperform the Eagles by 11 points.

Still with me?  Good, because now we get to the crux of the problem.

When the opening kick-off occurs, NFL Coaches seem to completely disregard that part of the R factor.  Instead, their conception of R is immediately replaced by the second component, New Information.  Essentially, NFL Coaches are overweighting the most recent data (what has happened in the game to that point) to the detriment of the other component of R, the ex-ante value.  This has very significant implications for in-game strategy, especially when the teams involved are of different skill levels.

To see why, let’s go back to our Jacksonville – Denver example.  The Spread for this week’s game, as of this writing, is 27 points (a record).  Using that as a proxy for R, we can write the original equation as follows, with a positive result (E) favoring Denver and a negative result favoring Jacksonville:

E = R ((60 – T) / 60) + C

E = 27 ((60-0) / 60) + 0

E = 27

Easy enough.  Now let’s look at our Jacksonville up 3 at the end of the 1Q scenario.

E = R ((60 – T) / 60) + C

E = 27 ((60-15) / 60) – 3

E = 27 (45 / 60) – 3

E = 20.25 – 3

E = 17.25
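The same arithmetic as a small sketch, in case you want to plug in other scores or times. The function name and argument names are just illustrative; the formula is the one above, with positive values favoring Denver.

```python
def expected_result(r: float, minutes_elapsed: float, current: float) -> float:
    """E = R ((60 - T) / 60) + C, positive values favoring Denver."""
    return r * (60 - minutes_elapsed) / 60 + current

print(expected_result(27, 0, 0))    # kickoff
print(expected_result(27, 15, -3))  # Jacksonville up 3 after one quarter
```

These reproduce the two results worked out by hand: 27 at kickoff and 17.25 with Jacksonville up 3 after the first quarter.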

Note that, for simplicity’s sake (again), I haven’t accounted for the second component of R, new information.  Doing so, in this situation, will lower R.  Our pre-game data pointed to an R of Denver +27, but we now have another quarter of play to account for.  Since Jacksonville won that quarter, the value of R has to drop.  HOWEVER, the point here is that, as a percentage of the overall sample, 1Q is pretty small, meaning the corresponding shift should be small as well, and definitely not large enough to account for the +17.25 value above.

So…Jacksonville is up 3-0 at the end of 1Q, but we still expect Denver to win by 17.25 points (a little less once we account for the New Information).  In your estimation, is that a “successful” quarter for Jacksonville?  Kind of.  They did significantly lower E (remember it has to go negative for JAX to win).  However, they’re still 17 points behind!

So, because R is so heavily tilted against them (Denver is much better), 3 points didn’t help that much…

Now we can start to see the foundation for my original assertion: Not all points are created equal.

Now let’s pretend that Jacksonville had a 4th and 3 at the 20 yard line when it kicked that FG (I haven’t run the EP scenario, pretend it’s equal, that is, kicking and going for it have the same expected value).  What should the team do?  GO FOR IT!

Over any amount of time, Denver is expected to significantly outplay Jacksonville.  That means that, up 3-0 at the end of 1Q, Jacksonville is still losing!  Let’s pretend for a minute that, after incorporating New Information, E is now equal to +16 (down from +17.25).  Should Jacksonville be confident, knowing they need to outplay Denver by 16 points over the rest of the game?  OF COURSE NOT!

When the Relative Strength of the participant teams is so uneven, the losing team must play AGGRESSIVELY, because at any time during the game, they should expect the other team to outplay them the rest of the way.  Therefore, to win, they need a large enough lead to account for the expected discrepancy.
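Here is a rough sketch of why the EP-neutral choice is not neutral for the underdog. It leans on several assumptions that are not in the post: final margins are normally distributed around E with a full-game standard deviation of 13.5 points, the FG is automatic, going for it yields a 7-point TD with probability 3/7 (so both options are worth 3 expected points), and field position after the play is ignored.

```python
from math import erf, sqrt

SD_FULL_GAME = 13.5  # assumed spread of final margins around E, in points

def jax_win_prob(e: float, minutes_elapsed: float) -> float:
    """P(final margin < 0) if the rest of the game is Gaussian noise around E."""
    sd_left = SD_FULL_GAME * sqrt((60 - minutes_elapsed) / 60)
    return 0.5 * (1 + erf(-e / (sd_left * sqrt(2))))

T = 15                  # end of the 1st quarter
R_TERM = 27 * 45 / 60   # 20.25 points Denver is still expected to gain
P_CONVERT = 3 / 7       # assumed: makes 7 * p equal the 3-point FG

kick = jax_win_prob(R_TERM - 3, T)
go = (P_CONVERT * jax_win_prob(R_TERM - 7, T)
      + (1 - P_CONVERT) * jax_win_prob(R_TERM, T))
print(f"kick: {kick:.3f}  go for it: {go:.3f}")
```

Under those assumptions, going for it gives Jacksonville a slightly better chance of winning than the "safe" 3 points, even though the expected points are identical. The underdog is living in the tail of the distribution, so it benefits from the higher-variance option.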

Let’s visualize it.

That’s an illustration of what we’d expect from two evenly matched teams.  We can argue over the size of the shaded area, but I didn’t put too much thought into it, so let’s not dwell on it.

Given that we’ve already incorporated relative strength (by setting the starting point, with 60 minutes remaining, to 27) and, theoretically, reflected all potential outcomes with our shaded area, we can project the progress of the game as a “random walk”, albeit one within the boundaries of the shaded area.

As the game progresses, the area will shift from left to right (time) and up/down (as R and C change).  Additionally, the width of the area will narrow, since less time remaining will progressively limit the range of outcomes.  So after the 1Q, it will look like this:

Notice that in this illustration, the odds of Jacksonville winning (shaded area below the x-axis) are still very small.  Given our ex-ante positioning, and what I believe is its proper inclusion in the in-game strategy, Jacksonville needs to do something significant if it hopes to have a reasonable chance of winning.  At this point, I need to step back and explain another aspect of the equation:

E = R ((60 – T) / 60) + C

Notice that as the game progresses, T converges to 60, so the (60 – T) / 60 factor converges to 0.  Logically, this makes complete sense.  With 30 seconds left in the game, the Relative Strength that we discussed above means almost nothing; there’s no time left for either team to do much.  Conversely, C becomes more and more important, eventually becoming the only term (remember at the end of the game E = C).

So if we take our starting position, E = 27, and do nothing except run time off the clock, eventually that ex-ante advantage for Denver will disappear.  The point, though, is that we can’t forget it earlier in the game.

In current “conventional wisdom”, it’s almost as though once the game starts, R is forgotten; it shouldn’t be.

Over the course of the game, teams (especially bad ones) can only expect to have a couple of chances to significantly swing the odds (alter C).  To the degree that they are already behind (R), they should be more aggressive in affecting C, particularly because of one point:  You can, practically speaking, lose the game before the clock hits 0.  Using our illustration above, this would occur when the entire shaded area is above or below the x-axis.  So, let’s say the Broncos lead 21 – 0 at the end of the 1st Quarter.  It would look like this:

In the above illustration, using our equation, Jacksonville has already lost.  Basically, it will be nearly impossible for the Jaguars to outperform the Broncos by more than 21 points over the remaining 3 quarters, by virtue of what we know about their Relative Strength.
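Plugging that scenario into the formula (a quick sketch, using the 27-point spread as R and ignoring New Information as before):

```python
R, T = 27, 15    # spread proxy and minutes elapsed
lead = 21        # Denver 21, Jacksonville 0, so C = +21

expected_rest = R * (60 - T) / 60   # Denver's expected edge over the last 45 minutes
e = expected_rest + lead            # E = R ((60 - T) / 60) + C

print(f"expected edge the rest of the way: {expected_rest}")
print(f"expected final margin: {e}")
```

So Jacksonville is not just 21 points down. It is 21 down while expected to lose the remaining three quarters by another 20.25, for an expected final margin of 41.25.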

The takeaway is obvious.  The Jaguars can’t let it get to that point, hence kicking a FG instead of going for a TD in the red zone is a poor decision.  Again, we’ve accounted for relative strength in the positioning of the shaded area (range of outcomes); from there on in, the progression of the game should be thought of as random.  The Jaguars (and any significant underdog) need to take every opportunity they can to shift the range of outcomes.  They won’t get many chances, and in fact should never EXPECT to get another one.

So possession of the football in the red zone should be viewed as a singular and extremely valuable/important opportunity, one that shouldn’t be wasted on a marginal gain of 3 points.

Going back to the beginning, the “value” of points changes according to the opponent.  3 Points against the Giants are worth far more than 3 points against the Broncos.  Coaches should adjust their strategy accordingly, and be much more aggressive when facing great teams.

The downside is that you don’t convert, and the range of outcomes shifts away from you. However, when there’s a significant mismatch, you were likely going to lose anyway.  By playing “conservatively”, i.e. taking the FG, you’re not only delaying the somewhat inevitable, but you’re passing on an important opportunity to make the game competitive.

Enough rambling…this needs a lot of refinement, but I had to start somewhere.  I know I still need to address assimilating New Information, so don’t think I’m ignoring it.  But I’ve lost 90% of the readers by now anyway, so I feel compelled to give the rest of you a temporary reprieve.

# Does Strength of Schedule tell us anything?

A couple of days ago, I tweeted this:

Obviously, I meant it to be a sign of hope.  The fact is, the Eagles not only have a relatively “easy” schedule remaining, but have also played an extremely “hard” schedule thus far.  As a result, the Win-Loss record may not be representative of the team’s actual ability.  Today, let’s dig into that a bit.

Here are the SOS ratings from Football Outsiders. We’ll start with the schedule so far.

The “hardest” schedules are on top, and we can see that the Eagles are #2.  So that means there’s nothing to worry about, right?  The Eagles have just had the misfortune of playing really good teams, and that’s why they’ve looked bad?  Look more closely, and you’ll see the problem with that conclusion (not that it necessarily invalidates it).

More specifically, look at the bottom of the table.  KC…DEN…WAS

By now, it should be obvious what I’m getting at.  This early in the season, it’s very hard to identify the true “cause” of each game result.  For example, the Denver Broncos look awesome, but they’ve also played one of the “easiest” schedules, with games against the Eagles (1 win), Giants (0 wins), Raiders (1 win), Ravens (2 wins).  How much of that is the competition being terrible and how much of it is the Broncos being great?

We don’t know for sure.

That’s a long way of saying that while I firmly believe the Eagles are better than their current record indicates (indeed FO has them as a “true” 1.8 win team right now), it’s also possible that the Eagles’ opponents have looked good simply because the Eagles are so bad.  But is there any way to gain confidence in our assessment?  I think so.

We can look at “controllable” or “affirmative” stats.  Namely, these are areas in which the Eagles have more control than their opponents.  If these are bad, it’s an indication that the Eagles really aren’t good.  If they’re not, then it lends more credence to the idea that the Eagles have been victimized by a tough slate of games.

QB play – While the difficulty of the opposing defense obviously affects this area, it’s still largely a “controllable” function.  It may not seem like it, but Michael Vick has played fairly well.  His overall numbers:

55.1% completion (bad); 5 TDs/2 INTs (good); 1.7% INT rate (very good); 93.2 Rating (good); 10.6% Sack rate (bad); 228 Rushing yards (very good), 2 Rushing TDs (good).

Before you come at me with the outlier argument (Redskins game), let me say that, so far, the actual outlier game for Vick has been KC, where he was terrible.  His passer rating was 110+ against both the Redskins and the Chargers, and against the Broncos it was 83.6, which isn’t good, but isn’t terrible either, especially for a team with such a potent rushing attack.

Overall, Vick’s play doesn’t give us any reason to believe this is actually a “bad” team.

Sacks – Again, there is definitely an opposing team effect, but the idea is that it’s MORE dependent on the Eagles than it is on the opposition.  Highly correlated with winning, Sack Differential is among the most important “controllable” stats.  Naturally, your Sack Differential will be a lot better if you create a lot of sacks (especially with Vick at QB).

The Eagles are tied for 6th in the league with 14 sacks.

Five of them came against Kansas City, distorting the numbers a bit, but that’s still better than any other team has done against the Chiefs.  Moreover, as I showed earlier this week, the Eagles have consistently recorded more sacks against each team than that team’s other opponents have.

I have to note that selling out your coverage to create sacks, as Davis has done a few times, is not ideal and potentially erodes the basic idea behind Sack Differential.  However, I don’t think that effect has been strong enough thus far to completely negate the value of the statistic.

Dropped Passes –  This one’s tough because of the inherent subjectivity of judging when a pass is “dropped”, but it’s also something that’s entirely controllable.  According to the Washington Post, the Eagles have 7 dropped passes thus far.  That’s not good, but it’s also not bad.  Overall, that places the Eagles in a three-way tie for 14th in the league.

However….

the operative statistic, as you probably guessed, is dropped pass RATE, not total.  After all, the Eagles do not throw the ball very often.  The team is currently attempting just 30.8 passes per game (TeamRankings), the 6th lowest rate in the league.

The Eagles have a Drop rate of 5.69%, which places them 9th in the league.  For reference, the Rams have the highest drop rate (8.2%) and the Vikings have the lowest (just 1.6%).  Overall the NFL average is 4.87%.
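For anyone who wants to check the rate, here is the arithmetic as a sketch. The drop count and attempts per game are the figures cited above; the number of games played (4) is an assumption on my part that is not stated here, though it does reproduce the 5.69% figure.

```python
drops = 7                  # Washington Post count cited above
attempts_per_game = 30.8   # TeamRankings figure cited above
games_played = 4           # assumption: not stated in this post

attempts = round(attempts_per_game * games_played)
drop_rate = drops / attempts
league_average = 0.0487    # NFL average cited above

print(f"attempts: {attempts}, drop rate: {drop_rate:.2%}")
print(f"vs. league average: {drop_rate - league_average:+.2%}")
```

That works out to less than one percentage point above the league average, which is why the raw total of 7 is the wrong number to focus on.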

So the Eagles, in this controllable area, are not good, but again, they’re not terrible either.  When you factor in the subjectivity of grading “drops”, there’s very little here to be overly concerned about.

When you put the above statistics together, I believe it offers support for the “hard schedule” theory as opposed to the “Eagles suck” theory.  That’s not certain, but at this point, the preponderance of the evidence (law school reference!) points in an optimistic direction.

Future Schedule

Now the fun part.  I’m not going to cover this in a lot of detail.  I’ll just show you the numbers, because they’re self-explanatory.  Here are the Football Outsiders “Future Schedule” ratings.

All aboard…