Why all models are wrong

“All models are wrong,” my PhD adviser told me one day. We were having a conversation about why one of our experiments was off compared to the predictions of another group. He smiled at me, but I realized right away that he was correct in a certain sense.

This point doesn’t seem to be widely appreciated.

Suppose I were to write down Newton’s Law of Gravitation. That law is correct (up to small deviations very near the Sun due to General Relativity). I could then build a model of the Earth’s orbit around the Sun, which predicts an elliptical path. This is a good model, much better than the Copernican model you probably learned about in school. But it’s wrong.

Why is it wrong? Because it treats the system as two bodies, neglecting the effects of all the other planets, asteroids, and comets, of the micrometeoritic material around the Earth, and so on. One can then try to add those factors in, a Herculean task that requires tons of observations and computing time. You improve the model. Then an asteroid comes along, and someone asks you whether it will collide with the Earth. For example, the asteroid designated 2011 AG5 will cross our path sometime between 2040 and 2047; NASA lists the probability of impact at 0.2%.

You could fairly ask why NASA lists only a probability of impact rather than telling us definitively what will happen. The answer is that its response is based on a model; that model has statistical noise in its variables, and it’s imperfect in conception. It’s wrong.

In weather forecasts, a probability of rain is often listed. Why? Why can’t forecasters just tell you whether or not it will rain? What does it even mean that there’s a 30% chance of rain? It means that on 30% of the days when conditions looked like today’s, it ended up raining. Is the forecaster wrong if it rains? No. He said there was a chance: 30%. If he had said there was a 0% chance, he would have been wrong. By saying there was a 30% chance, he is being honest about the uncertainty in his statistical analysis and weather-modeling capability. His model is wrong.
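To make the “30% of the time” reading concrete, here is a minimal Python sketch of a calibration check: group past days by the forecast probability and count how often it actually rained in each group. The forecasts and outcomes below are simulated from a hypothetical, perfectly calibrated forecaster, not real weather data, and `calibration_table` is a name made up for this illustration.

```python
import random
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Group days by forecast probability; return the observed rain frequency per group."""
    counts = defaultdict(lambda: [0, 0])  # forecast -> [rainy days, total days]
    for p, rained in zip(forecasts, outcomes):
        counts[p][0] += int(rained)
        counts[p][1] += 1
    return {p: rainy / total for p, (rainy, total) in sorted(counts.items())}


# Simulated (not real) data: a hypothetical forecaster whose stated odds are exactly right.
rng = random.Random(0)
forecasts = [rng.choice([0.1, 0.3, 0.7]) for _ in range(10_000)]
outcomes = [rng.random() < p for p in forecasts]

for p, freq in calibration_table(forecasts, outcomes).items():
    print(f"forecast {p:.0%}: rained on {freq:.1%} of those days")
```

A forecaster whose 30% days come out rainy roughly 30% of the time is doing his job, even though any single rainy day looks like a “miss” to someone expecting a yes-or-no answer.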

An actuary is a person who works at an insurance company. He makes sure that the premiums collected on life insurance are enough to cover the payouts for policyholders who die. How does he know how many are going to die? Can he predict who is going to die? In a certain sense, yes. He builds a model very much like the one I’ve built for this blog. He collects a data set of 10,000 people and looks at various variables: for instance, age, weight, gender, smoker/nonsmoker, and miles driven per day. Then he builds a model that predicts the probability of someone dying in the next year under each combination of those conditions. If many of those people stop smoking, the model predicts fewer deaths, and the premiums for those people can be lowered.

But now let’s take a small subset of people, say 3 of them. Bob is 55, still smokes (though he’s tried to cut back), is a bit overweight and drives 20 miles a day. John is 20, nonsmoker, thin, and drives 50 miles a day. Alice is 40, nonsmoker, in good shape, and rides the subway.

The actuary’s boss brings these people into a room and asks the actuary which one will die first. The actuary uses the model he built from the 10,000 data points to predict a probability of dying for each. Bob is the most likely, as he’s male, older, and smokes, so he’s a prime candidate for heart disease. Next is John, a young man, and young men die in traffic collisions far more often than 40-year-old women. Finally, Alice has the lowest probability of dying. If the boss stipulates that exactly one of them will die, the actuary would make a table:

Name    Age  Sex  Smoker?  Miles driven/day  BMI  Probability of death
Bob      55   M      Y            20          27          56%
John     20   M      N            50          20          27%
Alice    40   F      N             0          22          17%
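One way to read the last column, assuming the boss’s stipulation means exactly one of the three dies and that their deaths are independent, is to take each person’s one-year probability of death from the model and condition on that event. The Python sketch below does this; the annual probabilities are hypothetical numbers chosen for illustration, not the ones behind the table, and `prob_each_is_the_one` is an invented helper name.

```python
from math import prod


def prob_each_is_the_one(annual_p):
    """P(person i is the one who dies | exactly one of the group dies), assuming independence."""
    weights = {}
    for name, p in annual_p.items():
        # Person `name` dies while everyone else survives the year.
        others_survive = prod(1 - q for other, q in annual_p.items() if other != name)
        weights[name] = p * others_survive
    total = sum(weights.values())  # P(exactly one of them dies)
    return {name: w / total for name, w in weights.items()}


# Hypothetical one-year death probabilities, for illustration only.
annual_p = {"Bob": 0.010, "John": 0.0015, "Alice": 0.0008}
for name, share in prob_each_is_the_one(annual_p).items():
    print(f"{name}: {share:.0%}")
```

Even when every individual probability is small, conditioning on “one of them dies” spreads 100% across the three people, which is why a column like the one above can put Bob above 50% without saying he will die.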

Later that year, John wraps his car around a tree and dies. The actuary’s boss comes into his office, screaming mad that the actuary was wrong. “But,” says the actuary, “I was not wrong. I said that Bob was the most likely to die, not that he would definitely die.”

Was the model wrong? Yes. Why? Because it didn’t take into account every single thing about the entire world and how it affected these three people. That is impossible. Instead, it reduced their lives to a few quantitative variables, projected the likelihood of death, and concluded that Bob was the most likely to die. But the actuary was not wrong. He gave as honest a probability as anybody could possibly have determined. If his boss interpreted the table above as saying “Bob will definitely die,” then that is his problem.

Surely the reader can understand what I am getting at by now.

Contestant         WNTS rating (avg.)  Dialidol rank  Bottom 3 previously?  Previous rating  Probability of elimination (%)
Jessica Sanchez          45.7                2                Yes                 74.5                  45.8
Joshua Ledet             54.7                2                Yes                 69.0                  31.7
Phillip Phillips         61.0                1                No                  63.5                  22.4

This was my forecast last Thursday. Right there it says that Jessica Sanchez was the most likely to be eliminated. She was ranked 3rd on Dialidol, and people eliminated in the Top 3 are usually ranked 3rd on Dialidol (though I bumped her to a tie for 2nd because the numbers were so close). She also had the lowest WNTS approval rating of the night, and the person eliminated in the Top 3 typically has a lower WNTS approval rating than the others. As such, the model predicted a 46% chance of her being eliminated. Joshua was ranked 2nd on Dialidol and had the second-lowest WNTS rating. People in that position are sometimes eliminated, and the model put his probability at about 32%. This left Phillip with a 22.4% chance.

Jessica Sanchez was not eliminated; Joshua was. The contestant with the highest probability was not eliminated. My question is: so what? A 32% chance is not small, and an outcome with that probability is not even remotely surprising. Twice in the past week here in Atlanta there was a 30% chance of rain, and it rained. So what?
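The arithmetic behind that “so what” is short. This minimal Python sketch uses the three probabilities from the forecast table (rescaled, since the rounded values sum to 99.9%), computes how often the favorite should survive, and then replays the same week many times with a seeded random-number generator. The simulation is illustrative only; it is not how the forecast itself was produced.

```python
import random

# Probabilities from the forecast table, in percent; rescale because they sum to 99.9.
forecast = {"Jessica Sanchez": 45.8, "Joshua Ledet": 31.7, "Phillip Phillips": 22.4}
total = sum(forecast.values())
probs = {name: pct / total for name, pct in forecast.items()}

favorite = max(probs, key=probs.get)
p_upset = 1 - probs[favorite]  # chance that someone other than the favorite goes home
print(f"P({favorite} is NOT eliminated) = {p_upset:.1%}")

# Replay the same week many times (a simulation, not real results).
rng = random.Random(42)
names, weights = list(probs), list(probs.values())
weeks = 10_000
upsets = sum(rng.choices(names, weights=weights)[0] != favorite for _ in range(weeks))
print(f"favorite survived in {upsets / weeks:.1%} of {weeks} simulated weeks")
```

In other words, under the model’s own numbers, the favorite surviving was slightly more likely than not.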

The model that I use is wrong. It reduces all of the possible effects on voting to just two variables, which is not correct. It is, however, feasible, and it’s the most intellectually sound approach I can find. If it gets a fair number of calls right over the entire season, then I’m happy. If it misses any one particular week, that is totally meaningless.

The model is a formula. It’s not racist. It’s not sexist. It’s not based on falsehoods. It’s a straightforward correlation and logistic regression. It’s dispassionate, reductive, and wrong. However, it is not totally wrong. Sometimes it hits and sometimes it misses, and it has done better at predicting what will happen than some of the experts out there. That’s more than I expected.
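For readers wondering what a logistic-regression forecast looks like mechanically, here is a rough Python sketch using only the two headline indicators from the forecast table, the WNTS rating and the Dialidol rank. The coefficients are invented for illustration and are not the model’s fitted values, so the output will not reproduce the table. The final rescaling step, which forces the probabilities to sum to 1 because exactly one contestant goes home, is likewise an assumption about how such a forecast could be normalized.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not the blog model's fitted values.
INTERCEPT = 2.0
COEF_WNTS = -0.05          # higher WNTS approval -> lower elimination score
COEF_DIALIDOL_RANK = 0.6   # worse (larger) Dialidol rank -> higher elimination score


def logistic(z: float) -> float:
    """Map a linear score onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def elimination_probabilities(contestants):
    """contestants: {name: (wnts_rating, dialidol_rank)} -> {name: probability}."""
    raw = {
        name: logistic(INTERCEPT + COEF_WNTS * wnts + COEF_DIALIDOL_RANK * rank)
        for name, (wnts, rank) in contestants.items()
    }
    total = sum(raw.values())
    # Exactly one contestant is eliminated, so rescale the scores to sum to 1.
    return {name: score / total for name, score in raw.items()}


# Feature values taken from the forecast table.
week = {
    "Jessica Sanchez": (45.7, 2),
    "Joshua Ledet": (54.7, 2),
    "Phillip Phillips": (61.0, 1),
}
for name, p in elimination_probabilities(week).items():
    print(f"{name}: {p:.1%}")
```

Nothing in the formula knows who anyone is: change the inputs and the ranking changes with them, which is the sense in which the model is dispassionate.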

I don’t have skin in this game. I don’t really care for any of the contestants. The last Idol contestant I really liked was Blake Lewis, and I kind of liked Erika Van Pelt. What the model “thinks” is not always what I think, nor is it what I want. To be clear, I would prefer that Jessica win over Phillip, if for no other reason than that I think women have been treated badly in the past 4 years. I’ve remarked many times in the Liveblog that Phil isn’t singing what I recognize to be notes.

It would actually be much easier for me to just sit back and not make predictions. I built a model to see whether it was possible to predict who will be eliminated from just a few statistical indicators. That model makes a lot of calls, and a lot of them are wrong. But some are right, and many are in the ballpark. If you get some enjoyment or information out of reading them, that’s great. If they’re wrong, you should take that as a sign that (1) the data are noisy and sparse, (2) voting patterns change, and (3) many factors are not quantifiable. Any comments beyond that are pointless, and the unlettered attacks flying around in the comments sections of this blog tell me that these points haven’t been communicated well.
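Judging the model over a season rather than a single week can also be sketched. If each week’s favorite has some probability of being eliminated, the number of correct calls over the season has a mean equal to the sum of those probabilities, and its spread follows from treating the weeks as independent trials. The weekly probabilities below are hypothetical, not this season’s actual forecasts, and `season_hit_expectation` is an illustrative name.

```python
import math


def season_hit_expectation(favorite_probs):
    """Mean and standard deviation of the number of weeks the favorite is eliminated,
    treating weeks as independent Bernoulli trials (a Poisson-binomial count)."""
    mean = sum(favorite_probs)
    var = sum(p * (1 - p) for p in favorite_probs)
    return mean, math.sqrt(var)


# Hypothetical top-pick probabilities for a ten-week season (illustrative only).
weekly = [0.46, 0.35, 0.52, 0.40, 0.30, 0.38, 0.45, 0.33, 0.50, 0.42]
mean, sd = season_hit_expectation(weekly)
print(f"expected correct calls: {mean:.1f} +/- {sd:.1f} out of {len(weekly)}")
```

With favorites mostly below 50%, even an honest model should be expected to miss more weeks than it hits; the fair test is whether its hit count lands near that expected value, not whether any particular week comes out right.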

• James

After some obvious biases, someone is giving justification… Tsk tsk… Karma…

• Jessica

Any bias is in the data: either WNTS scores or Dialidol scores. Math isn’t racist and computers don’t hate women, so far as I know.

• V

I think you’ve communicated it just fine prior to this post. It’s analytics. It says so right in the name of the blog. You know there are plenty of people reading this that evidently can’t grasp that the numbers are pointing towards a certain outcome.

That said, analytical outcomes are funny when it comes to factoring in sex appeal, gender and age bias, and that certain something that someone has to have to make it as a performer. Can we quantify charisma? Especially as there are no audience factors in the analysis. Who is the audience (beyond location), how old are they, what music do they listen to…what music do they think is lacking in today’s listening world? Who is the most sympathetic to the audience? The young girl who will surely get a contract even if she doesn’t win the dubious title of American Idol, or the seemingly shy guy who goes his own way in a more creative way? Will the audience go for the underdog?

In a blog that seems to have both a dispassionate side (the analytics) and a passionate side (the gut opinion) those who are not analytical-minded can’t seem to see the passionate side of your opinions. Or they’ve not read all the posts to see there are opinions that are, oddly enough, slanted towards the more popular of the contestants. Guess one has to factor in numnuts when contemplating their comments.

• Mathematician

I do agree with James. If you read the previous posts there were biases and corrupted perceptions…

• Jessica

I imagine it’s pretty easy to agree with someone who has the exact same IP address as you. Nice try though!

• v

Bwhahahaha.

• Jason Flatley

Nice piece of expository writing. One often sees people misunderstanding probabilities in just the ways you call out.

You emphasize that all models are wrong, but is this really the most revealing way to put it?

It might be hard to see in an Idol data set, but in a large data set, as you know, the predictive power of the “least wrong” model can be significantly greater than that of a “not totally wrong” model. So a good model (that is not over-fitting) correctly captures important parts of the underlying structure of the phenomenon in question, even though by definition it is guaranteed to be wrong sometimes on the point estimates (the weather today, the loser of this round of Idol).

In other words, don’t the “least wrong” models deserve to be called “right” in some sense?

• Jessica

Our commenters would disagree, but any reasonable person could probably think so; you’re never going to be able to develop a model that will correctly predict every individual incident, but I could be persuaded that the model that does so a large proportion of the time is, in a sense, “right.”

• Reuben

I agree to a certain extent. Perhaps I overemphasized the point, but my view is that any transition from a principle to a model requires some assumptions, and the result is at least untrue around the margins and sometimes at its heart. I see your point, though.