Top 13 result was the 9th most surprising ever


Given the abysmal performance of the model in ranking the contestants in the Top 13, and given that M.K. being included was not foreseen at all (at least in my opinion), it’s worth asking how surprising this week’s result was compared to all other Idol episodes. There have been 89 episodes in the finals over 13 seasons, and of those this week’s was the 9th most surprising episode, putting it in the top 10% of episodes in surprising…ness.

This is how I reckon it. We can go back through every previous episode and assign each contestant a probability of being not-safe in exactly the same way the model currently works. Then we calculate the total probability assigned to the people who were actually safe and the total probability assigned to the people who were not-safe (bottom 3, eliminated, or saved). The ratio of the first total to the second is a "surprise index" of sorts: the higher it is, the more probability the model wasted on people who turned out to be fine, and so the more the result defied its expectations.
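The calculation above can be sketched in a few lines of Python. The function name and data layout are my own inventions, and I'm assuming the ratio is taken as the safe total over the not-safe total, so that a higher number means a more surprising episode:

```python
# A minimal sketch of the "surprise index" described above; the post
# gives no implementation, so names and structure here are assumptions.

def surprise_index(not_safe_prob, actually_safe):
    """not_safe_prob: dict mapping contestant -> model probability of
    being not-safe (bottom 3, eliminated, or saved) that week.
    actually_safe: set of contestants who were in fact safe.

    Returns the total not-safe probability the model assigned to the
    safe contestants, divided by the total it assigned to the not-safe
    ones. Higher ratio = more surprising result.
    """
    safe_total = sum(p for name, p in not_safe_prob.items()
                     if name in actually_safe)
    unsafe_total = sum(p for name, p in not_safe_prob.items()
                       if name not in actually_safe)
    return safe_total / unsafe_total

# Toy example with made-up numbers: the model pegged C as the likely
# casualty, but A went home instead, so most of the probability mass
# sat on contestants who ended up safe.
probs = {"A": 0.15, "B": 0.25, "C": 0.60}
ratio = surprise_index(probs, actually_safe={"B", "C"})
print(round(ratio, 3))
```

Running this over all 89 finals episodes and sorting by the ratio is all it takes to produce the ranking discussed below.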

The most surprising episode of all time is Season 6’s Top 3. Melinda Doolittle looked like a lock for the finals, but Blake Lewis was instead voted through, a truly shocking result. The model assigned Melinda only a 6% chance of elimination, versus 22% for Jordin Sparks and a humongous 72% for Blake. She led on all three indices: WNTS, Votefair, and Dialidol. Nobody called that.

Season 10 accounts for 3 of the top 10 most surprising episodes: James Durbin’s elimination in the Top 4, Jacob Lusk hanging on in the Top 6, and, of course, the stunning bottom group of the Top 11, in which Casey Abrams had to be saved by the judges (the model considered Naima Adedapo a goner for sure).

This past Thursday was slightly less surprising than all of those, but only slightly. The surprise wasn’t so much that Malaya or Kristin were in the bottom 3, but that C.J., Ben, and Dexter weren’t, and M.K. was. None of that seemed very likely based on the numbers, though it wasn’t out of line with my gut feeling.

I’m currently withholding judgment on Dialidol as an indicator. Signs point to its possible irrelevance, given the move toward internet voting. Moreover, its huge volatility was the only reason the model didn’t declare Kristin the most likely to go home. If another week like this happens, it will have to be pulled from the model entirely.

  • Chris Biehn

    I enjoy your blog and am just getting caught up over the past couple of weeks. You may want to consider replacing Dialidol with the results of one or more of the many polls that go up after each performance night; Mjsbigblog and Michael Slezak come to mind. I don’t know if that would have altered the prediction for the Top 13, but for what it’s worth, the poll had M.K.’s performance rated toward the bottom.

    Also, at the risk of making this more complicated than you want it to be, you may also want to consider applying some kind of “gender factor,” especially for the earlier rounds, where females always seem to be more at risk.

    • Reuben

      Both are fine ideas, and I may do so. Two points:

      1. I can, of course, include anything I want in the model. But the risk is that if you don’t know how historically accurate a poll is, you don’t know how to calibrate it. When a poll is scientific, this isn’t an issue. But an unscientific poll samples some self-selected subset of people, and that subset may be quite bad at predicting. So I need both the poll and a record of how it has done in the past. MJ’s polls will take some time to go through and tabulate.

      2. I think I said this way back when, but it’s worth repeating even if I did. I investigated incorporating gender into the ratings, and it’s just too unreliable: as often as it corrects one prediction, it breaks another. I believe, personally, that gender is already partially baked into the WNTS ratings and so forth. In any case, I was unable to improve the model’s accuracy with it.