Hold Me To This: Assessing the Idol Analytics Model

It’s long past time that I stood back and assessed how well the model is doing at predicting the outcome of the Season 10 contest.

The Idol Analytics model has been running since the Top 11, when I cobbled together a regression analysis (with some serious issues). The next week I produced the model more or less as it is today. So, counting since the Top 11, here I’ll show how well the model did. For comparison, I’ve also listed how several other sources did at predicting the person with the lowest number of votes.

Here are the results:

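| Round | DJ Slim | MJ Santilli | Votefair | Dialidol | WNTS | Idol Analytics | Lowest Vote |
|---|---|---|---|---|---|---|---|
| Top 11 | 0 | 0 | 0 | 1 | 0 | 0 | Casey |
| Top 11 redux | 1 | 1 | 2 | 0 | 1 | 1 | Thia, Naima |
| Top 9 | 0 | 0 | 0 | 0 | 0 | 0 | Pia |
| Top 8 | 1 | 1 | 0 | 0 | 1 | 1 | Paul |
| Top 7 | 1 | 0 | 0 | 1 | 0 | 1 | Stefano |
| Top 6 | 0 | 0 | 0 | 1 | 0 | 0 | Casey |
| Top 5 | 1 | 1 | 1 | 0 | 1 | 1 | Jacob |
| Top 4 | 0 | 0 | 0 | 0 | 0 | 0 | James |
| Accuracy | 0.444 | 0.333 | 0.333 | 0.333 | 0.333 | 0.444 | |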

There have been 9 lowest-vote-getters in the contest since the Top 11. This is, of course, one more than there would normally be, but Casey Abrams got the lowest number of votes in the first Top 11 round and was saved, and that should be counted. A 1 means that the site correctly guessed a lowest-vote-getter, a 2 means it got both in the double-elimination week, and a 0 means it missed. Dialidol got Casey (both times) and Stefano. Votefair got Thia and Naima, but missed every other week until Jacob. And so on.
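The accuracy figures can be reproduced by adding up each source’s hits and dividing by the 9 lowest-vote-getters. Here is a minimal sketch in Python, with the scores copied from the table; the variable names are just for illustration.

```python
# Hits per round, in table order: Top 11, Top 11 redux, Top 9, Top 8,
# Top 7, Top 6, Top 5, Top 4. A 2 means both lowest-vote-getters of the
# double-elimination week were guessed.
hits = {
    "DJ Slim":        [0, 1, 0, 1, 1, 0, 1, 0],
    "MJ Santilli":    [0, 1, 0, 1, 0, 0, 1, 0],
    "Votefair":       [0, 2, 0, 0, 0, 0, 1, 0],
    "Dialidol":       [1, 0, 0, 0, 1, 1, 0, 0],
    "WNTS":           [0, 1, 0, 1, 0, 0, 1, 0],
    "Idol Analytics": [0, 1, 0, 1, 1, 0, 1, 0],
}
LOWEST_VOTE_GETTERS = 9  # eight rounds, one of them a double elimination

# Accuracy is total hits divided by the number of lowest-vote-getters.
accuracy = {
    source: sum(scores) / LOWEST_VOTE_GETTERS
    for source, scores in hits.items()
}

for source, value in accuracy.items():
    print(f"{source:15s} {sum(hits[source])}/{LOWEST_VOTE_GETTERS} = {value:.3f}")
```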

The Idol Analytics model guessed 4 out of 9 correctly, which is as good as the other highest, DJ Slim. All the other services had 3 apiece. (The expected value from random guessing is 1.27, or about 0.14 accuracy.) Note that the model would have predicted Ashton and Karen correctly, had it been around for the Top 13 and Top 12 rounds. But so did almost everyone else.
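One way to arrive at the random-guessing figure is to assume a single uniformly random guess each round, so the chance of a hit is the number of lowest-vote-getters that week divided by the number of contestants remaining. The sketch below works under that assumption; the `rounds` list is my own encoding of the eight rounds since the Top 11.

```python
from fractions import Fraction

# (contestants remaining, lowest-vote-getters that week) for each round.
# The second Top 11 round was the double elimination.
rounds = [(11, 1), (11, 2), (9, 1), (8, 1), (7, 1), (6, 1), (5, 1), (4, 1)]

# Assumption: one uniformly random guess per round, so the expected
# number of hits in a round is k / n.
expected_hits = sum(Fraction(k, n) for n, k in rounds)
total_lowest = sum(k for _, k in rounds)

print(float(expected_hits))                 # about 1.27
print(float(expected_hits / total_lowest))  # about 0.14
```

Guessing two names in the double-elimination week would raise the expectation somewhat, so the single-guess version is the more conservative baseline.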

So this site is a bit better than most experts, and better than the other measurement services, by about 11 percentage points (4 of 9 versus 3 of 9). OK. However, there is clearly still a lot of room for improvement.

For one thing, my model is slightly worse than the other prediction services at choosing the bottom 3/2. Its overall accuracy was about 53%, identical to Dialidol and worse than Votefair or WNTS. (The latter had an astounding 70% accuracy!) Were I to take the time, it would probably be worth checking whether a contestant who has already been in the bottom 3 is more likely to land there again. This might have picked up Haley’s frequent trips there. It is a good idea because it starts to correct for the fact that some contestants (particularly women) under-perform their quality. That is to say, they get fewer votes than their performances warrant.
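A rough sketch of that check: compare how often contestants land in the bottom 3 when they have been there before against how often they do when they have not. The function name and the three weeks of data below are made up purely for illustration (the contestants are placeholders, not Season 10 results).

```python
def repeat_rates(history):
    """Compare bottom-3 rates for contestants with and without a prior trip.

    history: list of weeks, each a dict {contestant: was_in_bottom_3}.
    Returns (rate with a prior trip, rate with no prior trip); a rate is
    None when no contestant-weeks fall in that group.
    """
    been_there = set()
    # prior-trip flag -> [bottom-3 appearances, contestant-weeks observed]
    counts = {True: [0, 0], False: [0, 0]}
    for week in history:
        for name, in_bottom in week.items():
            prior = name in been_there
            counts[prior][0] += int(in_bottom)
            counts[prior][1] += 1
        # Update only after the week is scored, so "prior" means earlier weeks.
        been_there.update(name for name, in_bottom in week.items() if in_bottom)

    def rate(pair):
        return pair[0] / pair[1] if pair[1] else None

    return rate(counts[True]), rate(counts[False])


# Hypothetical data: placeholder contestants A-F over three weeks.
toy_history = [
    {"A": True, "B": True, "C": True, "D": False, "E": False, "F": False},
    {"A": True, "B": True, "D": True, "E": False, "F": False},
    {"A": True, "B": False, "E": True, "F": False},
]
with_prior, without_prior = repeat_rates(toy_history)
print(with_prior, without_prior)
```

A large gap between the two rates would suggest that a prior bottom-3 trip is worth adding as a predictor, though with so few weeks any such gap should be read cautiously.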

Secondly, the model could try to take into account the effect of performance order on outcome. Creating a scoring model that follows the overall elimination trends of performance order would make sense.

Finally, it would probably be a very good idea to sort contestants into categories, such as “Rocker” and “Country Bumpkin”, for the purposes of accounting for over-performance. This would no doubt have excluded Scotty McCreery from a couple of projections, where his score clearly indicated he would be in the bottom 3, but there was no chance of it happening.

Each of these elements is worthwhile, but putting them in will be time-consuming. Also, as one starts to “refine” a model, one runs the risk of over-fitting it, which I don’t want to do. An example shows how this could happen: last night James sang in the pimp spot (the final performance slot), but was eliminated. To the extent that the model was right about that (he was projected as only slightly less likely than Scotty to go), a performance-order adjustment would have made it worse.

  • Hannah

    Ah Reuben….I LOVE this geeky stuff! I went with my gut and bet on Lauren to leave in a pool…now I wish I hadn’t! LOL! I look at a lot of Idol sites too…and had noticed that James seemed to be running more ‘middle of the pack’ (a la Pia) than most people seemed to think. But alas, my emotions overrode my science again proving that I am a human being for sure. I shall resist the urge to be emotional from this point forward….science trumps gut feelings…just about every time. 😉

    • Reuben

      I think gut feelings are a pretty good way to make decisions in this case, though. The people on the sites I listed went purely with gut feelings and got more than double the random-guess percentage. For a system with so few records, it takes a really deep knowledge of the contest to make a good inference, statistically or otherwise.