Since I’ve started building a forecast model for Idol, I’ve been terribly interested in Dialidol, a service that measures votes and busy signals during American Idol voting, and gives a raw score. Their main touted statistic is that they have never flubbed a prediction of the finale: since Season 4 (when they started), they have correctly predicted the winner of the contest every time.
However, if Dialidol were a perfect forecast, there would never be a need for anything else, and it’s actually quite far from perfect. I can think of lots of reasons for this, but the main problem with Dialidol is, of course, sampling. It only measures the small number of people who use its software to register votes by calling on a land line. Cell phone calls, texts, and now internet votes are never counted. Moreover, the fewer people watch Idol (and its ratings have been dropping), the worse that sampling gets, simply because the numbers get smaller.
The first, and by far the most important, question is how often Dialidol accurately forecasts the person going home:

Accuracy of Dialidol in predicting the bottom vote getter, by round. Also plotted in red is the accuracy of a random guess in the round.
Blue bars indicate the measured accuracy. Note that the accuracy is 0 for the Top 13 because there has only ever been one single-elimination Top 13 in Idol history, which was this year, and Dialidol missed it. Similarly, for whatever reason, Dialidol has never called the eliminated contestant from the Top 8 correctly. The red bars show what you get from a random guess, so clearly Dialidol is forecasting much more accurately than chance, reaching almost 60% in the Top 10-9 range. I have no guess as to why their accuracy suffers so much from the Top 8 to the Top 6. However, after that it increases again, and is still significantly better than a random guess.
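For anyone who wants to reproduce this kind of chart, here is a minimal sketch of the calculation as I understand it: the measured accuracy is the share of weeks in a given round where the projected lowest vote getter was the one eliminated, and the random-guess bar is 1/N for a round with N contestants. The function name, the record layout, and the placeholder entries are all assumptions for illustration, not real Dialidol data.

```python
from collections import defaultdict


def accuracy_by_round(records):
    """Return {round_size: (measured_accuracy, random_guess_accuracy)}.

    `records` is an iterable of (round_size, predicted_lowest, eliminated)
    tuples. This layout is a hypothetical one chosen for the sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for round_size, predicted, eliminated in records:
        totals[round_size] += 1
        if predicted == eliminated:
            hits[round_size] += 1
    # A blind guess picks the eliminated contestant with probability 1/N.
    return {n: (hits[n] / totals[n], 1 / n) for n in totals}


# Made-up placeholder records, NOT real results.
example = [
    (10, "A", "A"),  # correct call in a Top 10 week
    (10, "B", "C"),  # missed call in a Top 10 week
    (4, "D", "D"),   # correct call in a Top 4 week
]

for round_size, (measured, baseline) in accuracy_by_round(example).items():
    print(round_size, measured, baseline)
```

With only a handful of seasons per round, each bar rests on very few weeks, so single hits or misses move the measured accuracy a lot.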
Next, I wanted to see if there was any time dependence. Was the service getting worse?
The service had some major problems last year. They barely did better than a random guess, which is shamefully bad, I must say (the worst was in the Top 10, where Didi Benami was ranked 5th on Dialidol, but was eliminated). The service definitely calls some right, but there is a huge amount of variability. If you went with them in Season 8, you’d have made some money. In Season 9, you’d have lost all of it again. In fact, most of those problems came in the early rounds:
(Note: no data is available for Season 4, since they started in the Top 6.) In Season 9 they did worse than a monkey or squid picking at random would have done. I can’t help thinking that they realized they had major sampling errors and started to adjust their number crunching. This is perfectly OK, and it does tentatively suggest that they are back on the horse after last year’s debacle.
Now, how often was the person who was eliminated in Dialidol’s projected bottom group? That is, say they predict someone as being in the bottom 3, but not the worst, and that person is eliminated. How often does that happen?
Here the service does very respectably. If someone was eliminated in the Top 4, then Dialidol had projected them to be in the bottom 2 every time. In the Top 6, the person eliminated was in the projected bottom 3 or bottom 2 90% of the time. (The show changes what it reveals every year: some years the Top 6 gets the bottom 3 revealed, sometimes only the bottom 2. That’s why I can’t calculate the random probability.) I would say that if Dialidol projects someone to be in the bottom 3, that person is in serious danger of being voted off, even if they are only projected as third worst by the service.
Finally, how accurate is Dialidol’s projected bottom 3? If we calculate the fraction of the projected bottom group who actually landed in the bottom group, we see the following:
The service actually does a very good job of getting 2 out of the bottom 3 for the Top 10 through the Top 5, which is quite a bit better than a random guess. The random guess probability is calculated from the expected number of correct picks over all "N choose 3" (or, for a bottom 2, "N choose 2") possible combinations. I chose to treat the Top 6 as having a Bottom 2 only, which is why the accuracy jumps there.
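Here is one way to sketch that random-guess baseline, under my reading of the description above: enumerate every possible bottom group a blind guesser could pick, count how many of its members fall in the actual bottom group, and average. The function name is hypothetical, and the brute-force enumeration is just the most transparent way to do it, not necessarily how the numbers in the chart were produced.

```python
from itertools import combinations


def random_overlap_ratio(n_contestants, group_size):
    """Expected fraction of a randomly guessed bottom group that is correct.

    Enumerates all "N choose k" guesses against one fixed actual bottom
    group. By symmetry it does not matter which group is the actual one.
    """
    actual = set(range(group_size))
    guesses = list(combinations(range(n_contestants), group_size))
    total_overlap = sum(len(actual.intersection(g)) for g in guesses)
    return total_overlap / (len(guesses) * group_size)


# Top 10 with a bottom 3, and Top 6 with a bottom 2.
print(random_overlap_ratio(10, 3))
print(random_overlap_ratio(6, 2))
```

The enumeration agrees with the closed form k/N (for example 3/10 for a bottom 3 in the Top 10), so the baseline can also be computed directly without listing the combinations.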
I’m actually really impressed by how well Dialidol does overall. They don’t make a lot of memorable, surprising calls, but on average they do quite a good job for a company with so few actual users. I’ll definitely incorporate Dialidol rankings into the model, probably with a weight that increases linearly as the season goes on, so that their rankings count for more in the later rounds.
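To make the weighting idea concrete, here is a rough sketch of what a linear-in-time weight could look like. Everything in it is an assumption for illustration: the function names, the default start and end weights, and the idea of blending the Dialidol score with a single other score are placeholders, not the finished model.

```python
def dialidol_weight(round_index, n_rounds, w_start=0.0, w_end=1.0):
    """Weight that rises linearly from w_start (first round) to w_end (last).

    round_index counts from 0. The defaults are arbitrary placeholders.
    """
    if n_rounds < 2:
        return w_end
    fraction = round_index / (n_rounds - 1)
    return w_start + (w_end - w_start) * fraction


def blend(dialidol_score, other_score, weight):
    """Weighted average of the Dialidol score and some other predictor."""
    return weight * dialidol_score + (1 - weight) * other_score


# Example: an 11-round season, looking at the middle round (index 5).
w = dialidol_weight(5, 11)
print(w, blend(10.0, 20.0, w))
```

In practice the start and end weights would need to be tuned against past seasons rather than set by hand.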