## The semifinals model revisited

Last year I developed a model for predicting the outcome of the semifinals. You can read what the projection was: there weren’t any wrong calls, or even any ranking errors. The Top 10 people as scored by the model were the actual finalists. This doesn’t mean that the model is “correct”, so to speak, but it does mean that it already does a decent job of projecting what will happen. As the saying goes, all models are wrong, but some are useful.

## The Top 3 tracker

(Note: The first Top 3 tracker for season 12 should be up later today)

A new feature this year is the Top 3 tracker. Here I give the relative probabilities that the contestants will make the Top 3 given their standing at present.

The T3T works by using the projected IdolAnalytics week-to-week not-safe probability as an explanatory variable. The following chart shows the assigned not-safe probability for the first finals voting round (called Round 2 by the model) for seasons 5-11. The points represent outcomes from previous seasons. The x-axis denotes the probability of being in the bottom 3 in that round assigned by the model. The y-axis represents whether or not the contestant eventually made it to the Top 3, with 1 being “yes” and 0 being “no” (a small amount of vertical jitter has been applied so that overlapping points can be seen).

Round 2 not-safe probability vs chance of making the Top 3. This graph is interactive; hover your cursor over a point to see who the contestant was.

The blue curve is a regression line giving the estimated probability of making the Top 3 for any given assigned round 2 not-safe probability. For example, a contestant with a 0.25 chance of being not-safe this week has only about a 20% chance of making the Top 3, but one with a 0.1 chance of being in the bottom 3 has about a 50% chance.
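To make that curve concrete, here is a minimal sketch of evaluating a fitted logistic curve. The coefficients below are hypothetical, back-solved only so that the curve passes through the two example points quoted above; they are not the model’s actual parameters.

```python
import math

# Hypothetical logistic-regression coefficients, chosen so the curve
# reproduces the two example points in the text:
# P(Top 3 | not-safe = 0.10) ≈ 0.50 and P(Top 3 | not-safe = 0.25) ≈ 0.20.
B0 = 0.9242   # intercept (hypothetical)
B1 = -9.242   # slope on the round 2 not-safe probability (hypothetical)

def top3_prob(not_safe: float) -> float:
    """Estimated probability of making the Top 3 given a round 2
    not-safe probability, under the hypothetical coefficients above."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * not_safe)))

print(round(top3_prob(0.10), 2))  # ≈ 0.50
print(round(top3_prob(0.25), 2))  # ≈ 0.20
```

The negative slope encodes the intuition in the chart: the higher your assigned not-safe probability, the lower your estimated Top 3 chances.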

You can see that, roughly speaking, the “yes” (1) points are clustered to the left, and the “no” (0) points are denser toward the right. Logically, a high chance of being in the bottom 3 in the Top 12 or Top 13 round means you are less likely to make it to the Top 3.

The model is still fairly uncertain during the Top 12. However, anybody with a not-safe probability of > 0.27 is unlikely to make the Top 3. It’s happened only twice in the time-frame we model (Syesha Mercado and Haley Reinhart). However, having a low not-safe probability at this point is not a guarantee of making the Top 3 (see, for instance, Chris Daughtry and Siobhan Magnus).

If we look later in the contest, the model has to a certain extent converged:

Round 5 not-safe probability vs chance of making the Top 3. This graph is interactive; hover your cursor over a point to see who the contestant was.

In round 5 (normally the Top 8), the not-safe probabilities of people who did not make the Top 3 are significantly higher on average than those of people who did (note that the probabilities used here are the average of rounds 4 and 5). Everyone with a not-safe probability under 0.2 has made the Top 3, and nobody with a score over about 0.43 has; anyone over 0.3 has rarely made it. The region from 0.27 to 0.37 represents people who are “on the bubble”: they could go either way, depending on whether they can turn things around.
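One way to read those round 5 cutoffs is as a rough categorizer. The thresholds below are the approximate ones quoted above; the function and its labels are illustrative shorthand for the historical pattern, not part of the model itself.

```python
def round5_outlook(not_safe: float) -> str:
    """Rough historical read of a round 5 (Top 8) averaged not-safe
    probability, using the approximate cutoffs quoted in the text.
    Purely illustrative: the labels summarize seasons 5-11 outcomes."""
    if not_safe < 0.20:
        return "always made Top 3 historically"
    if not_safe > 0.43:
        return "never made Top 3 historically"
    if 0.27 <= not_safe <= 0.37:
        return "on the bubble"
    return "uncertain"

print(round5_outlook(0.10))  # always made Top 3 historically
print(round5_outlook(0.30))  # on the bubble
print(round5_outlook(0.50))  # never made Top 3 historically
```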

Because the data begins to get sparse in later rounds, an averaging mechanism has been employed. This reduces some of the week-to-week noise inherent in the model and returns a better overall fit to the observations. Rounds 5-7 use a two-week average, and 8-9 use a three-week average. This methodology is consistent with a belief that the contest largely hardens as the weeks go by, as people have chosen their favorite by then.
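The averaging scheme can be sketched as follows. The window sizes (two weeks for rounds 5-7, three weeks for rounds 8-9, none before that) come from the text; the function name and data layout are my own.

```python
def smoothed_not_safe(probs: dict[int, float], rnd: int) -> float:
    """Average the not-safe probability over a trailing window:
    rounds 5-7 use a two-week average, rounds 8-9 a three-week
    average, and earlier rounds are used as-is.

    `probs` maps round number -> assigned not-safe probability.
    """
    if 5 <= rnd <= 7:
        window = 2
    elif 8 <= rnd <= 9:
        window = 3
    else:
        window = 1
    vals = [probs[r] for r in range(rnd - window + 1, rnd + 1)]
    return sum(vals) / len(vals)

# Round 5 uses the average of rounds 4 and 5, as noted above.
print(smoothed_not_safe({4: 0.30, 5: 0.10}, 5))  # 0.2
```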

Why only a Top 3 tracker, rather than a straight-up winner tracker? It turns out that it’s pretty hard to tell what’s going to happen in the Top 3 before it actually happens. Some races have turned on a dime as one contestant picks up a lot of new voters from an eliminated contestant. This kind of coalition building has probably happened in a few seasons, most notably with Kris Allen and Lee DeWyze. Limiting the tracker to the Top 3 thus reflects how much uncertainty remains at that stage.

As with the week-to-week forecast, this represents a kind of conventional wisdom. In the early rounds the model considered Pia Toscano a lock for the Top 3, as many people did, and it was wrong. It also thought Syesha Mercado was not going to make it, as most people did, and was wrong. However, it is right much more often than it is wrong, which is all one can hope for. The model isn’t designed, nor could I design it, to foresee genuinely surprising events. All it can do is make a reasonable inference based on previous years, and in that it seems to largely succeed. Once a precedent happens, the model never forgets it, but it also doesn’t go nuts when a weird result occurs.

## Semifinal projection methodology

The logic that goes into the IdolAnalytics finals model is similar but not identical to that of the semi-final round(s). People say that Idol normally devolves into a popularity contest, but I tend to think that it begins as a popularity contest, then becomes somewhat more about singing, then lapses back into a popularity contest.

This popularity aspect has become particularly true lately, as we saw fairly concerted online efforts on behalf of people like Phillip Phillips, hyping their contestant on Twitter. And, naturally, for that to matter the audience has to know who the person is in the first place. Thus, we would expect, and do observe, that pre-exposure (screen time allotted to a contestant before the first voting round) tends to matter. If your audition was shown, or if you had a lot of screen time, you were often more likely to advance to the finals.

## A new model of American Idol

When it comes to actual week-to-week predictions of American Idol outcomes, I’ve always been interested but skeptical. Idol is a shifting and surprising thing. As a result, I did something reasonable but overly cautious: I compared a couple of variables to outcomes of comparable rounds. For instance, the Top 8 could be directly compared to all other Top 8 rounds. Then, using a simple logistic regression, I let R estimate the coefficients and left it at that. It was a toy model, not claimed to be worth much.

The old model ignored all of the personal observations I’ve made over the years about how the voting plays out, and was therefore not representative of the show as I see it. The data for the comparable rounds gets so sparse that it’s hard to draw much of a conclusion. And as a result, it wasn’t very good. But this year, I expended a lot of effort and a lot of time thinking about the problem in more detail.

What I recognized is that a fairly good statistical model could be built from some reasonable assumptions, sound observations, and careful methodology. I will outline the result in this post. The model is not necessarily predictive of next year, but it is quite a good description of years past, is not overfit, and hence stands a good chance of describing the next season accurately. I’ve characterized its confidence well and ensured that the predicted probabilities match up reasonably with the actual observations.

Hit the jump for the details of the new IdolAnalytics forecast model. This is going to get decently technical, but that shouldn’t put you off.