(Note: The first Top 3 tracker for season 12 should be up later today)
A new feature this year is the Top 3 tracker. Here I give the relative probabilities that the contestants will make the Top 3 given their standing at present.
The T3T works by using the projected IdolAnalytics week-to-week not-safe probability as an explanatory variable. The following chart shows the assigned not-safe probability for the first finals voting round (called Round 2 by the model) for seasons 5-11. The points represent outcomes from previous seasons. The x-axis denotes the probability, assigned by the model, of being in the bottom 3 in that round. The y-axis represents whether or not the contestant eventually made it to the Top 3, with 1 being “yes” and 0 being “no” (a small amount of vertical jitter has been applied so that overlapping points can be seen).
The blue curve is a fitted regression curve giving the estimated probability of making the Top 3 for any given assigned round 2 not-safe probability. For example, if this week the contestant had a 0.25 chance of being not-safe, his chance of making the Top 3 is only about 20%. But if he had a 0.1 chance of being not-safe, his chance of making the Top 3 is about 50%. Hover your mouse cursor over the points to see which contestants they belong to.
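For readers who want to see what a curve like this looks like in code, here is a minimal sketch. It assumes the curve is a one-variable logistic regression, which is my guess at a reasonable form rather than a description of the exact fit used for the chart. The data points are invented for illustration and are not real contestants, and the names fit_logistic and predict are hypothetical helpers.

```python
import math

# Made-up illustration data (NOT real contestants):
# assigned Round 2 not-safe probability, and whether the Top 3 was reached.
NOT_SAFE = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40, 0.45]
MADE_TOP3 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton's method.

    Assumes the outcomes are not perfectly separable by x; otherwise
    the coefficients would grow without bound.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Hessian of the negative log-likelihood
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g directly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict(b0, b1, not_safe_prob):
    """Estimated chance of making the Top 3 for a given not-safe probability."""
    return sigmoid(b0 + b1 * not_safe_prob)


b0, b1 = fit_logistic(NOT_SAFE, MADE_TOP3)
for prob in (0.10, 0.25):
    print(f"not-safe {prob:.2f} -> Top 3 chance {predict(b0, b1, prob):.2f}")
```

With toy data like this the fitted slope comes out negative, which is the same qualitative pattern as the chart: the higher the not-safe probability, the lower the estimated chance of reaching the Top 3. The specific numbers it prints mean nothing beyond that.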
You can see that, roughly speaking, the “yes” (1) points are clustered to the left, and the “no” (0) points are denser toward the right. Logically, a high chance of being in the bottom 3 in the Top 12 or Top 13 round means you are less likely to make it to the Top 3.
The model is still fairly uncertain during the Top 12. However, anybody with a not-safe probability above 0.27 is unlikely to make the Top 3. It has happened only twice in the time frame we model (Syesha Mercado and Haley Reinhart). On the other hand, having a low not-safe probability at this point is no guarantee of making the Top 3 (see, for instance, Chris Daughtry and Siobhan Magnus).
If we look later in the contest, the model has to a certain extent converged:
In round 5 (normally the Top 8), the not-safe probabilities of people who did not make the Top 3 are significantly higher on average than those of people who did (note that the probabilities used here are the average of round 4 and round 5). Nobody with a not-safe probability below 0.2 has ever missed the Top 3, and nobody with a probability above about 0.43 has ever made it. Rarely has someone with a probability above 0.3 made it to the Top 3. The region from 0.27 to 0.37 represents people who are “on the bubble”. They could go either way, and it depends on whether they can turn things around.
Because the data begins to get sparse in later rounds, an averaging mechanism has been employed. This reduces some of the week-to-week noise inherent in the model and returns a better overall fit to the observations. Rounds 5-7 use a two-week average, and 8-9 use a three-week average. This methodology is consistent with a belief that the contest largely hardens as the weeks go by, as people have chosen their favorite by then.
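The averaging can be sketched in a few lines. This is a minimal illustration, assuming an equal-weight trailing average (the current round plus the one or two rounds before it), which matches the round 4 and round 5 example above but is otherwise my assumption. The function names and the sample probabilities are hypothetical.

```python
def window_for_round(round_number):
    """Number of weeks averaged: 3 for rounds 8-9, 2 for rounds 5-7, else 1."""
    if round_number >= 8:
        return 3
    if round_number >= 5:
        return 2
    return 1


def averaged_not_safe(probs_by_round, round_number):
    """Equal-weight trailing average of one contestant's not-safe probabilities.

    probs_by_round maps a round number to that round's not-safe probability.
    """
    window = window_for_round(round_number)
    rounds = range(round_number - window + 1, round_number + 1)
    values = [probs_by_round[r] for r in rounds]
    return sum(values) / len(values)


# Made-up probabilities for a single hypothetical contestant.
example = {4: 0.30, 5: 0.20, 6: 0.25, 7: 0.15, 8: 0.10}
print(averaged_not_safe(example, 5))  # averages rounds 4 and 5
print(averaged_not_safe(example, 8))  # averages rounds 6, 7 and 8
```

Averaging over a longer window in later rounds trades responsiveness for stability: one bad week moves a contestant's number less, which fits the idea that voters' preferences have largely hardened by then.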
Why only a Top 3 tracker, rather than a straight-up winner tracker? It turns out that it’s pretty hard to tell what’s going to happen in the Top 3 before it actually happens. Some races have turned on a dime as one contestant picks up a lot of new voters from the eliminated contestant. This kind of coalition building has probably happened in a few years, most notably with Kris Allen and Lee DeWyze. Thus, the fact that the tracker only attempts to ascertain who will be in the Top 3 reflects how much uncertainty there still is.
As with the week-to-week forecast, this represents a kind of conventional wisdom. In the early rounds the model considered Pia Toscano to be a lock for the Top 3, which is what many people thought, and it was wrong. It also thought Syesha Mercado was not going to make it, as most people did, and was wrong. However, it is right much more often than wrong, which is all one can hope for. The model isn’t designed, nor could I design it, to be capable of seeing difficult-to-foresee events. All it can do is make a reasonable inference based on previous years, and in that it seems to largely succeed. Once a precedent happens, the model never forgets it, but it also doesn’t go nuts when a weird result happens.