Vote For The Worst picks are usually poor


Vote For the Worst is by now a tradition in American Idol. The website, founded during Season 3 of the show, attempts to promote a villain: a contestant who is widely hated, but makes it through many stages of the contest despite this. This is, of course, what a villain is—if Lex Luthor were an ineffective bumbling idiot who immediately fails in all his schemes, he wouldn’t be a villain, would he? No, a villain or foil has to be effective in addition to bad. “Worst” is a misnomer; VFTW is supposed to pick the contestant who is overperforming. That is, the contestant gets a disproportionate amount of the vote compared to what the quality of the performance should have garnered.

But does VFTW really do this effectively? Their selection process seems to be a blend of editorial control and forum posters’ preferences. The site usually picks around 6 “worst” contestants per season, though some seasons have seen as many as 9. The usual reason for switching is that their pick was eliminated, but they have been known to switch even when their pick survived, presumably out of buyer’s remorse after seeing the next performance. These statistics by themselves don’t imply anything except that the history is perhaps a bit checkered, and that maybe the site on balance does an OK job.

This, though, is not the case. In general, VFTW chooses badly much of the time: they are unable to detect a contestant who is going to overperform in the voting. This is partly because they tend to choose the actual worst singer (whom America also detects and eliminates through voting), and partly because of some particularly bone-headed choices of middling singers who endured for many rounds (I’ll come to this later).

Assessment methodology

Generally speaking, the American Idol contest isn’t so different from many statistical phenomena. Although the dynamics that come into play in any given round may differ, the contest follows more or less the same kind of statistical trends that things like heart attacks have in the population. Think of “having a heart attack” as the dependent variable. Some of the variables that affect whether or not a heart attack happens are age, sex, cholesterol level, blood pressure, etc.

This analogy can be made complete if we think of “having a heart attack” as “being eliminated from American Idol”. Instead of age, sex, and cholesterol, we have sex, performance quality, past performance quality, Dialidol rating, age, song type, etc. We then form a model to predict the probability that someone will be eliminated. The most natural model to apply is the logistic model, whereby the variables are combined into a linear function, which is then plugged into a function of the form of the odds ratio, yielding a probability. You can then roughly predict the probability of someone’s being eliminated by plugging in the aspects of that week’s performance. At some point in the competition, these variables may stop being indicative, so the seasons have to be split into weeks: the Top 10 is compared only to the Top 10 of every other season, not to the Top 5, and the Top 5 is compared only to the other Top 5 rounds. (The main reasons for this are that gender gradually becomes less important as the contest goes on and that Dialidol stinks at the beginning.)
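As a minimal sketch, a logistic model of this sort might look like the following in Python. The coefficients here are made up purely for illustration; they are not the fitted values from any real model, and the function name is my own:

```python
import math

def elimination_probability(wnts, prior_avg, coefs):
    """Logistic model: combine the predictors into a linear score,
    then squash it through the logistic (sigmoid) function to get
    a probability between 0 and 1."""
    intercept, b_wnts, b_prior = coefs
    score = intercept + b_wnts * wnts + b_prior * prior_avg
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients: the negative weights mean that higher
# approval ratings lower the odds of elimination.
coefs = (1.5, -0.04, -0.02)

p_weak = elimination_probability(20, 25, coefs)    # poorly received week
p_strong = elimination_probability(85, 80, coefs)  # well received week
```

Fitting the actual coefficients would be done by maximum likelihood over the historical elimination data, e.g. with statsmodels or scikit-learn.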

Inherent in this is that you are comparing this week with the same week in all the other seasons. Meaningful predictions of Idol are really only now becoming feasible, after 9 complete seasons. There is still a huge amount of noise in the data (e.g. Pia Toscano’s elimination), but in the 10th season it becomes possible to take perhaps 2 independent variables into account and calculate the elimination probability in any given week. This is how the Idol Analytics model works, and it seems to do a reasonable job. You could not do any kind of analytics on a show like The Voice, now airing on NBC, because, well, there’s no data to base a prediction on. This, to me, is what makes American Idol interesting now.

When it comes to assessing VFTW, we have a problem. We do know how good a given performance was, or at least we have a pretty fair idea if we use the WhatNotToSing approval rating. But what we do not know is how many votes the contestant got for the song. Think of trying to show how accurate a given political pollster was without knowing the vote totals of previous elections, and you start to get the idea. All we know is whether the contestant was the lowest vote-getter or was in the Bottom 3 of vote getters.

Using this information, then, we can get an idea of how at least 3 and at most 5 people per round did in the voting. We won’t know exactly, but we’ll have a good enough idea of whether a contestant is under-performing or over-performing. Now all we need is how well the person should have done, which is provided by the Idol Analytics model.

Here is an example from the Top 9 of Season 3:

Contestant        Sex  WNTS  Result        Prior  Prob.     Over/Under
Camile Velasco    F    6     Eliminated    20     0.451871  0
John Stevens      M    9     Safe          28     0.414793  0.280569
Jon Peter Lewis   M    23    Safe          33.33  0.278501  0.144277
Diana DeGarmo     F    42    Bottom Group  69.33  0.134224  -0.14428
Jasmine Trias     F    47    Bottom Group  58.33  0.114870  -0.16363
Fantasia Barrino  F    64    Safe          73.66  0.056957
LaToya London     F    82    Safe          78.33  0.027079
George Huff       M    87    Safe          77     0.022123
Jennifer Hudson   F    90    Safe          47.33  0.021346

In this week, the Idol Analytics model assigned a 45% probability of Camile Velasco being eliminated, and she was. Therefore, we would say she performed no better or worse than expected, a zero. John Stevens was predicted to be in the Bottom 3, with an elimination probability of 41%. But he wasn’t; he was Safe. Therefore, we would say John Stevens over-performed by at least 28 percentage points: for his outcome (Safe), his probability should have been below the fourth-highest probability (13%), but was actually above it. By contrast, Diana DeGarmo was projected to be Safe, but was actually in the Bottom 3. Therefore, she under-performed by at least 14 percentage points, represented by a negative number. For the last 4, though, we just don’t know. Maybe Jennifer Hudson got the most votes, and maybe she didn’t.
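The over/under index can be computed mechanically from a table like the one above. Here is a Python sketch that reproduces the Top 9, Season 3 numbers; the function name `over_under` and the exact tie-breaking are my own choices, not something published by Idol Analytics:

```python
# All nine Top 9 contestants, with model elimination probabilities.
results = [
    ("Camile Velasco",   0.451871, "Eliminated"),
    ("John Stevens",     0.414793, "Safe"),
    ("Jon Peter Lewis",  0.278501, "Safe"),
    ("Diana DeGarmo",    0.134224, "Bottom Group"),
    ("Jasmine Trias",    0.114870, "Bottom Group"),
    ("Fantasia Barrino", 0.056957, "Safe"),
    ("LaToya London",    0.027079, "Safe"),
    ("George Huff",      0.022123, "Safe"),
    ("Jennifer Hudson",  0.021346, "Safe"),
]

def over_under(results, bottom_slots=3):
    """Score each contestant against the model's predicted ordering:
    the `bottom_slots` highest probabilities should fill the Bottom 3."""
    probs = sorted((p for _, p, _ in results), reverse=True)
    cutoff_in = probs[bottom_slots - 1]  # lowest prob predicted to be in the bottom group
    cutoff_out = probs[bottom_slots]     # highest prob predicted to be safe
    index = {}
    for name, p, outcome in results:
        in_bottom = outcome in ("Eliminated", "Bottom Group")
        if in_bottom and p < cutoff_in:
            index[name] = p - cutoff_in    # under-performed (negative)
        elif not in_bottom and p > cutoff_out:
            index[name] = p - cutoff_out   # over-performed (positive)
        else:
            index[name] = 0.0              # right where the model put them
    return index

idx = over_under(results)
```

One caveat: contestants who were predicted Safe and were Safe come out as 0.0 here, whereas the table above leaves them blank, since we have no information about their true vote standing.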

John Stevens, therefore, would seem to be a good pick. He did much better than he “should have”. And indeed, Stevens was the VFTW pick that year.

The analysis was run like this for all seasons that had VFTW picks (Seasons 3–10). I went from the Top 10 round to the Top 4 round. Prior averages were computed from the previous 3 rounds, or 2 rounds if that was all that was available. Logistic regressions were performed on approval rating and prior approval only, so gender effects are ignored. Seasons 1 and 2 are included in the regression analysis, but their probabilities are not computed, since there was no VFTW pick to assess.
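The prior-average step described above is simple enough to sketch; the function name and the fallback behavior when fewer than 3 rounds exist are my reading of the description, not code from the original analysis:

```python
def prior_average(ratings):
    """Mean WNTS approval over up to the 3 most recent rounds,
    given a contestant's scores in chronological order."""
    window = ratings[-3:]  # falls back to 2 (or 1) rounds if that's all there is
    return sum(window) / len(window)

# A contestant with four rounds on record: only the last three count.
avg = prior_average([42, 20, 30, 40])
```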

VFTW picks are then separated into three categories according to the Over/Under index: positive numbers (over-performing contestants), negative numbers (under-performing contestants), and 0 (performed where they should have).


If VFTW is doing a good job at picking, then a large proportion of their picks should be in the over-performing category, and very few in the under-performing one. The results are tabulated below. The first column is the number of picks that were good villains, scoring better than they ought to have. The second column is picks that performed “at spec”, right where they should have. The third column is picks that were actually singing better than the results would indicate, and were getting short shrift from the voting public.

Round    Over-perform   At spec   Under-perform
Top 10   3              1         0
Top 9    3              2         0
Top 8    2              2         1
Top 7    3              1         2
Top 6    2              2         2
Top 5    1              4         0
Top 4    2              2         1
Total    16             14        6

In the sample, VFTW had 16 genuine villain picks, including John Stevens, Megan Joy, Sanjaya Malakar, Scott Savol, Brooke White, and Jasmine Trias. They get one point every time one of their picks scores low but gets proportionately more votes.

However, just about as often, the picks perform right where they should have. Lakisha Jones, Jason Castro, Scott MacIntyre, and Paul McDonald fall into this category, as does Kellie Pickler. And if I had included all the picks who were predicted to be safe and were, this number would more than double. The vast majority of picks are people who should have been safe and were safe. This shouldn’t happen nearly so often.

Even worse are the VFTW picks that were actually getting fewer votes than they should have. Siobhan Magnus was one such pick, as was Casey Abrams.

Then there are the particularly egregious picks. Kristy Lee Cook was the pick for Season 7, and she basically never over-performed. By the end of her tenure, she was actually running behind where she should have been, being eliminated when she should have been safe. Or how about Constantine Maroulis, one of the most under-performing contestants in history, by a whopping 40 percentage points in the Top 6 of Season 4. Why was Constantine ever a pick? He was a consistently high performer.

Other ignoble picks in the VFTW portfolio? Try Jennifer Hudson; Phil Stacey, who got royally screwed; and Michael Lynche, who had difficulty connecting with voters.

They also missed some of the biggest over-performers of all time: Ramiele Malubay, who lasted much longer than she should have; Ace Young, the good-looking but bad-sounding “rocker” from Season 5; and Danny Gokey, who was picked extremely late, after he had been over-performing for weeks.

Final thoughts

Vote For the Worst has picked up on the big villains in Idol history: Scott Savol, Tim Urban, Sanjaya, Jasmine Trias, they’re all there. But why not pick Scotty McCreery? He is clearly running way back in the pack in terms of quality and still sailing through. Instead, they picked Casey Abrams. There was no evidence at all that Casey was over-performing: the guy was voted out in the third round on a very decent song! So they picked him, and unsurprisingly he was gone in a week.

This isn’t the way to do it. VFTW needs to start thinking demographically. If they don’t pick a white, male, country singer when he comes along, they’re doing something wrong. They’re overlooking the most obvious traits that would keep a crappy singer in the competition. The fact that they ever pick women near the beginning should be a sign that they’re a bit daft—doesn’t everyone know that women get screwed in the early stages?

  • Jessica

    Ah, but maybe they’re trying to create, rather than identify, a contestant who vastly overperforms. I think they hoped to have a legion of voters who could upset the choices of the sincere voters. Obviously that’s never going to happen, though.

  • Reuben

    If you’re a Worster who doesn’t like this piece, you should comment here, and not make ad hominem attacks in the other parts of the blog. That makes you seem like a cretin.

    If you don’t like the conclusions, you might think about making a substantive claim as to what you don’t like about the methodology. It’s all spelled out.

    By the way: I like VFTW. The idea is great. But it would take some convincing on the part of Worsters to get me to believe that Jacob Lusk was a good pick. What reasonable person thought he was going to make it significantly further in the contest? Scotty McCreery was the obvious choice from day 1. The guy gets made fun of basically every week on The Soup, he gets low approval from the blogs, and he’s never been in the Bottom 3/2.