Category Archive: Editorial

May 17 2013

Closing thoughts

As I said at the beginning of the year, my sole aim this season was to build a predictive model worth a damn, just to see if it could be done. In the end, the model was 93% accurate in calling people safe (including winner Candice Glover tonight) and picked the person eliminated 78% of the time (89% of those eliminated were ranked either first or second most likely to go, FWIW). That’s pretty decent.

The site found an audience, as well, which I suppose I’m happy about. The site was the top Google search for “Idol predictions” as of yesterday, and about 6000 unique visitors showed up to see the (thankfully correct) call of Candice as winner. About 3000 people found their way to the site weekly, on average, during the finals.

[Figure: Season 12 site statistics]

I would note that if you wrote a message to me in the comments, I likely did not see it, since reading comments on the interwebz makes me at least 25% less happy (often more). Not being rude. Jessica does read them, but if you want to drop me a line, my Twitter is the best way.

It’s worth asking: was this a good season of American Idol? The best singer did win, and the Top 3 were good singers. This was most definitely not the case in the past 2 years. A few moments stand out in my mind: Janelle doing “I Will” and Candice in several of her outings (“Straight Up”, “Don’t Make Me Over”, and “I (Who Have Nothing)” last night) all gave me chills. I liked Nicki Minaj as a judge a lot, and am very disappointed that she will not be staying on. Angie started pretty strong for me, though she clearly began to drop off. So, yes, it was a pretty good year.

There’s been a ton of digital ink spilled about what Idol needs to do to revive its ratings, and what I’ve read of it didn’t seem right at all. The bottom line is that the show existed for many years as the only real game in town. Sure, there were little shows like Nashville Star, but nothing substantial. But now there’s The Voice. There’s X Factor. Why wouldn’t we think the ratings would go down? So, anyway, if you want my dumb opinion, to go along with the rest of the internet’s dumb opinions, it’s this: if you want ratings back, stop cannibalizing yourself, cancel X Factor, and get Simon Cowell back on the show. Anything short of that, such as firing Nigel, changing the themes around, or reducing the audition rounds, is rearranging deck chairs on the Titanic.

Since I’m not invested in (and, in reality, have never seen) any of the other shows that sites like this cover after Idol ends, I do not maintain this site between seasons, so if you are a regular reader, sorry. I do maintain my personal blog IdleAnalytics, all year, and I tweet a fair amount, should you want to follow such things. Jessica can be found on Twitter as well as her Tumblr.

Thanks for reading.

May 14 2013

A brief introduction to statistical modeling

Here I talk a lot about “the model” as if it’s a living, breathing thing. But there isn’t anything mysterious about a model: it’s a totally understandable thing based on math, data, and a bunch of assumptions.

When you think about a “model” used in regular life, it’s something like a scale model of a building. Why is such a thing useful? After all, the model of a building isn’t functional in any of the same ways as the building itself. Nobody can live in it or go inside it. The model is useful, though, to a group of architects trying to lay out a neighborhood, or to a plumber deciding where to run his pipes. That is, a model is purposeful: it only has to resemble the real system in the ways that give you the information you need.

When you build a numerical model, you’re building a description of something. Consider the first thing most physics students learn: the path a ball takes when you toss it into the air.
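As a minimal sketch of what such a description might look like in code: the function names, the 10 m/s launch speed, and the decision to ignore air resistance are all my assumptions for illustration, not measurements.

```python
G = 9.81  # gravitational acceleration in m/s^2, assumed constant

def ball_height(t, v0, h0=0.0):
    """Height in metres at time t (seconds) of a ball thrown straight up at v0 m/s.

    Assumes no air resistance, which is exactly the kind of simplification
    that makes this a model rather than the real thing.
    """
    return h0 + v0 * t - 0.5 * G * t * t

def time_of_flight(v0):
    """Time in seconds until a ball launched from the ground returns to it."""
    return 2.0 * v0 / G

# Hypothetical toss at 10 m/s: sample the height at a few moments.
for step in range(5):
    t = step * time_of_flight(10.0) / 4
    print(f"t = {t:.2f} s, height = {ball_height(t, 10.0):.2f} m")
```

The model ignores wind, spin, and the shape of the ball, and it is still good enough to tell you roughly when to hold out your hand.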


Apr 04 2013

The difficult decisions coming up

This started out as a response to a comment on the Liveblog, but I seem to be unable to keep it concise.

And, well, I think that’s setting up a teeny bit of a false dichotomy, at least at this point in the competition. I love both Candice and Angie; they’re both super. Which one I like better varies nightly, and choosing between them in the final (what? no?) is going to be really hard. I think they’re equally talented singers, just with slightly different styles.

If I had to bet on who would be more successful commercially, I might be tempted to say Candice, but then again it’ll come down to what she ultimately records: hopefully she’ll rock it out like she did last night and on “Come Together”. Kelly Clarkson-style rock would really work for her, but if she records something like Tamyra Gray’s utterly forgettable album (I bet you’ve already forgotten it) then, well, she’ll do equally poorly. Luckily she’s a much stronger singer than Tamyra, and I imagine there are way more people looking to write for her voice than for Tamyra’s perfectly pleasant but far less exciting voice.

Angie, on the other hand, might end up another Idol-on-Broadway. She certainly can do that sort of thing, but I think there might be a place for her in the pop pantheon; unfortunately it would most likely be making music that I don’t particularly like, e.g. emo-ish stuff like Evanescence or what Colton Dixon sings. Then again, she could be the next Sara Bareilles; when she did her original song, it was much more reminiscent of that sort of thing, which I do like (albeit somewhat ashamedly).

And then there’s Kree… I do love me some Kree, even though I don’t really love me some country. But Kree is more versatile than many country people (I’m looking at you, Janelle), and she has a Liv Tyler-esque prettiness that I really appreciate. A top 3 of Angie, Candice, and Kree seems about right, but I’m just not sure which two will come out of it and which will be sent home. It may very well be that once Kree absorbs Janelle’s votes—I’m thinking Kree will outlast the less-talented Janelle easily—she may be able to surpass either Candice or Angie. For me it will probably come down to how I liked their performances on a given night, since cumulatively I can’t choose between the trio.

We got really lucky with the girls this year, huh?

Mar 04 2013

Effects of Supervote rule change are murky


Today it was announced that Idol would get a new voting feature, called the “supervote”. As I understand it, rather than voting 50 separate times on Facebook or American Idol’s website, you can cast 50 votes all at once, dividing them among your favorites in any way you want.

I think it’s pretty clear that this is going to weight the voting more in the direction of online voting. Rather than having to sit and press vote and fill in a CAPTCHA each time, the work to cast 50 votes (which was the maximum allowed last year online) has just been made tiny by comparison. People who were likely only to cast 10 votes will almost certainly cast the entire 50 now. This makes phone or text voting much less appealing.

However, I’m not sure that this makes much of a difference. In an age where Twitter hashtags appear on television screens ubiquitously and 150 million Americans use Facebook, surely a lot of voting was already going on by web. With a sample size that big, it’s hard to imagine that such a rule change would shift the demographics much, and hence affect the voting trends.

That being said, I am concerned that it could drastically throw off the accuracy of Dialidol. Dialidol’s service works by measuring how many busy signals are on the line when voting, and using an empirical method to predict the result. Even if the number of phone votes stays the same, the number of online votes is apt to rise by a factor of between 5 and 10, since who would vote 10 times if one could vote 50 times just as or more easily? Dialidol appeared to weather the shift of many voters to text messages, maintaining fairly good accuracy, but 50 votes with a couple clicks is a huge number.

From an electoral perspective, it will be interesting to find out what effect the supervote policy has on vote splitting. It’s well known that with three candidates, two of whom are similar, votes will be split between the two similar candidates, leading to the election of the dissimilar candidate. The most obvious example was the election of Bill Clinton, who benefited greatly from Ross Perot taking away part of George H. W. Bush’s votes. That was an election with 1 voter, 1 vote. In American Idol, depending on how onerous the requirements are for voting, there may be a de facto barrier similar to the legal barrier that American elections have. That is, if it’s possible, but annoying, to cast many votes, then some people won’t cast them, and vote splitting is important. However, with a low barrier to multiple voting, vote splitting could become less important.

Consider the Top 3 of Season 6. Jordin Sparks, Blake Lewis, and Melinda Doolittle were vying for the two spots in the finale. Jordin and Melinda were fairly similar, in that they were black women who sang torch songs or R&B. Blake, meanwhile, was an offbeat singer, prone to electronica. When it was just between Jordin and Blake, Jordin won. But in a three-way race, people who liked black female soul singers may have split their votes between Jordin and Melinda, making a Jordin and Melinda finale much less likely.

On the flip side, if everyone votes the full 50 times for their favorite person, this rule may just lead to vote inflation, with no overall changes to the general milieu. This is an empirical question, not one that someone can divine the answer to.
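To make the two possibilities concrete, here is a toy tally. Everything in it is an assumption of mine: the 60/40 split of a 100-person electorate, the idea that every soul fan likes Jordin and Melinda equally, and the two voting regimes. It is a sketch of the vote-splitting logic, not a claim about how Season 6 votes actually fell.

```python
def tally(groups, votes_per_voter, split_budget):
    """Count votes for each candidate.

    groups: list of (number_of_voters, [candidates that group likes]).
    split_budget=True  -> each voter divides a fixed budget among favorites.
    split_budget=False -> each voter can give the full amount to every favorite.
    """
    totals = {}
    for size, liked in groups:
        per_candidate = votes_per_voter / len(liked) if split_budget else votes_per_voter
        for name in liked:
            totals[name] = totals.get(name, 0) + size * per_candidate
    return totals

# Hypothetical electorate (made-up numbers).
groups = [(60, ["Jordin", "Melinda"]), (40, ["Blake"])]

one_vote_each = tally(groups, 1, split_budget=True)       # classic vote splitting
supervote_divided = tally(groups, 50, split_budget=True)  # 50 votes, divided up
vote_for_all = tally(groups, 50, split_budget=False)      # full support for every favorite

for label, result in [("one vote each", one_vote_each),
                      ("supervote, divided", supervote_divided),
                      ("full support for each favorite", vote_for_all)]:
    print(label, result)
```

In this toy setup, dividing a fixed budget of 50 votes gives the same shares as one vote each (pure vote inflation), while Blake falls to last only if fans can fully support every singer they like. Which regime real voters behave like is, again, an empirical question.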

May 20 2012

Why all models are wrong

“All models are wrong”, my PhD adviser told me one day. We were having a conversation about why a given experiment of ours was off compared to the predictions of another group. He smiled at me, but I realized right away that he was correct in a certain sense.

This seems to be a point that isn’t appreciated.

Suppose I were to write down Newton’s Law of gravitation. That law is correct (up to a small deviation very near the Sun due to General Relativity). I could then build a model of the orbit of the Earth around the Sun, which predicts an elliptical path. This is a good model, much better than even the Copernican model, probably one you learned about in school. But it’s wrong.

Why is it wrong? Because it was treated as a two-body system, neglecting the effect of all the other planets, asteroids, comets, of the micrometeoritic material around the Earth, etc. One can then try to add those factors in, which is a Herculean task necessitating tons of observations and computing time. You improve the model. Then, an asteroid comes up, and someone asks you if there will be a collision. As an example, the asteroid designated 2011 AG5 will cross our path sometime between 2040 and 2047; NASA lists the probability of impact at 0.2%.

You could fairly ask why NASA would have to list a mere probability of impact, and not tell us definitely what will happen. The answer is that it’s because their response is based on a model, that model has statistical noise in its variables, and it’s imperfect in conception. It’s wrong.

In weather forecasts, often a probability of rain is listed. Why? Why can’t they just tell you whether or not it will rain? What does it even mean that there’s a 30% chance of rain? It means that 30% of the time when the conditions looked like today, it ended up raining. Is the person wrong if it rains? No. He said there was a chance: 30%. If he said there was a 0% chance, he would be wrong. By saying there was a 30% chance he is being honest about the uncertainty in his statistical analysis and weather modeling capability. His model is wrong.

An actuary is a person who works at an insurance company. He makes sure that the premiums collected on life insurance are enough to cover the losses due to people with policies who die. How does he know how many are going to die? Can he predict who is going to die? In a certain sense, yes. He builds a model very much like the one that I’ve built for this blog. He collects a data set of 10,000 people, looking at various variables: age, weight, gender, smoker/nonsmoker, miles driven per day, for instance. Then he builds a model that predicts the probability of someone dying in the next year under all of those conditions. If many of the people stop smoking, this will reduce the number of deaths. The premiums for those people can then be lowered.

But now let’s take a small subset of people, say 3 of them. Bob is 55, still smokes (though he’s tried to cut back), is a bit overweight and drives 20 miles a day. John is 20, nonsmoker, thin, and drives 50 miles a day. Alice is 40, nonsmoker, in good shape, and rides the subway.

The actuary’s boss brings these people into a room and asks the actuary which will die first. The actuary uses the model he’s built from the 10,000 data points and predicts a probability of dying. Bob is the most likely, as he’s male, older, and smokes, so he’s a prime candidate for heart disease. Next is John, who is a young man, and young men die in traffic collisions far more than 40 year old women. Finally, Alice has the lowest probability of dying. If the boss says that one of them will die, the actuary would make a table:

Name   Age  Sex  Smoker?  Miles/day driven  BMI  Probability of death
Bob    55   M    Y        20                27   56%
John   20   M    N        50                20   27%
Alice  40   F    N        0                 22   17%

Later that year, John wraps his car around a tree, dying. The actuary’s boss comes into his office, screaming mad that the actuary was wrong. “But”, says the actuary, “I was not wrong. I said that Bob was the most likely to die, not that he would definitely die”.

Was the model wrong? Yes. Why? Because it didn’t take into account every single thing about the entire world and how it affected these three people. That is impossible. Instead, what it did was reduce their lives to a few quantitative variables and project what the likelihood of death was, and said that Bob was the most likely to die. But the actuary was not wrong. He gave the honest probability as well as anybody could possibly have determined. If the man’s boss interpreted the table above as saying “Bob will definitely die”, then that is his problem.
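The arithmetic behind the actuary’s defense is short enough to write out. This sketch uses only the three probabilities from the table above; the variable names are mine.

```python
# Probabilities from the actuary's table, given that exactly one of the three dies.
probabilities = {"Bob": 0.56, "John": 0.27, "Alice": 0.17}

# The "call" is simply the person with the highest probability.
most_likely = max(probabilities, key=probabilities.get)

# Even a perfectly honest forecast misses whenever someone else dies.
chance_call_is_wrong = 1.0 - probabilities[most_likely]

print(f"Most likely: {most_likely}")
print(f"Chance the top pick is not the one who dies: {chance_call_is_wrong:.0%}")
```

So even if every number in the table is exactly right, the top pick misses 44% of the time. A miss by itself says nothing about whether the table was honest.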

Surely the reader can understand what I am getting at by now.

Contestant        WNTS Rating (avg.)  Dialidol Rank  Bottom 3 Previously?  Previous Rating  Probability of Elimination (%)
Jessica Sanchez   45.7                2              Yes                   74.5             45.8
Joshua Ledet      54.7                2              Yes                   69.0             31.7
Phillip Phillips  61.0                1              No                    63.5             22.4

This was my forecast on last Thursday. Right there it says that Jessica Sanchez was the most likely to be eliminated. She was ranked 3rd on Dialidol. People eliminated in the Top 3 are usually ranked 3rd on Dialidol (though I bumped her to a tie because of how close the numbers were). She had the lowest WNTS approval rating of the night. The person eliminated in the Top 3 typically has a lower WNTS approval rating than the others. As such, the model predicted a 46% chance of her being eliminated. For Joshua, he was ranked second on Dialidol and had the second-lowest WNTS rating. Sometimes those people are eliminated, and the probability is reckoned to be about 32%. This leaves Phil with a 22.4% chance.

Jessica Sanchez was not eliminated, and Joshua was. The one with the highest probability was not eliminated. My question is: so what? A 32% chance is not small, not even remotely surprising. Twice in the past week here in Atlanta there was a 30% chance of rain, and it rained. So what?

The model that I use is wrong. It reduces all of the possible effects on voting into just two variables, which is not correct. It is, however, feasible, and the most intellectually sound one I can find. If it gets a fair number right over the entire season, then I’m happy. If it misses any one particular call, that is totally meaningless.

The model is a formula. It’s not racist. It’s not sexist. It’s not based on falsehoods. It’s a straightforward correlation and logistic regression. It’s dispassionate, reductive, and wrong. However, it is not totally wrong. Sometimes it hits, and sometimes it misses, and it’s been better than some experts out there predicting what will happen. That’s more than I expected.
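For readers who have never seen one, here is roughly what a logistic regression formula looks like once it has been fitted. The coefficients below are invented for illustration and are not the fitted values behind the forecasts on this site. The rescaling step (forcing the three numbers to sum to 1 because exactly one person goes home) is likewise my assumption, so the output will not reproduce the table above.

```python
import math

# Invented coefficients, for illustration only (NOT the site's fitted values).
INTERCEPT = -1.0
WNTS_COEF = -0.03  # higher approval rating -> lower chance of elimination
RANK_COEF = 0.8    # worse (larger) Dialidol rank -> higher chance of elimination

def logistic_score(wnts_rating, dialidol_rank):
    """Map two indicators to a number between 0 and 1 with the logistic function."""
    z = INTERCEPT + WNTS_COEF * wnts_rating + RANK_COEF * dialidol_rank
    return 1.0 / (1.0 + math.exp(-z))

def normalize(scores):
    """Rescale scores so they sum to 1 (assumes exactly one elimination)."""
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

# Indicator values taken from the Top 3 forecast table: (WNTS rating, Dialidol rank).
inputs = {
    "Jessica Sanchez": (45.7, 2),
    "Joshua Ledet": (54.7, 2),
    "Phillip Phillips": (61.0, 1),
}
scores = {name: logistic_score(*values) for name, values in inputs.items()}
elimination_chances = normalize(scores)

for name, chance in elimination_chances.items():
    print(f"{name}: {chance:.1%}")
```

Nothing in the formula knows who anybody is. It sees a rating and a rank, and that is all.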

I don’t have skin in this game. I don’t care for any of the contestants, really. The last Idol contestant that I really liked was Blake Lewis. I kind of liked Erika Van Pelt. What the model “thinks” is not always what I think, nor is it what I want. To be clear, I would prefer if Jessica won over Phillip, if for no other reason than that I think women have been badly treated in the past 4 years. I’ve remarked many times in the Liveblog that Phil isn’t singing what I recognize to be notes.

It would actually be much easier for me to just sit back and not make predictions. The reason I built a model was to see if it was possible to do a prediction of who’s eliminated from just a few statistical indicators. That model makes a lot of calls, and a lot are wrong. But there are some that are right, and there are many that are in the ballpark. If you get some enjoyment or information out of reading them, that’s great. If they’re wrong, you should take that as a sign that 1. the data is noisy and sparse, 2. things with the voting change, and 3. there are many factors that are not quantifiable. Any comments beyond that are pointless, and the unlettered attacks flying around in comments sections of this blog tell me that these points haven’t been communicated well.

May 11 2012

America loves White Guys With Guitars

It was apparently news today that White Guy With Guitar (WGWG) winner #2, Kris Allen of season 8, thinks that another WGWG will win this year. He’s referring to Phil Phillips, who would indeed be the fifth such winner in 5 years. I happen to agree with Kris Allen that Phil will probably win. That is not an endorsement, just a thing I think is true.

You know what else is true? It isn’t just American Idol voters: Americans in general like WGWGs.

Take any metric you like. Above is the demographic breakdown of the Rolling Stone Top 100 Artists. About half of them are WGWGs. The lion’s share of the remainder is black guys with no guitars (think James Brown) and black guys with guitars (think Jimi Hendrix). That leaves only 20% for women of any kind.

Maybe you think Rolling Stone isn’t indicative of American Idol (I disagree, since the judges and producers are of that ilk). Fine, then look at the Billboard charts. As I pointed out previously, about 65% of the Top 10 at any point are men. If we expand to the Top 100, the problem gets far more lopsided in favor of men. It’s not pretty, but it appears to be true.

Yes, it could be that Phil is winning because he’s got a PR blast. He could be winning because the judges are obsequious. He could be winning a pity vote due to his chronic illness. And, maybe he could be winning because of VFTW tomfoolery. But my hypothesis is that Phil is going to win because, in the end, Americans just prefer their singers male, white, and strumming.

May 05 2012

Skylar’s elimination was pretty weird

Look, I’m not going to grouse in this space that America “got it wrong” by keeping Phil and dumping Skylar. America has been getting it wrong for a long time. Yes, it’s capricious and injudicious. That’s boring and obvious.

I’m here to talk about how odd it is that the singer rated #1 on Dialidol, not to mention the only country singer still on the show, was eliminated.

Dialidol’s rare fumbles

How many times has the top rated Dialidol contestant been eliminated? Excluding semi-finals, the answer is that it’s only happened twice before in Dialidol’s 7 years of existence (while Idol itself has been on for 11 seasons, Dialidol only started up in season 4). Melinda Doolittle in season 6 and Lil Rounds from season 8 are the only contestants to hold such a distinction, at least until Skylar. Like I’ve said before, Dialidol is pretty damn good.

Occurrences of results for the top and bottom rated Dialidol contestants.

The figure above shows what happened to the person ranked #1 and ranked last on Dialidol. Here is a breakdown of the numbers by percentage:

Outcome              Last in the rankings  First in the rankings
Bottom Group         25%                   6%
Eliminated or saved  50%                   4%
Safe                 25%                   90%

People in Phil’s position (dead last on Dialidol) were safe only 25% of the time, and in jeopardy 75%. People in Skylar’s position (highest rated in Dialidol) were safe 90% of the time and eliminated only 4% of the time.

What could be wrong with Dialidol that makes these occurrences ever happen?

I would note that in each of the 3 cases of eliminations, the contestant in question was a woman, which is probably indicative of a small sampling problem with Dialidol’s user base (not that I think they could control it; you have to accept some error in any system). Of those whom Dialidol had ranked #1, five found themselves in the Bottom 3/2 but not eliminated. Three of those five were women (actually, 2 of them were Hollie!), while two were men. On the other side of things, there are the people who are ranked lowest on Dialidol but are safe (not in the Bottom 3/2): this happened 20 times, and only six of those were women. So, yes, Dialidol voters favor females more than the general voting public does.
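How lopsided is 6 women out of 20? One rough way to gauge it is a binomial tail probability. The big assumption here, which is mine and not established anywhere above, is that if Dialidol had no gender skew each of those 20 cases would be equally likely to be a man or a woman.

```python
from math import comb

def binomial_tail_at_most(k, n, p=0.5):
    """Probability of at most k successes in n independent trials with chance p each."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# 20 contestants were ranked last on Dialidol yet safe; only 6 were women.
# Assumption: with no skew, each case is a 50/50 coin flip between genders.
chance_six_or_fewer_women = binomial_tail_at_most(6, 20)

print(f"{chance_six_or_fewer_women:.3f}")
```

Under that coin-flip assumption the chance comes out to about 5.8%: suggestive of a skew, but on a sample this small it is far from proof, and the real gender mix of contestants in each round is not 50/50 anyway.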

The bumpkin bump

It certainly was upsetting to a large group of people last year (myself included) that the country musicians seemed to sail through at the expense of better performers. The finale consisting of Scotty McCreery and Lauren Alaina was widely derided and comparatively scarcely watched. The idea was that Midwestern and Southern states voted as a bloc to keep those two in.

I shall have to put together a more detailed argument at a future point, but I’m beginning to believe that that effect is probably overrated. There’s just as much reason to believe that Scotty won because he was a white guy with a guitar, just like the three winners before him, as to believe that he won because of country. Imagine that a large group of voters splits its votes between Lauren and Scotty, while all other votes go to the non-country people. By the end of the contest, you have to reckon that many of those votes would consolidate in a single non-country singer. After all, do you know many people who like country and other stuff? But this didn’t happen.

And this year makes the case against the theory of the “bumpkin bump” even more starkly. Chase Likens, whom I had resigned myself during the semifinals to hearing for the next 12 weeks, was eliminated immediately, and Skylar at one point switched from country to more pop songs after she found herself in the Bottom 3. There doesn’t seem to be much evidence this year of any country advantage, and other than a few high profile cases like Carrie Underwood I can’t think of too many times where it clearly came into play in the past either.

I would hardly characterize Skylar’s exit as “shocking”. Her songs were mainly so-so, and her personality on the show hasn’t been what one would call dazzling. But it certainly was strange.

Apr 13 2012

Opinion: Jessica Sanchez has been overrated all year

When I said in last night’s projection that I could be persuaded that Jessica could be going home, I meant it. She has most of the same hallmarks as Pia Toscano had last year: beauty contestant type with technically good singing, boring song choices, a plain voice, no musicianship on display, and over-the-top judge reviews.

The overrating even extends to my favorite quantitative variable, the WNTS approval rating. Check this out:

Episode         Theme               Order  Song                    WNTS Rating  WNTS stdev  Result  DialIdol Rank
Top 25 (Girls)  Open                11/12  Love You I Do           84           17          Safe    2
Final 13        Whitney Houston     12/13  I Will Always Love You  91           7           Safe    3
Final 11        Year You Were Born  2/11   Turn The Beat Around    56           22          Safe    2
Final 10        Billy Joel          9/10   Everybody Has A Dream   79           18          Safe    2
Final 09        Personal Idol       6/9    Sweet Dreams            86           13          Safe    8
Final 08        1980s               5/8    How Will I Know         64           21          Safe    4
Final 07        2010s               3/7    Stuttering              78           18          Saved   5

I’ve been insisting in the LiveBlogs that Jessica is off-pitch and a snore, and apparently at least some people agree. But just look at these scores! A 91 for her Whitney Houston copy-cat version of “I Will Always Love You”? Are we being serious? There are only 17 performances out of 1504 that were rated better than that. That puts it in the 98th percentile, equal to Elise Testone’s version of “Whole Lotta Love”. I can tell you which one of those I remember with any fondness, and it ain’t Jessica’s.
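The percentile figure is simple arithmetic on the two counts just quoted. A minimal sketch, with a function name of my own choosing:

```python
def percentile_rank(num_rated_better, total_performances):
    """Percentage of performances rated at or below a given performance."""
    return 100.0 * (total_performances - num_rated_better) / total_performances

# 17 of the 1504 rated performances scored higher than Jessica's 91.
jessica_percentile = percentile_rank(17, 1504)
print(f"{jessica_percentile:.1f}")
```

That works out to a little under 99, which is the 98th percentile if you round down.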

Moreover, the scores aren’t significantly polarizing, as indicated by the standard deviation of the WNTS Rating. (The standard deviation measures spread: for roughly bell-shaped data, about two-thirds of the ratings fall within one standard deviation of the mean score.) They don’t approach the amount of disagreement that viewers had over Adam Lambert, Paul McDonald, Casey Abrams, Megan Joy, or Phillip Phillips. She ranks in the lowest third in that metric.

Why? I suppose the argument that people will make is that “nobody can find anything technically wrong with the performance, so they approve”. So, everyone can agree that they don’t really like it, but there’s nothing wrong with it.

I think that about sums up this season.

If you’re wondering whether Jessica’s trajectory follows Pia’s, the answer is yes, eerily so:

Episode         Theme                     Order  Song                             WNTS Rating  WNTS stdev  Result      DialIdol Rank
Top 24 (Girls)  Open                      12/12  I’ll Stand By You                91           11          Safe        5
Final 13        Personal Idol             5/13   All By Myself                    82           13          Safe        4
Final 12        Year You Were Born        7/12   Where Do Broken Hearts Go?       76           14          Safe        4
Final 11        Motown                    8/11   All In Love Is Fair              73           20          Safe        4
Final 11 (ii)   Elton John                4/11   Don’t Let The Sun Go Down On Me  67           20          Safe        4
Final 09        Rock & Roll Hall of Fame  7/9    River Deep – Mountain High       85           17          Eliminated  3

Complete with an identical 91 score in the early rounds, a brief spill into the 60s, and then straight to the bottom of the voting, just one round later. The WNTS rating is simply not a good predictor of this type of contestant. Why? Because the people that they poll are overrating the contestant. I don’t know why they are doing that, and I can’t change it, but I can get angry about it.

Mar 14 2012

The peril of model fiddling

Some people say that we now live in a “data rich” world. You can find a quantitative measurement for practically anything you want. This has been the case at least since the availability of Google search trends, and continues with the advent of things like Twitter searches (including so-called “sentiment analysis”). You can look up many national demographic and economic variables going back at least to the turn of the century, which potentially is very helpful in assessing things like medicine, the justice system, voting patterns in presidential elections, and so on.

Coming along with “data rich”, though, can be “information poor”. The idea that any quantitative variable tells you something meaningful is ludicrous. And it gets worse: even variables which are obviously meaningful can be so noisy that they end up being useless. An example would be the way that unemployment or gas prices affect whether an incumbent president is re-elected. These variables are bound to make voters unhappy with the guy in office; but with so many other things happening in the world, who honestly thinks you can understand all that much about it from this single variable? (Answer: nobody who is well-informed).

How can we assess the voting outcome of American Idol? Well, sit and think of what you would vote based on. How good was the singing? Naturally. Was the person in some other way annoying? Certainly that would make a difference. Was this week’s performance a dud, but last week’s really good? Maybe you’ll vote anyway based on that. Now sit and try to think of how other people might vote. Hawaiians might vote for this guy because he’s from Hawaii. Young girls would like this cute guy even though he sucks. Country fans will like this. Worsters are pulling for this guy. The list is endless.

For all the crap they take, weather forecasts and presidential polling forecasts are extremely accurate. The fact that you can take such a chaotic system, reduce it down to a few variables, measure those variables, and build a model that is accurate even a fair fraction of the time is astounding. And weather forecasts are a good deal more accurate than even a fair fraction of the time. Presidential polls have been accurate in the final days of an election in all years since Dewey lost to Truman. That’s pretty incredible.

What accounts for the success of these predictions? In weather, the advantage is in massive amounts of data. You have every possible indicator on every day for the past hundred years. Given that, you are bound to be able to say with considerable precision whether something will happen. Suppose the weather forecast calls for a 30% chance of rain. That means that in all of the data, when the indicators looked like that, it rained 30% of the time. If you collect data for a month, that will be pretty crappy. Barometric pressure, humidity, time of year, all of those vary too much to tell you. But looking at tens of variables over all years? That will be very good.
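Here is a toy version of that counting procedure. The history records and the “looks like today” rule are entirely made up; a real forecast would draw on decades of measurements and far more variables.

```python
def empirical_rain_chance(history, looks_like_today):
    """Fraction of similar past days on which it rained (None if no similar days)."""
    outcomes = [rained for conditions, rained in history if looks_like_today(conditions)]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)

# Made-up history: (conditions, did_it_rain). Ten humid low-pressure days, 3 with rain,
# plus five dry high-pressure days that should be ignored for today's forecast.
history = (
    [({"pressure": "low", "humid": True}, True)] * 3
    + [({"pressure": "low", "humid": True}, False)] * 7
    + [({"pressure": "high", "humid": False}, False)] * 5
)

def looks_like_today(conditions):
    # Hypothetical similarity rule: today is humid with low pressure.
    return conditions["pressure"] == "low" and conditions["humid"]

chance = empirical_rain_chance(history, looks_like_today)
print(f"Chance of rain: {chance:.0%}")
```

With only ten similar days the estimate is crude, which is the point: the more history you have, the more days match today’s conditions and the steadier the fraction becomes.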

In political polling the advantage is that you’re measuring the actual thing that’s happening. You straight up ask people who they are going to vote for. Of course, this could still be faulty, and it changes a lot over time. But in the days before an election you can see that the numbers sort of “lock in”, as people make up their minds, and the predictions become nearly bulletproof based on these data. This was most staggeringly clear when Nate Silver predicted the number of electoral votes in 2008 to within a very small margin.

These two strategies are useless for American Idol. The weather example fails because there have only been 10 full seasons of the show. If the show continued for another 100 years (not likely), then you would start to be able to make some damn good predictions. The polling example fails because there is no way to poll how people are voting during the 2-hour block directly after the show in which voting happens. Even if you had the resources (money) to do it, it’s not practicable. You can do something like exit polls, which is how Votefair works, but this is fraught with sampling error, meaning that the people who vote on Votefair are not representative of those who vote in total.
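A small sketch shows how an unrepresentative sample skews a poll. The two groups, their sizes, and their preferences are numbers I invented; nothing here is a measurement of Votefair or of Idol voters.

```python
def overall_support(group_mix, support_by_group):
    """Weighted support for a contestant, given each group's share of respondents."""
    return sum(group_mix[group] * support_by_group[group] for group in group_mix)

# Hypothetical: share of each group that votes for contestant A.
support_by_group = {"teens": 0.7, "adults": 0.3}

# Hypothetical make-up of the full voting public versus an online poll's respondents.
voting_public = {"teens": 0.5, "adults": 0.5}
poll_respondents = {"teens": 0.1, "adults": 0.9}

true_support = overall_support(voting_public, support_by_group)
polled_support = overall_support(poll_respondents, support_by_group)

print(f"True support: {true_support:.0%}, poll says: {polled_support:.0%}")
```

Every respondent in the toy poll answers honestly, and the poll is still off, purely because of who showed up to answer it.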

Here are some variables that we could try to collect to help us predict:

  • Approval rating of performance by WhatNotToSing.com
  • Dialidol score
  • Votefair votes
  • Performance position (when did the performance take place?)
  • Aggregate blogger predictions
  • Pre-exposure time
  • Gender
  • Other polls (such as Ricky.org or IdolBlogLive)

Then we could think of some other attendant or derived variables. Maybe WNTS approval rating and last week’s approval rating. Maybe not just Dialidol score, but Dialidol raw votes, or Dialidol ranking. Maybe we take the performance position and extract a “performance order quotient” that fits the observed elimination rates. Again, the possibilities are endless.

Here’s the thing: it’s obvious that all of these variables are indicative of the outcome. If everyone says that Jeremy Rosado is gone, then he’s probably gone. But that isn’t what happened! The only indicator I saw that Elise Testone was the lowest vote getter among the women was that MJ from mjsbigblog thought that it was going to happen, and that’s just one person’s feeling, not any kind of measurement.

Let’s dig into this a little more. Here are the relevant variables from last week:

Contestant           Order  Song                       WNTS Rating  WNTS stdev  Result        Dialidol score  Dialidol Votes (thousand)
Joshua Ledet         1      I Wish                     70           14          Bottom Group  1.641           2.37
Elise Testone        2      I’m Your Baby Tonight      33           17          Bottom Group  2.994           1.74
Jermaine Jones       3      Knocks Me Off My Feet      40           19          Bottom Group  0.82            1.31
Erika Van Pelt       4      I Believe In You And Me    73           11          Bottom Group  1.377           5.19
Colton Dixon         5      Lately                     51           17          Safe          2.719           4.26
Shannon Magrane      6      I Have Nothing             23           17          Bottom Group  1.836           1.54
Deandre Brackensick  7      Master Blaster             56           22          Safe          4.247           1.48
Skylar Laine         8      Where Do Broken Hearts Go  78           16          Safe          3.572           2.65
Heejun Han           9      All In Love Is Fair        45           20          Safe          3.055           1.38
Hollie Cavanagh      10     All The Man That I Need    88           9           Safe          3.987           3.86
Jeremy Rosado        11     Ribbon In The Sky          32           19          Eliminated    3.135           0.53
Jessica Sanchez      12     I Will Always Love You     91           7           Safe          4.041           4.28
Phillip Phillips     13     Superstition               60           25          Safe          4.66            4.08

So, the picture is very muddy. Elise Testone ranks higher than (though within the margin of error of) Shannon Magrane on WNTS, and Dialidol records a higher score for her, both in votes and in computed score. And she ends up being the lowest, but maybe we just say that it was too close to call. Fine. But let’s now look at the 3rd lowest vote getter, Erika Van Pelt. She has the 4th highest WNTS score among women, exactly where she should have been. Her Dialidol score is lower than either Shannon’s or Elise’s. But her raw number of Dialidol votes is the highest of all!

Now let’s move on to the men. The lowest WNTS among men was Jeremy. Good. Next lowest is Jermaine. Good! After that comes … Heejun Han, Colton Dixon, Deandre Brackensick, and Phillip Phillips. Only after those 4 guys do we see the actual third member of the bottom 3 men, Joshua Ledet. But Joshua in the bottom 3 is exactly what Dialidol said: its computed score had him second lowest among the men!!!
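For anyone who wants to check these orderings rather than eyeball them, here is a small Python sketch. The numbers are transcribed from last week's table, and each contestant's sex is taken from the projection table further down. The `ranked` helper is just an illustrative name, not part of any tool mentioned in this post.

```python
# (name, sex, WNTS rating, Dialidol score, Dialidol votes in thousands, result)
ROWS = [
    ("Joshua Ledet", "M", 70, 1.641, 2.37, "Bottom Group"),
    ("Elise Testone", "F", 33, 2.994, 1.74, "Bottom Group"),
    ("Jermaine Jones", "M", 40, 0.82, 1.31, "Bottom Group"),
    ("Erika Van Pelt", "F", 73, 1.377, 5.19, "Bottom Group"),
    ("Colton Dixon", "M", 51, 2.719, 4.26, "Safe"),
    ("Shannon Magrane", "F", 23, 1.836, 1.54, "Bottom Group"),
    ("Deandre Brackensick", "M", 56, 4.247, 1.48, "Safe"),
    ("Skylar Laine", "F", 78, 3.572, 2.65, "Safe"),
    ("Heejun Han", "M", 45, 3.055, 1.38, "Safe"),
    ("Hollie Cavanagh", "F", 88, 3.987, 3.86, "Safe"),
    ("Jeremy Rosado", "M", 32, 3.135, 0.53, "Eliminated"),
    ("Jessica Sanchez", "F", 91, 4.041, 4.28, "Safe"),
    ("Phillip Phillips", "M", 60, 4.66, 4.08, "Safe"),
]


def ranked(rows, sex, column, reverse=False):
    """Names of one gender, sorted by the value in the given tuple position."""
    subset = [row for row in rows if row[1] == sex]
    ordered = sorted(subset, key=lambda row: row[column], reverse=reverse)
    return [row[0] for row in ordered]


men_by_wnts = ranked(ROWS, "M", 2)                   # lowest WNTS rating first
men_by_dialidol_score = ranked(ROWS, "M", 3)         # lowest Dialidol score first
women_by_votes = ranked(ROWS, "F", 4, reverse=True)  # most Dialidol votes first

print("Men, lowest WNTS first:", men_by_wnts)
print("Men, lowest Dialidol score first:", men_by_dialidol_score)
print("Women, most Dialidol votes first:", women_by_votes)
```

Sorting the same rows by different columns makes the disagreement between the two sources concrete: Joshua is last in the men's WNTS ordering but near the front of the Dialidol score ordering.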

This isn’t very surprising. Both WNTS and Dialidol are sampling, and they sample different subsets of the people who vote on American Idol. The people who blog about Idol and get polled by WNTS are different, both in actual identity and in demographics, from the people who install Dialidol and use their modem to power-vote. We should expect them to diverge and we should expect that they are both right in certain respects. Maybe Dialidol is simply a better gauge of men’s performances. Maybe it’s that there tend to be more overall votes for men anyway, so the sampling error is somewhat minimized.

Or, most plausible of all, maybe the data is just fricken noisy.

Now, to try to assess the noise, you do a regression analysis. That’s just a term that means that you take events that happened previously, you look at the outcome versus the variables you’re interested in, and then you guess at a shape of that dependence. Then you try to fit the shape to the data. You then plug your new data in (what happened this week) and see what the outcome should be based on this curve that you fit. Suppose I load all the data from the Top 12 and Top 13 from seasons 2 to 10 (season 1 started with Top 10). I can look at the outcomes (either eliminated or not eliminated) versus some of these variables. First is plain old WNTS score.
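Here is a minimal Python sketch of that fit-then-predict loop, rather than the stats program that produced the output in this post. The ratings and outcomes are made up for illustration (they are not Idol data). `fit_logistic` is a hypothetical helper that uses plain gradient ascent, not the Fisher scoring that glm reports, so treat it as a picture of the idea and not as the actual procedure.

```python
import math


def sigmoid(z):
    """Logistic curve: maps any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, steps=5000, rate=1.0):
    """Fit P(y = 1) = sigmoid(intercept + slope * x) by gradient ascent."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / sd for x in xs]  # standardize so a fixed step size behaves
    a = b = 0.0
    for _ in range(steps):
        # Gap between what happened and what the current curve predicts.
        errors = [y - sigmoid(a + b * z) for z, y in zip(zs, ys)]
        a += rate * sum(errors) / n
        b += rate * sum(e * z for e, z in zip(errors, zs)) / n
    # Convert the coefficients back to the original rating scale.
    slope = b / sd
    intercept = a - b * mean / sd
    return intercept, slope


# Made-up history: a rating for each performance and whether it was eliminated (1) or not (0).
ratings = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
eliminated = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

intercept, slope = fit_logistic(ratings, eliminated)

# "Plug your new data in": predicted elimination chance for two new performances.
chance_low = sigmoid(intercept + slope * 30)
chance_high = sigmoid(intercept + slope * 85)
print(f"slope={slope:.4f}  P(elim | rating 30)={chance_low:.3f}  P(elim | rating 85)={chance_high:.3f}")
```

The "shape" guessed here is the S-shaped logistic curve, which is the same family as the `binomial(link = "logit")` setting in the output that follows.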

Call:
glm(formula = Result ~ WNTS.Rating, family = binomial(link = "logit"))

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.29195  -0.42097  -0.22532  -0.07682   2.38317  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.66594    0.76767   0.867  0.38568   
WNTS.Rating -0.08013    0.02445  -3.277  0.00105 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 66.828  on 108  degrees of freedom
Residual deviance: 48.937  on 107  degrees of freedom
AIC: 52.937

Number of Fisher Scoring iterations: 7

The most important thing here is the “significance”, under “Pr(>|z|)”, which tells you how likely you would be to see a relationship at least this strong purely by chance if the variable actually had no effect on the outcome. In this case, my stats program is telling me that for WNTS.Rating this probability is 0.00105, so there is only about a 0.1% chance of that being the case. OK, yes, song quality affects whether or not people are eliminated. Duh.
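As a rough check on where that number comes from: the z value is the estimate divided by its standard error, and Pr(>|z|) is the two-sided tail of a standard normal distribution at that z. The sketch below uses only Python's standard library. `two_sided_p` is an illustrative name, not something from the stats program.

```python
import math


def two_sided_p(z):
    """Chance that a standard normal variable lands at least |z| away from zero."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Estimate and standard error for WNTS.Rating, copied from the output above.
estimate, std_error = -0.08013, 0.02445
z_wnts = estimate / std_error
p_wnts = two_sided_p(z_wnts)

print(f"z = {z_wnts:.3f}, p = {p_wnts:.5f}")
```

For reference, the conventional 0.05 cutoff corresponds to a |z| of about 1.96 under the same formula.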

Now, suppose that we try Dialidol. Why Dialidol? Well, I said that to do the regression analysis you need historical data, and Dialidol has been around quite a long time. Votefair has not. Twitter has not. It also has a decent track record. In any case, there’s no harm in trying it. Let’s try with the Dialidol ranking (where 1 is the highest number of votes recorded and 13 is the lowest):

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -7.7901     2.7310  -2.852  0.00434 **
DIVRank       0.6085     0.2529   2.406  0.01613 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So vote ranking by Dialidol is also statistically significant. Again, of course it is: you’re measuring while people are actually voting!

Now, we can actually fit both of these variables at the same time, so that we have a model that takes both things into account:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -5.26410    3.21264  -1.639   0.1013  
WNTS.Rating -0.07096    0.03809  -1.863   0.0625 .
DIVRank      0.59675    0.29605   2.016   0.0438 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This is good. Dialidol rank is significant at the 5% level and WNTS rating falls just short of it (p ≈ 0.06), which is about as well as can be hoped. So now let’s plug in the WNTS Rating and the Dialidol voting rank for last week’s contestants, and then use the regression that we just fit to say how likely elimination was:

            Contestant Sex WNTS.Rating DIVRank  Prob
1        Jeremy Rosado   M          32      13 44.70
2       Jermaine Jones   M          40      12 22.58
5      Shannon Magrane   F          23       9 14.38
3           Heejun Han   M          45      11 10.53
6        Elise Testone   F          33       8  4.48
4  Deandre Brackensick   M          56      10  2.94
7         Joshua Ledet   M          70       7  0.19
11        Colton Dixon   M          51       3  0.07
10    Phillip Phillips   M          60       4  0.06
8         Skylar Laine   F          78       6  0.06
9      Hollie Cavanagh   F          88       5  0.02
13      Erika Van Pelt   F          73       1  0.00
12     Jessica Sanchez   F          91       2  0.00
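The Prob column can be reproduced from the two-variable coefficients, as in the Python sketch below. One step is an assumption on my part: the raw fitted probabilities sum to about 124%, and the published figures match only after rescaling them to sum to 100%. That rescaling is inferred from the numbers and is not stated anywhere in this post.

```python
import math

# Coefficients from the two-variable fit reported above.
INTERCEPT, B_WNTS, B_DIVRANK = -5.26410, -0.07096, 0.59675

# (contestant, WNTS rating, Dialidol vote rank) from the table above.
CONTESTANTS = [
    ("Jeremy Rosado", 32, 13), ("Jermaine Jones", 40, 12),
    ("Shannon Magrane", 23, 9), ("Heejun Han", 45, 11),
    ("Elise Testone", 33, 8), ("Deandre Brackensick", 56, 10),
    ("Joshua Ledet", 70, 7), ("Colton Dixon", 51, 3),
    ("Phillip Phillips", 60, 4), ("Skylar Laine", 78, 6),
    ("Hollie Cavanagh", 88, 5), ("Erika Van Pelt", 73, 1),
    ("Jessica Sanchez", 91, 2),
]


def raw_probability(wnts, rank):
    """Fitted logistic probability of elimination for one contestant."""
    z = INTERCEPT + B_WNTS * wnts + B_DIVRANK * rank
    return 1.0 / (1.0 + math.exp(-z))


raw = {name: raw_probability(wnts, rank) for name, wnts, rank in CONTESTANTS}
total = sum(raw.values())

# Assumption (inferred, not stated in the post): rescale so the 13 values sum to 100%.
prob = {name: 100.0 * p / total for name, p in raw.items()}

for name, p in sorted(prob.items(), key=lambda item: -item[1]):
    print(f"{name:<20} {p:6.2f}")
```

Because Erika's Dialidol vote rank is 1, the rank term contributes almost nothing toward elimination for her, which is why her figure rounds to 0.00 even though her WNTS rating is only middling among the women.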

To look at this, you might say that this isn’t too bad a projection. 4 out of the 5 highest probability contestants were indeed in the bottom 3. However, Joshua Ledet is out of order and Erika Van Pelt is way way out of order. The result is a bad projection on that end, predicting an event as 0.00% chance when in fact Erika was in the bottom 3 girls. That should not have happened.

So what went wrong? Nothing went wrong! All of this is totally intellectually sound. It was the data that were funky. A variable which was a good predictor in past years wasn’t this year, at least as far as Erika is concerned.

It is very tempting to start fiddling with the model. Erika is clearly being oversampled by Dialidol, on account of a small number of users who power vote only for her. You could just build in a mechanism to dock Erika some number of Dialidol votes. Likewise Heejun should get a bonus. Then you can get everything to line up nice. However, on what intellectual basis have you done this? Why adjust some values but not others? Why not adjust the WNTS score? Maybe the bloggers are the ones that rated it too highly. The answer is that there is no real intellectual basis. Since we don’t have any of these variables separated, we can’t tell anything about them.

So, I’m not going to screw with the model. It is just going to get some things very wrong. Seeing the underlying data may help explain to you why this is. This is certainly a situation of “information poor”, and it doesn’t look to get any richer anytime soon. Not adjusting the model, though, actually makes it more robust, not less, since the model stays simple and doesn’t start making a bunch of ad hoc assumptions about what is going on.

Mar 12 2012

Huh…

So remember Haley Johnsen, the super-pretty ex-barista (or maybe she’s a barista again, who knows) who sang a truly execrable rendition of Sweet Dreams, failing to make it into the Top 13? Well, apparently she wanted to do a remixed version that actually doesn’t suck:

(I really don’t recommend you go back and watch what she actually performed for comparison; it’s really painful to watch).

I wonder why they wouldn’t let her do it like that? Maybe the musicians couldn’t or wouldn’t learn a new version of the song with twenty-four other people to accommodate? I don’t know, but maybe if Haley had done that version…?

Then again, maybe not. In any case she never would have won; Hollie Cavanagh and Jessica Sanchez are much better prospects, not to mention my personal favorite, Erika Van Pelt. And that’s just the girls; as we’ve previously demonstrated, there is definitely a statistically significant gender bias in American Idol.

And credit where credit’s due: I first saw this linked from mjsbigblog.
