Effects of Supervote rule change are murky


Today it was announced that Idol would get a new voting feature, called the “supervote”. As I understand it, rather than voting 50 separate times on Facebook or American Idol’s website, you can cast 50 votes all at once, dividing them among your favorites in any way you want.

I think it’s pretty clear that this is going to weight the voting more in the direction of online voting. Rather than having to sit and press vote and fill in a CAPTCHA each time, the work to cast 50 votes (which was the maximum allowed last year online) has just been made tiny by comparison. People who were likely only to cast 10 votes will almost certainly cast the entire 50 now. This makes phone or text voting much less appealing.

However, I’m not sure that this makes much of a difference. In an age where Twitter hashtags appear ubiquitously on television screens and 150 million Americans use Facebook, surely a lot of voting was already happening on the web. With a sample size that big, it’s hard to imagine that such a rule change would change the demographics much, and hence affect the voting trends.

That being said, I am concerned that it could drastically throw off the accuracy of Dialidol. Dialidol’s service works by measuring busy signals on the phone lines during voting and using an empirical method to predict the result. Even if the number of phone votes stays the same, the number of online votes is apt to rise by a factor of 5 to 10; who would vote 10 times when one can vote 50 times just as easily or more so? Dialidol appeared to weather the shift of many voters to text messages, maintaining fairly good accuracy, but 50 votes with a couple of clicks is a huge number.

From an electoral perspective, it’s interesting to find out what effect the supervote policy has on vote splitting. It’s well known that with three candidates, two of which are similar, votes will be split between the two similar candidates leading to election of the more heterogeneous candidate. The most obvious example was the election of Bill Clinton, who benefited greatly by Ross Perot taking away part of George H. W. Bush’s votes. That was an election with 1 voter, 1 vote. In American Idol, depending on how onerous the requirements are for voting, there may be a de facto barrier similar to the legal barrier that American elections have. That is, if it’s possible, but annoying, to cast many votes, then some people won’t cast them, and vote splitting is important. However, with a low barrier to multiple voting, vote splitting could become less important.

Consider the Top 3 of Season 6. Jordin Sparks, Blake Lewis, and Melinda Doolittle were vying for the two spots in the finale. Jordin and Melinda were fairly similar, in that they were black women who sang torch songs or R&B. Blake, meanwhile, was an offbeat singer, prone to electronica. When it was just between Jordin and Blake, Jordin won. But in a three-way race, people who liked soulful black female singers may have split their votes between Jordin and Melinda, making a Jordin and Melinda finale much less likely.

On the flip side, if everyone votes the full 50 times for their favorite person, this rule may just lead to vote inflation, with no overall changes to the general milieu. This is an empirical question, not one that someone can divine the answer to.

Why all models are wrong

“All models are wrong”, my PhD adviser told me one day. We were having a conversation about why one of our experiments was off compared to the predictions of another group. He smiled at me, but I realized right away that he was correct in a certain sense.

This seems to be a point that isn’t widely appreciated.

Suppose I were to write down Newton’s Law of gravitation. That law is correct (up to a small deviation very near the Sun due to General Relativity). I could then build a model of the orbit of the Earth around the Sun, which predicts an elliptical path. This is a good model, much better than even the Copernican model, probably one you learned about in school. But it’s wrong.

Why is it wrong? Because it treated the Earth and Sun as a two-body system, neglecting the effect of all the other planets, asteroids, comets, the micrometeoritic material around the Earth, and so on. One can then try to add those factors in, a Herculean task necessitating tons of observations and computing time. You improve the model. Then an asteroid comes along, and someone asks you whether there will be a collision. For example, the asteroid designated 2011 AG5 will cross our path sometime between 2040 and 2047; NASA lists the probability of impact at 0.2%.

You could fairly ask why NASA lists a mere probability of impact rather than telling us definitively what will happen. The answer is that their response is based on a model; that model has statistical noise in its variables, and it is imperfect in conception. It’s wrong.

In weather forecasts, often a probability of rain is listed. Why? Why can’t they just tell you whether or not it will rain? What does it even mean that there’s a 30% chance of rain? It means that 30% of the time when the conditions looked like today, it ended up raining. Is the person wrong if it rains? No. He said there was a chance: 30%. If he said there was a 0% chance, he would be wrong. By saying there was a 30% chance he is being honest about the uncertainty in his statistical analysis and weather modeling capability. His model is wrong.
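That statement is testable in aggregate, by the way: gather every day the forecaster said “30%” and count how often it actually rained. Here’s a minimal sketch of such a calibration check in Python, with a forecast history invented purely for illustration:

```python
from collections import defaultdict

# A made-up forecast history: (stated probability of rain, did it rain?)
history = [
    (0.3, True),  (0.3, False), (0.3, False), (0.3, True),  (0.3, False),
    (0.3, False), (0.3, False), (0.3, False), (0.3, False), (0.3, True),
    (0.7, True),  (0.7, True),  (0.7, False), (0.7, True),
]

# Group the outcomes by the probability the forecaster stated
by_stated = defaultdict(list)
for p, rained in history:
    by_stated[p].append(rained)

# A well-calibrated forecaster's "30%" days rain about 30% of the time
for p, outcomes in sorted(by_stated.items()):
    freq = sum(outcomes) / len(outcomes)
    print(f"stated {p:.0%}: rained {freq:.0%} of {len(outcomes)} days")
```

If the “30%” bucket rains 30% of the time over many days, the forecaster is doing his job, whatever happened on any single day.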

An actuary is a person who works at an insurance company, making sure that the premiums collected on life insurance policies are enough to cover the losses when policyholders die. How does he know how many are going to die? Can he predict who is going to die? In a certain sense, yes. He builds a model very much like the one that I’ve built for this blog. He collects a data set of 10,000 people, looking at various variables: age, weight, gender, smoker/nonsmoker, miles driven per day, for instance. Then he builds a model that predicts the probability of someone dying in the next year given all of those variables. If many of the people stop smoking, this will reduce the number of deaths, and the premiums for those people can then be lowered.
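For the curious, here is a toy version of what such a model might look like: a logistic regression fit on invented data for 10,000 policyholders. Everything below (the variables, the “true” coefficients, the plain gradient-descent fitting) is made up for illustration, not real actuarial practice:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Invented policyholder data: age, smoker flag, BMI, miles driven per day
age    = rng.uniform(20, 70, n)
smoker = rng.integers(0, 2, n).astype(float)
bmi    = rng.normal(24, 4, n)
miles  = rng.uniform(0, 60, n)

# Invented "true" risk: older smokers with higher BMI are likelier to die
true_logit = -8 + 0.08 * age + 1.2 * smoker + 0.05 * bmi + 0.01 * miles
died = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Standardize the features, then fit a logistic regression by gradient descent
feats = np.column_stack([age, smoker, bmi, miles])
mu, sd = feats.mean(axis=0), feats.std(axis=0)
X = np.column_stack([np.ones(n), (feats - mu) / sd])

sigmoid = lambda z: 1 / (1 + np.exp(-z))
w = np.zeros(X.shape[1])
for _ in range(2000):
    w -= 1.0 * X.T @ (sigmoid(X @ w) - died) / n

def p_death(person):
    """Predicted probability of dying in the next year."""
    x = np.concatenate([[1.0], (np.array(person) - mu) / sd])
    return sigmoid(x @ w)

p_bob   = p_death([55, 1, 27, 20])  # 55, smoker, BMI 27, 20 mi/day
p_alice = p_death([40, 0, 22, 0])   # 40, nonsmoker, BMI 22, rides the subway
print(f"P(death) Bob: {p_bob:.2f}, Alice: {p_alice:.2f}")
```

With the invented coefficients above, Bob’s predicted risk comes out well above Alice’s, which is all the actuary’s table really says: a ranking of probabilities, not a death sentence.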

But now let’s take a small subset of people, say 3 of them. Bob is 55, still smokes (though he’s tried to cut back), is a bit overweight and drives 20 miles a day. John is 20, nonsmoker, thin, and drives 50 miles a day. Alice is 40, nonsmoker, in good shape, and rides the subway.

The actuary’s boss brings these people into a room and asks the actuary which will die first. The actuary uses the model he’s built from the 10,000 data points and predicts a probability of dying. Bob is the most likely, as he’s male, older, and smokes, so he’s a prime candidate for heart disease. Next is John, who is a young man, and young men die in traffic collisions far more than 40 year old women. Finally, Alice has the lowest probability of dying. If the boss says that one of them will die, the actuary would make a table:

| Name  | Age | Sex | Smoker? | Miles/day driven | BMI | Probability of death |
|-------|-----|-----|---------|------------------|-----|----------------------|
| Bob   | 55  | M   | Y       | 20               | 27  | 56%                  |
| John  | 20  | M   | N       | 50               | 20  | 27%                  |
| Alice | 40  | F   | N       | 0                | 22  | 17%                  |

Later that year, John wraps his car around a tree, dying. The actuary’s boss comes into his office, screaming mad that the actuary was wrong. “But”, says the actuary, “I was not wrong. I said that Bob was the most likely to die, not that he would definitely die”.

Was the model wrong? Yes. Why? Because it didn’t take into account every single thing about the entire world and how it affected these three people. That is impossible. Instead, it reduced their lives to a few quantitative variables, projected the likelihood of death, and said that Bob was the most likely to die. But the actuary was not wrong. He gave the honest probability as well as anybody could possibly have determined it. If his boss interpreted the table above as saying “Bob will definitely die”, then that is the boss’s problem.

Surely the reader can understand what I am getting at by now.

| Contestant       | WNTS Rating (avg.) | Dialidol Rank | Bottom 3 Previously? | Previous Rating | Probability of Elimination (%) |
|------------------|--------------------|---------------|----------------------|-----------------|--------------------------------|
| Jessica Sanchez  | 45.7               | 2             | Yes                  | 74.5            | 45.8                           |
| Joshua Ledet     | 54.7               | 2             | Yes                  | 69.0            | 31.7                           |
| Phillip Phillips | 61.0               | 1             | No                   | 63.5            | 22.4                           |

This was my forecast last Thursday. Right there it says that Jessica Sanchez was the most likely to be eliminated. She was ranked 3rd on Dialidol (though I bumped her to a tie because of how close the numbers were), and people eliminated in the Top 3 are usually ranked 3rd on Dialidol. She also had the lowest WNTS approval rating of the night, and the person eliminated in the Top 3 typically has a lower WNTS approval rating than the others. As such, the model predicted a 46% chance of her being eliminated. Joshua was ranked second on Dialidol and had the second-lowest WNTS rating; sometimes those people are eliminated, and the probability is reckoned to be about 32%. That leaves Phil with a 22.4% chance.
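My actual formula isn’t reproduced here, but the general shape of such a model is easy to sketch: convert each contestant’s indicators into a risk score, then normalize the scores so the night’s elimination probabilities sum to 100%. The weights below are invented for illustration; they are not my fitted coefficients:

```python
import numpy as np

# The night's indicators: (WNTS approval rating, Dialidol rank) per contestant
contestants = {
    "Jessica Sanchez":  (45.7, 2),
    "Joshua Ledet":     (54.7, 2),
    "Phillip Phillips": (61.0, 1),
}

# Hypothetical scoring: a lower WNTS rating and a worse (higher) Dialidol
# rank both raise elimination risk. These weights are invented, not fitted.
def risk_score(wnts, rank):
    return np.exp(-0.05 * wnts + 0.5 * rank)

scores = {name: risk_score(*vals) for name, vals in contestants.items()}
total = sum(scores.values())

# Normalize so the three elimination probabilities sum to 100%
probs = {name: s / total for name, s in scores.items()}
for name, p in probs.items():
    print(f"{name}: {p:.1%} chance of elimination")
```

Whatever the exact weights, the key property survives: exactly one person goes home, so the three probabilities must sum to 100%, and the “favorite to leave” can easily sit below 50%.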

Jessica Sanchez was not eliminated, and Joshua was. The one with the highest probability was not eliminated. My question is: so what? A 32% chance is not small, not even remotely surprising. Twice in the past week here in Atlanta there was a 30% chance of rain, and it rained. So what?

The model that I use is wrong. It reduces all of the possible effects on voting to just two variables, which is not correct. It is, however, feasible, and the most intellectually sound one I can find. If it gets a fair number of calls right over the entire season, then I’m happy. If it misses any one particular call, that is close to meaningless.

The model is a formula. It’s not racist. It’s not sexist. It’s not based on falsehoods. It’s a straightforward correlation and logistic regression. It’s dispassionate, reductive, and wrong. However, it is not totally wrong. Sometimes it hits, and sometimes it misses, and it’s been better than some experts out there predicting what will happen. That’s more than I expected.

I don’t have skin in this game. I don’t care for any of the contestants, really. The last Idol contestant that I really liked was Blake Lewis. I kind of liked Erika Van Pelt. What the model “thinks” is not always what I think, nor is it what I want. To be clear, I would prefer if Jessica won over Phillip, if for no other reason than that I think women have been badly treated in the past 4 years. I’ve remarked many times in the Liveblog that Phil isn’t singing what I recognize to be notes.

It would actually be much easier for me to just sit back and not make predictions. The reason I built a model was to see whether it was possible to predict eliminations from just a few statistical indicators. That model makes a lot of calls, and a lot of them are wrong. But some are right, and many are in the ballpark. If you get some enjoyment or information out of reading them, that’s great. If they’re wrong, you should take that as a sign that 1. the data are noisy and sparse, 2. the voting patterns change, and 3. many factors are simply not quantifiable. Any comments beyond that are pointless, and the unlettered attacks flying around in the comments sections of this blog tell me that these points haven’t been communicated well.

America loves White Guys With Guitars

It was apparently news today that White Guy With Guitar (WGWG) winner #2, Kris Allen of season 8, thinks that another WGWG will win this year. He’s referring to Phil Phillips, who would indeed be the fifth such winner in 5 years. I happen to agree with Kris Allen that Phil will probably win. That is not an endorsement, just a thing I think is true.

You know what else is true? It isn’t just American Idol voters: Americans in general like WGWGs.

Take any metric you like. Above is the demographic breakdown of the Rolling Stone Top 100 Artists. About half of them are WGWGs. The lion’s share of the remainder is black guys with no guitars (think James Brown) and black guys with guitars (think Jimi Hendrix). That leaves only 20% for women of any kind.

Maybe you think Rolling Stone isn’t indicative of American Idol (I disagree, since the judges and producers are of that ilk). Fine, then look at the Billboard charts. As I pointed out previously, about 65% of the Top 10 at any point are men. If we expand to the Top 100, the problem gets far more lopsided in favor of men. It’s not pretty, but it appears to be true.

Yes, it could be that Phil is winning because he’s got a PR blast. He could be winning because the judges are obsequious. He could be winning a pity vote due to his chronic illness. And, maybe he could be winning because of VFTW tomfoolery. But my hypothesis is that Phil is going to win because, in the end, Americans just prefer their singers male, white, and strumming.

Skylar’s elimination was pretty weird

Look, I’m not going to grouse in this space that America “got it wrong” by keeping Phil and dumping Skylar. America has been getting it wrong for a long time. Yes, it’s capricious and injudicious. That’s boring and obvious.

I’m here to talk about how odd it is that the singer rated #1 on Dialidol, not to mention the only country singer still on the show, was eliminated.

Dialidol’s rare fumbles

How many times has the top rated Dialidol contestant been eliminated? Excluding semi-finals, the answer is that it’s only happened twice before in Dialidol’s 7 years of existence (while Idol itself has been on for 11 seasons, Dialidol only started up in season 4). Melinda Doolittle in season 6 and Lil Rounds from season 8 are the only contestants to hold such a distinction, at least until Skylar. Like I’ve said before, Dialidol is pretty damn good.

Occurrences of results for the top and bottom rated Dialidol contestants.

The figure above shows what happened to the person ranked #1 and ranked last on Dialidol. Here is a breakdown of the numbers by percentage:

| Result              | Last in the rankings | First in the rankings |
|---------------------|----------------------|-----------------------|
| Bottom Group        | 25%                  | 6%                    |
| Eliminated or saved | 50%                  | 4%                    |
| Safe                | 25%                  | 90%                   |

People in Phil’s position (dead last on Dialidol) were safe only 25% of the time, and in jeopardy 75%. People in Skylar’s position (highest rated in Dialidol) were safe 90% of the time and eliminated only 4% of the time.

What could possibly be wrong with Dialidol to make these misses happen at all?

I would note that in each of the three eliminations, the contestant in question was a woman, which probably indicates a sampling bias in Dialidol’s user base (not that I think they could control it; you have to accept some error in any system). Of those whom Dialidol had ranked #1, five found themselves in the Bottom 3/2 but were not eliminated. Three of those five were women (two of them were Hollie!), while two were men. On the other side of things are the people ranked lowest on Dialidol who were nonetheless safe (not in the Bottom 3/2): this happened 20 times, and only six of those were women. So, yes, Dialidol voters favor women more than the general voting public does.

The bumpkin bump

It certainly was upsetting to a large group of people last year (myself included) that the country musicians seemed to sail through at the expense of better performers. The finale between Scotty McCreery and Lauren Alaina was widely derided and comparatively little watched. The idea was that Midwestern and Southern states voted as a bloc to keep those two in.

I shall have to put together a more detailed argument at a future point, but I’m beginning to believe that that effect is probably overrated. There’s just as much reason to believe that Scotty won because he was a white guy with a guitar, just like the three winners before him, as to believe that he won because of country. Imagine that a large group of voters splits its votes between Lauren and Scotty, while all other votes go to the non-country people. By the end of the contest, you have to reckon that many of those votes would consolidate in a single non-country singer. After all, do you know many people who like country and other stuff? But this didn’t happen.

And this year the case against the “bumpkin bump” theory is even starker. Chase Likens, whom I had resigned myself to hearing for the next 12 weeks, was eliminated immediately in the semifinals, and Skylar at one point switched from country to more pop songs after she found herself in the Bottom 3. There doesn’t seem to be much evidence of any country advantage this year, and other than a few high-profile cases like Carrie Underwood, I can’t think of many times it clearly came into play in the past either.

I would hardly characterize Skylar’s exit as “shocking”. Her songs were mainly so-so, and her personality on the show hasn’t been what one would call dazzling. But it certainly was strange.

Opinion: Jessica Sanchez has been overrated all year

When I said in last night’s projection that I could be persuaded that Jessica could be going home, I meant it. She has most of the same hallmarks that Pia Toscano had last year: the beauty-contestant type with technically good singing, boring song choices, a plain voice, no musicianship on display, and over-the-top judge reviews.

The overrating even extends to my favorite quantitative variable, the WNTS approval rating. Check this out:

| Episode        | Theme              | Order | Song                  | WNTS Rating | WNTS stdev | Result | DialIdol Rank |
|----------------|--------------------|-------|-----------------------|-------------|------------|--------|---------------|
| Top 25 (Girls) | Open               | 11/12 | Love You I Do         | 84          | 17         | Safe   | 2             |
| Final 13       | Whitney Houston    | 12/13 | I Will Always Love You | 91         | 7          | Safe   | 3             |
| Final 11       | Year You Were Born | 2/11  | Turn The Beat Around  | 56          | 22         | Safe   | 2             |
| Final 10       | Billy Joel         | 9/10  | Everybody Has A Dream | 79          | 18         | Safe   | 2             |
| Final 09       | Personal Idol      | 6/9   | Sweet Dreams          | 86          | 13         | Safe   | 8             |
| Final 08       | 1980s              | 5/8   | How Will I Know       | 64          | 21         | Safe   | 4             |
| Final 07       | 2010s              | 3/7   | Stuttering            | 78          | 18         | Saved  | 5             |

I’ve been insisting in the LiveBlogs that Jessica is off-pitch and a snore, and apparently at least some people agree. But just look at these scores! A 91 for her Whitney Houston copy-cat version of “I Will Always Love You”? Are we being serious? There are only 17 performances out of 1504 that were rated better than that. That puts it in the 98th percentile, equal to Elise Testone’s version of “Whole Lotta Love”. I can tell you which one of those I remember with any fondness, and it ain’t Jessica’s.
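That percentile figure is just arithmetic on the counts above:

```python
total  = 1504  # rated performances in the WNTS database
better = 17    # performances rated above Jessica's 91

# Fraction of all performances at or below her score
percentile = 100 * (total - better) / total
print(f"{percentile:.1f}th percentile")
```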

Moreover, the scores aren’t significantly polarizing, as indicated by the standard deviation of the WNTS Rating. (Roughly two-thirds of individual ratings fall within one standard deviation of the mean score.) They don’t approach the amount of disagreement that viewers had over Adam Lambert, Paul McDonald, Casey Abrams, Megan Joy, or Phillip Phillips. She ranks in the lowest third by that metric.

Why? I suppose the argument that people will make is that “nobody can find anything technically wrong with the performance, so they approve”. So, everyone can agree that they don’t really like it, but there’s nothing wrong with it.

I think that about sums up this season.

If you’re wondering whether Jessica’s trajectory follows Pia’s, the answer is yes, eerily so:

| Episode        | Theme                    | Order | Song                            | WNTS Rating | WNTS stdev | Result     | DialIdol Rank |
|----------------|--------------------------|-------|---------------------------------|-------------|------------|------------|---------------|
| Top 24 (Girls) | Open                     | 12/12 | I’ll Stand By You               | 91          | 11         | Safe       | 5             |
| Final 13       | Personal Idol            | 5/13  | All By Myself                   | 82          | 13         | Safe       | 4             |
| Final 12       | Year You Were Born       | 7/12  | Where Do Broken Hearts Go?      | 76          | 14         | Safe       | 4             |
| Final 11       | Motown                   | 8/11  | All In Love Is Fair             | 73          | 20         | Safe       | 4             |
| Final 11 (ii)  | Elton John               | 4/11  | Don’t Let The Sun Go Down On Me | 67          | 20         | Safe       | 4             |
| Final 09       | Rock & Roll Hall of Fame | 7/9   | River Deep – Mountain High      | 85          | 17         | Eliminated | 3             |

Complete with an identical 91 score in the early rounds, a brief dip into the 60s, and then straight to the bottom of the voting just one round later. The WNTS rating is simply not a good predictor for this type of contestant. Why? Because the people being polled are overrating the contestant. I don’t know why they are doing that, and I can’t change it, but I can get angry about it.