Top 9 post-game

| Name | Song | WNTS | DialIdol | VoteFair | Not-safe Probability | Accurate? |
|---|---|---|---|---|---|---|
| Paul Jolley | Eleanor Rigby | 39 | 1.535 | 1 | 0.481 | Yes (eliminated) |
| Burnell Taylor | Let It Be | 51 | 1.776 | 2 | 0.445 | No (safe) |
| Lazaro Arbos | In My Life | 11 | 0.951 | 7 | 0.432 | No (safe) |
| Devin Velez | The Long And Winding Road | 50 | 2.732 | 3 | 0.420 | No call |
| Janelle Arthur | I Will | 69 | 0.371 | 7 | 0.363 | Yes (safe) |
| Amber Holcomb | She’s Leaving Home | 54 | 0.493 | 9 | 0.352 | No (bottom 3) |
| Kree Harrison | With A Little Help From My Friends | 77 | 1.170 | 18 | 0.217 | Yes (safe) |
| Candice Glover | Come Together | 80 | 1.869 | 19 | 0.201 | Yes (safe) |
| Angie Miller | Yesterday | 68 | 5.993 | 34 | 0.089 | Yes (safe) |

Paul Jolley was predicted to be eliminated, and that happened, so if that’s all you care about, great.

Otherwise, it was kind of a dismal night for predictions. Lazaro and Burnell showed they have more staying power than their numbers would suggest. Amber Holcomb was in the bottom 3, which was a bit of a surprise. Clearly Dialidol is underrating Lazaro, and Votefair is overrating Amber, relative to the voting public.
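The Accurate? column can be tallied mechanically. This is a minimal sketch, with the calls transcribed from the table above; the helper name `tally` and the "yes"/"no"/"no call" labels are my own shorthand, not anything the model itself uses.

```python
from collections import Counter

# Outcome of each Top 9 call, transcribed from the "Accurate?" column above.
top9_calls = {
    "Paul Jolley": "yes",
    "Burnell Taylor": "no",
    "Lazaro Arbos": "no",
    "Devin Velez": "no call",
    "Janelle Arthur": "yes",
    "Amber Holcomb": "no",
    "Kree Harrison": "yes",
    "Candice Glover": "yes",
    "Angie Miller": "yes",
}

def tally(calls):
    """Count hits, misses and no-calls, plus the hit rate among firm calls."""
    counts = Counter(calls.values())
    firm = counts["yes"] + counts["no"]
    hit_rate = counts["yes"] / firm if firm else float("nan")
    return counts, hit_rate

counts, hit_rate = tally(top9_calls)
print(dict(counts), f"hit rate on firm calls: {hit_rate:.3f}")
```

That works out to five hits on eight firm calls. The same count on the Top 10 table further down gives seven of eight, which is why this week feels dismal by comparison.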

I find it disturbing that any of these women could be in the bottom 3 compared to these men.

I’ve noticed a couple weird things. Votefair seems to be polling a relatively small number of people (440 in this case), which is down markedly from last year (the top 9 last year had 707 votes). I don’t know whether that site is becoming less popular in general, or whether these contestants aren’t inspiring the kind of rabid fanbase that votes in such online polls, but it’s noteworthy. Also, Dialidol’s numbers at around midnight eastern time don’t change from then until the morning. Is Dialidol not counting any votes after 9pm PST? If that’s true, I can’t fathom why. Since I’m not active on their forum, I can’t easily see whether something has changed.

Decent showing for the model in the Top 10

| Name | Song | WNTS | DialIdol | VoteFair | Not-safe Probability | Accurate? |
|---|---|---|---|---|---|---|
| Curtis Finch | I Believe | 24 | 0 | 2 | 0.407 | Yes (eliminated) |
| Janelle Arthur | Gone | 43 | 0.544 | 2 | 0.383 | No (6th) |
| Paul Jolley | Amazed | 47 | 0.563 | 3 | 0.367 | Yes (bottom 3, 8th) |
| Burnell Taylor | Flying Without Wings | 50 | 0 | 4 | 0.358 | No call |
| Devin Velez | Temporary Home | 46 | 0.563 | 6 | 0.336 | No call |
| Lazaro Arbos | Breakaway | 31 | 5.171 | 5 | 0.326 | Yes (4th) |
| Amber Holcomb | A Moment Like This | 71 | 0.139 | 10 | 0.276 | Yes (5th) |
| Kree Harrison | Crying | 64 | 0.515 | 14 | 0.244 | Yes (Top 3) |
| Candice Glover | I (Who Have Nothing) | 89 | 0.417 | 14 | 0.226 | Yes (Top 3) |
| Angela Miller | I Surrender | 68 | 3.266 | 41 | 0.078 | Yes (Top 3) |

When I designed the finals model, I built it to do several things. My first priority was that it be based on sound principles, and not just some overfit, ad-hoc mess. The second was that it be as accurate as possible on ranking. The natural outcome of this was to produce probabilities of being in the bottom group (typically the bottom 3), and then tweak the coefficients to produce the fewest ranking errors.
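To make "probabilities, then fewest ranking errors" concrete, here is a rough sketch. The post doesn't spell out the model's functional form, so the logistic link is an assumption, and the function names, coefficients, and inputs are all hypothetical placeholders rather than the real model.

```python
import math

def not_safe_probability(features, coefs, intercept):
    """Assumed logistic link: weighted sum of indicators -> probability of the bottom group."""
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

def ranking_errors(probs, in_bottom):
    """Count pairs where a bottom-group contestant was given a lower
    not-safe probability than a contestant who turned out to be safe."""
    errors = 0
    for p_bad, was_bottom in zip(probs, in_bottom):
        if not was_bottom:
            continue
        for p_other, other_bottom in zip(probs, in_bottom):
            if not other_bottom and p_bad < p_other:
                errors += 1
    return errors

# Hypothetical week: three contestants, the first one landed in the bottom group.
example_probs = [0.1, 0.8, 0.5]
example_bottom = [True, False, False]
print(ranking_errors(example_probs, example_bottom))
```

Tweaking coefficients then amounts to trying candidate values and keeping the set with the lowest `ranking_errors` total over past seasons.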

Finally, I didn’t want the model to be wishy-washy—I wanted it to take firm positions. I get kind of annoyed with sites like Dialidol that publish predictions where all the names are in yellow (meaning that Dialidol makes no official pronouncement). If you’re going to make a forecast, it should god damn well forecast something (Dialidol last night predicted that each contestant would be in the range from 1 to 10 …). The flip side of that is that your model can be totally wrong.

The model had a pretty good night. It got the person eliminated correct, it got 2/3 of the bottom 3, and it ranked 5 of the contestants as the best, and those 5 were the best. Angela, Candice, and Kree were predicted as the Top 3, and they were, though we don’t know whether it ranked them correctly relative to each other. Amber and Lazaro were out of order, but both in the correct group; I ain’t mad at it.

Devin Velez was one of the two that the model couldn’t call, and he was way out of order. Devin and Janelle should have switched places. Janelle was ranked 9th, but was actually 6th. You can’t win ’em all, but that is an irritating black spot in an otherwise bright record.
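The "ranked 9th" figure comes from sorting the table by not-safe probability. A small sketch of that bookkeeping, using the probabilities from the Top 10 table and only the finishing places the table states outright (the Top 3 order and the two no-calls are left out; treating "eliminated" as 10th place is my assumption):

```python
# Not-safe probabilities from the Top 10 table above.
not_safe = {
    "Curtis Finch": 0.407, "Janelle Arthur": 0.383, "Paul Jolley": 0.367,
    "Burnell Taylor": 0.358, "Devin Velez": 0.336, "Lazaro Arbos": 0.326,
    "Amber Holcomb": 0.276, "Kree Harrison": 0.244, "Candice Glover": 0.226,
    "Angela Miller": 0.078,
}

# Places reported in the table; "eliminated" is read as 10th (assumption).
actual_place = {"Curtis Finch": 10, "Janelle Arthur": 6, "Paul Jolley": 8,
                "Lazaro Arbos": 4, "Amber Holcomb": 5}

# Highest not-safe probability -> predicted last place, lowest -> predicted 1st.
ordered = sorted(not_safe, key=not_safe.get, reverse=True)
predicted_place = {name: len(ordered) - i for i, name in enumerate(ordered)}

for name, actual in actual_place.items():
    miss = predicted_place[name] - actual
    print(f"{name}: predicted {predicted_place[name]}, actual {actual}, off by {miss:+d}")
```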

I’m intrigued by the amount of information we’re being given this year, but I don’t think that I can make much use of it. Without a historical perspective, it’s difficult to say what the rankings imply about the future. Can someone go from being in the top 3 vote-getters in the Top 10 to being 9th place in the Top 9? Who knows? We’ve never seen the results, so we don’t know how volatile the voting is. I do find all the new information a nice change of pace, though.

As an editorial note, I’m not sure how VoteForTheWorst is going to work this year. Lazaro was their pick, and while I may agree, is he really that bad? I don’t think he is. This Top 10 isn’t a lot of grist for their mill, I’m afraid.

Why the model predicted the Top 10 correctly

The debut of the new semifinals model went off without a hitch. The model chose 100% correctly, somewhat better than I had hoped. I’d like to just go over how the model thought things went down.

The men’s prediction is reprinted below:

| Name | Pre-exposure (seconds) | Audition | WNTS | DialIdol | VoteFair | Probability of advancing |
|---|---|---|---|---|---|---|
| Lazaro Arbos | 1263 | Yes | 49 | 3.559 | 18 | 0.893 |
| Devin Velez | 524 | No | 71 | 4.693 | 17 | 0.856 |
| Burnell Taylor | 969 | Yes | 74 | 0 | 13 | 0.804 |
| Curtis Finch | 893 | Yes | 48 | 4.661 | 10 | 0.730 |
| Paul Jolley | 859 | Yes | 52 | 0.984 | 10 | 0.591 |
| Charlie Askew | 1115 | Yes | 9 | 5.641 | 12 | 0.575 |
| Vince Powell | 585 | Yes | 45 | 0 | 10 | 0.403 |
| Nick Boddington | 657 | No | 58 | 0.146 | 5 | 0.126 |
| Cortez Shaw | 569 | No | 33 | 0 | 2 | 0.012 |
| Elijah Liu | 351 | No | 42 | 0 | 4 | 0.010 |

As I wrote that night, the most probable Top 5 Men consisted of Lazaro, Devin, Burnell, Curtis, and Paul. At the beginning of the night, Charlie Askew occupied the 5th spot, with Paul in 6th, but with an update to the final Dialidol and Votefair numbers, that was reversed.

Notice that the various indices were divergent. WhatNotToSing ranked Nick Boddington as third best, but by the table above Dialidol showed him just above zero, ahead of only the four men who registered nothing at all, and Votefair had him in 8th place. That, along with the fact that his audition was not shown, made him an unlikely finalist.

Dialidol, meanwhile, ranked Charlie Askew as being in first place. While this is often a strong indicator, it is less so in the semifinals. Last year, for instance, Dialidol thought Eben Frankewitz was a lock. But WNTS gave Charlie a 9 out of 100, dead last by a large margin, and Votefair showed him only as 4th most popular. Viewed in the aggregate, Charlie’s numbers were weak. That being said, the model, in my opinion, got lucky with this call. It could have easily gone to Charlie instead of Paul.

Lazaro was predicted most likely to advance based on decent to strong numbers on all indices. He was fourth on Dialidol, first on Votefair, and fifth on WNTS. Couple that with a high amount of pre-exposure, and the model assigned him a very high probability of advancing. That he was revealed last was possibly an indicator that he was way out ahead of the pack. At this point, he might be considered a front-runner among the men.

Finally, Burnell Taylor had a 0 on Dialidol. This was the kiss of death for the others with the same (Vince Powell, Cortez Shaw, and Elijah Liu). But Burnell had something those guys didn’t: he was third most popular according to Votefair and had the top WNTS score of the night. That, along with the pre-exposure considerations, made him favored to be included.
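The per-index placements quoted for Lazaro, Charlie, and Burnell can be recomputed from the men's table. A minimal sketch, assuming ties share the better rank (standard competition ranking); the `ranks` helper is just an illustration:

```python
# (WNTS, DialIdol, VoteFair) for each man, transcribed from the table above.
men = {
    "Lazaro Arbos": (49, 3.559, 18),
    "Devin Velez": (71, 4.693, 17),
    "Burnell Taylor": (74, 0, 13),
    "Curtis Finch": (48, 4.661, 10),
    "Paul Jolley": (52, 0.984, 10),
    "Charlie Askew": (9, 5.641, 12),
    "Vince Powell": (45, 0, 10),
    "Nick Boddington": (58, 0.146, 5),
    "Cortez Shaw": (33, 0, 2),
    "Elijah Liu": (42, 0, 4),
}
INDICES = ("WNTS", "DialIdol", "VoteFair")

def ranks(table, column):
    """Competition ranking: 1 + number of contestants with a strictly higher score."""
    scores = [row[column] for row in table.values()]
    return {name: 1 + sum(s > row[column] for s in scores)
            for name, row in table.items()}

rank_by_index = {label: ranks(men, i) for i, label in enumerate(INDICES)}
for name in men:
    print(name, {label: rank_by_index[label][name] for label in INDICES})
```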

Now the women:

| Name | Pre-exposure (seconds) | Audition | WNTS | DialIdol | VoteFair | Probability of advancing |
|---|---|---|---|---|---|---|
| Angela Miller | 904 | Yes | 75 | 5.618 | 45 | 0.936 |
| Candice Glover | 912 | Yes | 88 | 3.371 | 9 | 0.830 |
| Kree Harrison | 723 | No | 80 | 2.333 | 13 | 0.650 |
| Amber Holcomb | 355 | No | 73 | 1.457 | 7 | 0.447 |
| Janelle Arthur | 923 | Yes | 49 | 0.131 | 3 | 0.446 |
| Adriana Latonio | 375 | No | 30 | 6.331 | 9 | 0.418 |
| Breanna Steer | 390 | Yes | 46 | 1.920 | 2 | 0.412 |
| Tenna Torres | 756 | Yes | 31 | 1.638 | 2 | 0.350 |
| Zoanette Johnson | 1277 | Yes | 7 | 1.899 | 8 | 0.343 |
| Aubrey Cleland | 363 | No | 52 | 0.223 | 4 | 0.169 |

If Lazaro can be considered the front-runner among men, I think it’s fair to say that Angela (Angie) Miller is the same among women. In fact, she may be, at this point, considered the front-runner of the contest. Her ranking among the indicators was 3rd on WNTS, 2nd on Dialidol, and 1st to a large degree on Votefair. This should be closely watched, because Votefair has severely overrated contestants in the past, such as Jessica Sanchez. But assuming it’s not being juked by rabid fans, Angela seems to be positioned well.

Candice Glover was another cinch for the Top 10. Third on Dialidol, tied for third on Votefair, and with the top WNTS score of the night, her advancement was not doubtful. I retain a bit of skepticism as to her staying power, as some similar singers, such as Mandisa, started strong but didn’t go the distance.

The model was not confident enough to call the contest for Amber and Janelle instead of Adriana and Breanna. Their probabilities were below the threshold where most errors occur. Adriana Latonio had a very strong Dialidol score (both of Dialidol’s picks for the top spot were wrong), but a poor WNTS score and a small amount of pre-exposure.

Janelle squeaked by according to the model, benefiting from particularly low WNTS scores for her competitors. She had weak numbers on Dialidol and Votefair, and she is at a disadvantage going into the finals. Without some kind of game changer, look for her speedy departure.

When I watched the show, I had thought Aubrey Cleland was going to skate by, but the model correctly repudiated my view. She showed no real sign of support on Dialidol and Votefair, and had a decent but unspectacular WNTS score.

Finally, we come to Zoanette. I’ve been saying throughout these rounds that I thought Zoanette had no chance with the voters. She was a goof, much like Normand Gentle, more a laugh-at-her-not-with-her contestant than a real contender. The judges may have tolerated that, even venerated it, but my feeling was that the audience would have little patience for it, and I guessed right. Other than the fact that she had a lot of screen time in the auditions, there was no reason in the numbers to think she would advance.

The three positions that the model was not confident on could, of course, have gone another way. Probabilities don’t imply any kind of certainty, and the most probable events frequently don’t occur. Thus, inasmuch as I use that as an excuse when the model is wrong, I must point out the 100% accuracy of the model this year was a bit of a fluke. It could easily have been only 70% accurate.
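The 70% figure can be sanity-checked against the model's own numbers. In the sketch below, the expected number of correct picks is just the sum of the advancing probabilities for the ten names the model chose. The "perfect night" figure additionally assumes the ten outcomes are independent, which they are not (only five slots per gender), so treat it as a rough order of magnitude.

```python
import math

# Probability of advancing for the model's Top 5 men and Top 5 women (tables above).
men_top5 = [0.893, 0.856, 0.804, 0.730, 0.591]
women_top5 = [0.936, 0.830, 0.650, 0.447, 0.446]
picks = men_top5 + women_top5

# Linearity of expectation: no independence assumption needed here.
expected_correct = sum(picks)
expected_accuracy = expected_correct / len(picks)

# Crude estimate of a 10-for-10 night, ASSUMING independent outcomes.
p_perfect_if_independent = math.prod(picks)

print(f"expected correct picks: {expected_correct:.3f} of {len(picks)}")
print(f"expected accuracy: {expected_accuracy:.1%}")
print(f"perfect night (independence assumed): {p_perfect_if_independent:.3f}")
```

By its own probabilities the model expected to get about seven of ten right, so a clean sweep was the lucky tail rather than the norm.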

The model represents conventional wisdom, in my mind, and as such this has been a very conventional year, so far. There were no real surprises, or anything that makes you just shake your head and wonder how it happened. That’s good for someone trying to predict it, but it’s not necessarily good for the show.

Top 4 Updated Predictions

Update: Dialidol rankings were updated after 1am EST and now have Jessica Sanchez as their #1 vote getter. The projections have been updated accordingly.

| Contestant | WNTS Rating (avg) | Dialidol Rank | Previous Rating | Probability of Elimination (%) |
|---|---|---|---|---|
| Joshua Ledet | 69 | 4 | 72.5 | 40.67 |
| Phillip Phillips | 63.5 | 3 | 31.5 | 39.7 |
| Hollie Cavanagh | 52.5 | 2 | 79.5 | 13.37 |
| Jessica Sanchez | 74.5 | 1 | 61.5 | 6.25 |

Hollie Cavanagh has been predicted safe for a few weeks now, based on her Dialidol standings, but I’m not buying it tonight. With the lowest rated performances, bad reviews from the judges, and a near historical number of times in the Bottom 3, I cannot imagine she isn’t going home.

Why is the model so bad at figuring this out? With 11 data points (one per season), I don’t know what you can expect. Of course, if there were a systematic way to correct Dialidol’s ranking of Hollie to be in line with reality, I would do it, but you may as well just make up a number. There is no scientific way to do this (but check back with me in season 25, assuming I’m still alive for that).

Historically, Dialidol is quite accurate during the Top 4. However, the service has occasionally shown some blind spots, and this seems to be one of them. By the updated table, the model gives Joshua about three times Hollie’s chance of going home, and more than six times Jessica’s. That’s nuts. It would be shocking if Hollie were not sent back to Texas/Liverpool tomorrow.

Top 25 post-game assessment and analysis

Projection results

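| Name | WNTS Approval | Dialidol | Probability of Advancing (%) | Result |
|---|---|---|---|---|
| Eben Frankewitz | 10 | 44.24 | 71.1 | Did not advance |
| Chase Likens | 32 | 20.99 | 70.2 | Did not advance |
| Adam Brock | 32 | 19.74 | 69.8 | Did not advance |
| Jermaine Jones | 51 | 14.57 | 68.5 | Advanced |
| Joshua Ledet | 81 | 10.9 | 68.4 | Advanced |
| Jessica Sanchez | 82 | 3.08 | 57.2 | Advanced |
| Elise Testone | 84 | 0.12 | 54.5 | Advanced |
| Hollie Cavanagh | 76 | 2.52 | 54.2 | Advanced |
| Skylar Laine | 74 | 2.43 | 53.3 | Advanced |
| Shannon Magrane | 62 | 2.22 | 47 | Advanced |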

Picking the 10 finalists at random from the Top 25 would get 40% of them right on average (10 of 25). The model instead got 70% right, but missed badly on its top 3 men. Although the approval ratings suggested Eben was not going to be in the Top 10, his Dialidol score was truly humongous. Chase Likens and Adam Brock had nearly identical, quite high Dialidol scores, paired with rather low performance approval. None of these 3 made it through.
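For anyone who wants the random baseline spelled out, here is a quick sketch. It assumes a forecaster names 10 of the 25 semifinalists blindly and ignores the men/women split, which is the simplification behind the 40% figure; the variable names are mine.

```python
from math import comb

POOL = 25    # semifinalists
SLOTS = 10   # finalists chosen by the vote
PICKS = 10   # names a forecaster puts forward

def p_exactly(k):
    """Chance that exactly k of PICKS random guesses are among the SLOTS winners
    (hypergeometric distribution)."""
    return comb(SLOTS, k) * comb(POOL - SLOTS, PICKS - k) / comb(POOL, PICKS)

# Each random pick is right with probability SLOTS / POOL.
expected_fraction = SLOTS / POOL

# How often would blind guessing do at least as well as 7 of 10?
p_at_least_7 = sum(p_exactly(k) for k in range(7, PICKS + 1))

print(f"expected fraction correct by chance: {expected_fraction:.0%}")
print(f"chance of 7 or more correct by luck: {p_at_least_7:.4f}")
```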

Before you start knocking me for including Dialidol scores (which I perhaps deserve), note that going just on approval would have made Creighton Fraker and Jeremy Rosado the two men who rounded out the Top 5 guys after Phil Phillips, Joshua Ledet, and Colton Dixon, so it’s not all bad. That would also have incorrectly knocked Jermaine Jones out of the forecast, and would have put Erika Van Pelt into the Top 5 girls, also incorrectly.

Fortunately, the model weighs these factors according to how accurate they’ve been in the past. So the top Dialidol woman (Brielle) was nonetheless still regarded (correctly) as not being in the Top 5 girls, because her WNTS rating was quite low. But no model that takes Dialidol into account is going to be robust to an errant Dialidol rating of 44.2. For comparison, Heejun registered a 0.24, which means Dialidol thought that Eben did 184 times better than Heejun.

What happened to Dialidol this week? I confess I don’t know. When, last year, they kept predicting Scotty McCreery to sail through, I was incredulous. Sure enough, that proved prescient, as McCreery was a bulletproof contestant, advancing despite bad singing scores, and eventually claiming the title. So, you disregard Dialidol at your own peril. Except this time! I have no idea why Eben and Chase were so favored in the Dialidol sample. The service does more than just measure the busy signal, and actually measures votes from users, and maybe those users are disproportionately … what?! I can’t imagine.

As for Heejun Han, the evidence for his advancement to the finals is simply not there in the numbers. No number except pre-exposure indicated he would get through, and that alone is not usually enough. Indeed, Reed Grimm had a very large pre-exposure time, a nearly identical WNTS rating, and a larger Dialidol ranking, and was not in the Top 5 men. Go figure.

Giving credit where it’s due, the keeper of the Vote For the Worst Twitter feed correctly predicted all 10 of the vote winners. The power of a person going “on feeling” is in some cases very accurate, and this shouldn’t be shocking. If you had asked me whether I agreed that Eben or Chase would advance, I would have answered no. But I’m interested in predictability on a technical level, so I’m not going to juke the results until they match my gut. What would be the intellectual exercise in that?