How bad are Idol’s ratings?

Is American Idol losing its cultural cachet? The signs seem to point to yes. A show that once commanded national news coverage, its own wrap-up show, and even a weekly write-up on internet juggernaut Slate now has none of those things. One scarcely has to try to avoid spoilers, and other shows like The Voice have moved in to compete in a serious way. Recently, Idol stopped winning the night with its performance shows, and its results show is regularly killed by The Big Bang Theory. The ratings are even lower than in Season 1, which aired during the summer months, when less TV is typically watched.

The Nielsen ratings for a show are kind of a murky issue. They measure the percentage of all TV viewers watching a show, out of the 100 million or so total TV viewers. The number most people pay attention to is the ages 18-49 demographic, the one most important to advertisers. However, the share of viewers watching broadcast (rather than cable) shows has been falling for many years now. In 1984, fully 45% of all viewers were watching the major prime-time broadcast networks, as opposed to cable, local, or public broadcasting. By 2002, when Idol premiered, that figure had already fallen to 29.6%, and it was 25.6% in 2009 (the most recent year published by TVByTheNumbers).
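
To get a rough sense of scale, a ratings point is just a percentage of that total audience. Using the ballpark 100 million figure above (a simplification; the actual Nielsen universe shifts from year to year), the conversion looks like this:

```r
# Rough illustration only: a ratings point is a percentage of the total TV
# audience, taken here as the ~100 million viewers cited above.
tv_universe <- 100e6
rating_to_viewers <- function(rating) (rating / 100) * tv_universe

rating_to_viewers(5.0)  # a 5.0 rating corresponds to roughly 5 million viewers
```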

If we look at the ratings for Idol season by season, plotted with the x-axis representing the number of remaining contestants in the finals, we can see how things have progressed.

[Chart: Ratings, Seasons 1-12]

In Season 1 the viewers caught on around the Top 5, and the ratings jumped a huge amount (points are omitted where data was unavailable). Season 5, with Taylor Hicks and Katharine McPhee, was clearly the biggest year by any account. That’s a bit strange to me, since I always thought Season 7 (David Cook) was really the zenith in terms of cultural awareness of the show.

Season 12 is the lowest-rated season by a comfortable margin. But how bad is it when you consider the falling overall audience? To answer this, we can weight the ratings data in proportion to the remaining broadcast audience. The data in the link above is provided by TVByTheNumbers only through 2009, so we have to do a bit of guesswork. We can fairly safely assume that the overall share is falling by about 1 point per year, which has been the case since 2000 (with the exception of 2005, which saw a rise, probably due to DVR ratings being added).
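
Roughly, the adjustment amounts to something like the following sketch (the shares through 2009 are the published ones; past 2009 it uses the assumed 1-point-per-year decline, so the function is illustrative rather than the exact calculation behind the chart below):

```r
# Sketch of the audience-weighted adjustment (illustrative, not the exact
# calculation behind the chart). Published broadcast shares: 45% in 1984,
# 29.6% in 2002, 25.6% in 2009; after 2009, assume ~1 point of loss per year.
broadcast_share <- function(year) {
  if (year <= 2009) {
    approx(x = c(1984, 2002, 2009), y = c(45.0, 29.6, 25.6), xout = year)$y
  } else {
    25.6 - 1.0 * (year - 2009)
  }
}

# Re-express a raw 18-49 rating relative to the broadcast audience that was
# still around in 2002, Idol's first season.
weighted_rating <- function(rating, year) {
  rating * broadcast_share(2002) / broadcast_share(year)
}

weighted_rating(5.0, 2013)  # a 5.0 in 2013 is worth about 6.9 in 2002 terms
```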

Viewed through this lens, Season 12 is … still pretty bad.
[Chart: Weighted ratings, Seasons 1-12]

Season 12 at least starts a little better than Season 1, but after that it plummets to last place. Seasons 4 and 5 look pretty similar near the end once audience size is accounted for, and Season 8 gets a boost relative to Season 1. Season 10 was a bit of an outlier: an unusually strong year that drew a big share of that year’s remaining audience (the horrible finale aside). Other than that, the rankings don’t change much. Viewership, even as a share of the remaining broadcast audience, is falling pretty fast.

One caveat here is that this year’s broadcast ratings appear to have fallen further than in previous years, with Fox alone down about 19% from last year. Could that account for the low ratings? No; even if you include another point of loss, Season 12 is still the nadir. Idol is sinking fast, and the producers have reason to worry.

Song choice ratings: safest songs

In my first stab at analyzing song choice, I looked at artists who are commonly sung on Idol and at how safe, historically, the contestants who sang their songs had been. In that case, though, I was focusing only on the artist, not on the individual songs, as long as the artist had more than one song performed. Today I will focus on a rating for the songs that are commonly sung on Idol, regardless of who they are by.

1016 songs have been sung on Idol, 677 of them only once. There’s not much I can say about those, since a single data point doesn’t make even the slightest of trends. So, focusing on the 349 other songs, I want to look at how safe they appear to be and rank them in a manner similar to the artists. This means looking at how many times each song was sung, how often the contestants who sang it were safe, and how many had their standing improved by it (in the sense that they had previously been in the bottom 3). I include all performances except reprises (songs repeated for the finale) and official American Idol songs.
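
The tallying step amounts to something like this sketch, assuming a data frame of performances with columns flagging whether the singer was safe and whether they had previously been in the bottom 3 (the column and object names are just for illustration):

```r
# Assumed layout: a data frame `performances` with columns song, safe
# (TRUE/FALSE), and was_bottom3_before (TRUE/FALSE), with reprises and
# official Idol songs already filtered out.
song_safety <- do.call(rbind, lapply(split(performances, performances$song), function(d) {
  data.frame(
    song       = d$song[1],
    times_sung = nrow(d),
    times_safe = sum(d$safe),
    improved   = sum(d$safe & d$was_bottom3_before)  # in the bottom 3 before, safe now
  )
}))

song_safety <- song_safety[song_safety$times_sung > 1, ]  # drop songs sung only once
song_safety <- song_safety[order(song_safety$times_safe / song_safety$times_sung,
                                 decreasing = TRUE), ]    # safest songs first
```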

More after the jump


Who not to sing

How smart is it to sing a Luther Vandross song on Idol? What about a Tina Turner song?

You might say there’s no real answer to this: if you sing a great rendition of a song by one of those artists, you’re likely to be safe, right? I would mainly agree. But the artists whose songs are covered on Idol vary in popularity, and surely some artists’ songs lend themselves to the contest better than others’. It ought to be at least somewhat quantifiable.

Singing a Shirley Bassey song (I Who Have Nothing, As Long as He Needs Me) is pretty safe. 7 people have done so, and none of them was ever put in the bottom 3 by doing it. Singing an Adele song is pretty unsafe. 7 people have done so, and only 1 was ever safe.

So, accepting the premise that the success of a song choice is somewhat dependent on the artist, here I’m going to discuss a way of rating these artists. It’s imperfect and somewhat subjective, but it is at least consistent, and it gives a convenient way to rate song choice. The result is the composite index calculated below.
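
The exact formula comes after the jump, but to give a flavor of what a composite index of this sort does, here is one hypothetical way the ingredients could be blended; this is illustrative only, not the actual index used below:

```r
# Hypothetical blend, for illustration only (not the real composite index):
# combine the share of safe performances, the share that improved a
# contestant's standing, and a mild bonus for sample size.
composite_index <- function(times_sung, times_safe, improved) {
  safe_rate    <- times_safe / times_sung
  improve_rate <- improved / times_sung
  weight       <- log1p(times_sung)  # more performances = more evidence
  (0.7 * safe_rate + 0.3 * improve_rate) * weight
}

composite_index(times_sung = 7, times_safe = 7, improved = 0)  # Bassey-like: ~1.46
composite_index(times_sung = 7, times_safe = 1, improved = 0)  # Adele-like:  ~0.21
```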

More after the jump


The Top 3 tracker

(Note: The first Top 3 tracker for season 12 should be up later today)

A new feature this year is the Top 3 tracker (T3T). Here I give the relative probabilities that the contestants will make the Top 3 given their standing at present.

The T3T works by using the projected IdolAnalytics week-to-week not-safe probability as an explanatory variable. The chart below shows the assigned not-safe probability for the first finals voting round (called Round 2 by the model) for Seasons 5-11. The points represent outcomes from previous seasons: the x-axis is the model’s assigned probability of being in the bottom 3 in that round, and the y-axis is whether or not the contestant eventually made the Top 3, with 1 being “yes” and 0 being “no” (a small amount of vertical jitter has been applied so that overlapping points can be seen).

Round 2 not-safe probability vs chance of making the Top 3. This graph is interactive; hover your cursor over a point to see who the contestant was.

The blue curve is a regression line giving the estimated probability of making the Top 3 for any given assigned Round 2 not-safe probability. For example, a contestant with a 0.25 chance of being not-safe this week has only about a 20% chance of making the Top 3, while one with a 0.1 chance of being in the bottom 3 has about a 50% chance.
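
The curve is the sort of fit a simple logistic regression produces. Here is a sketch of how it could be generated, assuming a data frame `history` of past contestants with their Round 2 not-safe probability and a 0/1 made-the-Top-3 indicator (the smoother behind the actual chart may differ):

```r
# Assumed columns: notsafe_r2 (the model's Round 2 not-safe probability) and
# top3 (1 if the contestant eventually made the Top 3, 0 otherwise).
fit <- glm(top3 ~ notsafe_r2, data = history, family = binomial)

# Estimated Top 3 chances for contestants sitting at 0.25 and 0.10 not-safe
predict(fit, newdata = data.frame(notsafe_r2 = c(0.25, 0.10)), type = "response")
```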

You can see that, roughly speaking, the “yes” (1) points are clustered to the left, while the “no” (0) points are denser toward the right. Logically, a high chance of being in the bottom 3 in the Top 12 or Top 13 round means you are less likely to make it to the Top 3.

The model is still fairly uncertain during the Top 12. However, anybody with a not-safe probability above 0.27 is unlikely to make the Top 3; it has happened only twice in the time frame we model (Syesha Mercado and Haley Reinhart). Conversely, having a low not-safe probability at this point is not a guarantee of making the Top 3 (see, for instance, Chris Daughtry and Siobhan Magnus).

If we look later in the contest, the model has to a certain extent converged:

Round 5 not-safe probability vs chance of making the Top 3. This graph is interactive; hover your cursor over a point to see who the contestant was.

In Round 5 (normally the Top 8), the not-safe probabilities of people who did not make the Top 3 are significantly higher on average than those of people who did (note that the probabilities used here are the average of Rounds 4 and 5). Everybody with a not-safe probability below 0.2 has made the Top 3, nobody with a score above about 0.43 ever has, and only rarely has someone above 0.3 made it. The region from 0.27 to 0.37 represents people who are “on the bubble”: they could go either way, depending on whether they can turn things around.

Because the data begins to get sparse in later rounds, an averaging mechanism is employed. This reduces some of the week-to-week noise inherent in the model and gives a better overall fit to the observations. Rounds 5-7 use a two-week average, and Rounds 8-9 use a three-week average. This methodology is consistent with the belief that the contest largely hardens as the weeks go by, since by then people have chosen their favorites.
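
Concretely, the averaging amounts to something like this (a sketch; the vector and function names are just for illustration):

```r
# `notsafe` is assumed to be a contestant's weekly not-safe probabilities,
# indexed by round. Rounds 5-7 average the last two weeks, Rounds 8-9 the
# last three, and earlier rounds are used as-is.
smoothed_notsafe <- function(notsafe, round) {
  window <- if (round >= 8) 3 else if (round >= 5) 2 else 1
  mean(notsafe[(round - window + 1):round])
}

smoothed_notsafe(c(0.10, 0.15, 0.12, 0.20, 0.30), round = 5)  # mean of Rounds 4-5 = 0.25
```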

Why only a Top 3 tracker rather than a straight-up winner tracker? It turns out to be pretty hard to tell what’s going to happen in the Top 3 before it actually happens. Some races have turned on a dime as one contestant picked up a lot of new voters from an eliminated contestant. This kind of coalition building has probably happened in a few years, most notably with Kris Allen and Lee DeWyze. So the fact that the tracker only attempts to ascertain who will be in the Top 3 reflects the amount of uncertainty that still remains.

As with the week-to-week forecast, this represents a kind of conventional wisdom. In the early rounds the model considered Pia Toscano a lock for the Top 3, which is what many people thought, and it was wrong. It also thought Syesha Mercado was not going to make it, as most people did, and was wrong. However, it is right much more often than it is wrong, which is all one can hope for. The model isn’t designed, nor could I design it, to be capable of seeing difficult-to-foresee events. All it can do is make a reasonable inference based on previous years, and in that it seems to largely succeed. Once a precedent is set, the model never forgets it, but it also doesn’t go nuts when a weird result happens.

A new model of American Idol

When it comes to actual week-to-week predictions of American Idol outcomes, I’ve always been interested but skeptical. Idol is a shifting and surprising thing. As a result, I did something reasonable but overly cautious: I compared a couple of variables to the outcomes of comparable rounds in previous seasons. For instance, the Top 8 could be directly compared to all other Top 8 rounds. Then, using a simple logistic regression, I let R dictate the regression coefficients and left it at that. It was a toy model, not claimed to be worth much.

The old model ignored all of the personal observations I’ve made over the years about how the voting plays out, and was therefore not representative of the show as I see it. The data for the comparable rounds gets so sparse that it’s hard to draw much of a conclusion. As a result, it wasn’t very good. But this year I expended a lot of effort and a lot of time thinking about the problem in more detail.

What I recognized is that a fairly good statistical model could be built from some reasonable assumptions, sound observations, and careful methodology. I will outline the result in this post. The model is not necessarily predictive of next year, but it is quite a good description of past years, it is not overfit, and it therefore stands a good chance of describing the next season accurately. I’ve carefully characterized its confidence and ensured that the predicted probabilities line up reasonably well with the actual observations.

Hit the jump for the details of the new IdolAnalytics forecast model. This is going to get decently technical, but that shouldn’t put you off.