Friday, February 15, 2013

Does Familiarity breed Contempt?

Thanks to my new job, I have at my disposal piles of data on different radio stations around the country and their listeners' opinions of their songs. Each week, a station sends us 30 - 40 short song hooks, and we set up a survey where people play the hooks, then rate the song. They tell us first whether they're familiar with that song, then how much they like it, and finally whether or not they're sick of it.

Here are some graphs of the data from four different stations that have completed surveys in the last few weeks. In these graphs, each dot represents a particular song, "familiarity" represents the proportion of respondents familiar with the song, and "contempt" represents the proportion of those familiar who said they were sick of hearing it.

I've also included the regression formula, R-squared, and R values on each of these graphs. See, in a linear regression formula, the number in front of the x (the slope) is the important one: its sign tells us whether the correlation is positive or negative. The R value tells us how strong the correlation is: when R is close to 0, the relationship is weak; when R is closer to +1 or -1, the relationship is strong. (To be technical, it's the R-squared value that tells you the proportion of the variation in your data that can be explained by the linear relationship rather than random error. The more variation you can explain, the better your model!) The y-intercept is important for positioning the line, but otherwise useless for analysis here. It's just telling you the projected contemptibility of a theoretical song with 0% familiarity, which doesn't mean anything. Don't take y-intercepts too seriously in regression analysis. They're just out to confuse you.
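If you'd rather see where those numbers come from than trust Excel's trendline, here's a rough Python sketch of the usual least-squares arithmetic. The function name `linear_fit` and the five data points are made up for illustration (they are not from any of the station surveys), and I'm assuming the standard simple-regression formulas are what the trendline uses.

```python
from statistics import mean

def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R and R-squared."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sums of squared deviations and the cross-product of deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # the number in front of the x
    intercept = y_bar - slope * x_bar   # only positions the line
    r = sxy / (sxx * syy) ** 0.5        # strength and direction, from -1 to +1
    return slope, intercept, r, r * r   # r * r is R-squared

# Hypothetical (familiarity, contempt) proportions for five songs
familiarity = [0.40, 0.55, 0.70, 0.85, 0.95]
contempt = [0.05, 0.08, 0.15, 0.18, 0.30]

slope, intercept, r, r_squared = linear_fit(familiarity, contempt)
print(f"y = {slope:.3f}x + {intercept:.3f}, R = {r:.3f}, R^2 = {r_squared:.3f}")
```

Notice that the slope and R always share a sign, and that R-squared really is just R times R, which is why it's the one that gets read as "proportion of variation explained."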


First up, one of our rock stations!

 
As you can see, there is a positive relationship between familiarity and contempt for rock stations, with a decently strong correlation! Just what you'd expect from the "I knew that band before they got all popular and turned into sellouts" crowd.
 
Let's look at data from one of our rap stations next!
 
 
Relationship's still positive, if very slightly less strong. The rap songs also cover a much wider range in familiarity-- actually, that's because the guy who runs this particular station keeps testing songs that he hasn't ever played, and it's a constant annoyance for us when running his reports, but that's neither here nor there. Safe to say that rap fans, like rock fans, tire of songs they hear too often.
 
What about country fans?
 
 
Oh, bless their tolerant hearts. First time we've seen any songs with absolutely 0% contemptibility among their fans. Oh, and that lone data point out there at about 45% familiarity, but an incongruous 11% contempt? It's "Creepin'" by Eric Church, if you were wondering what kind of song rustles the jimmies of even the most laid-back country fans.
 
We also test several stations in Peru, because, why not? And Peruvian radio listeners exhibit similar song-rating behavior to country fans:
 
 
Sorry, I can't remember what that upper-right-hand point is off the top of my head. Also, check out the range on their familiarity! Peruvians know their top-40 hits.
 
Speaking of range, I sputtered profanity at Excel for a long time and managed to make something like a comparative box-plot analysis of all four stations. (Seriously, Excel, how the fuck do you have "donut" graphs and fucking starfish-lookin' graphs but you can't make a goddamn box-plot? AND DON'T GET ME STARTED ON HISTOGRAMS)
 
 
Yeah, the guy at that rap station just insists on testing songs that haven't been out very long, and haven't had time to gain widespread familiarity. So his station has the widest range of familiarity with various songs. Peru, on the other hand, has no songs under 75% familiarity, and the majority of the data falls above the 90% familiarity mark.
 
Let's compare ranges on contempt:
 
 
These ranges reflect the fact that not only is the relationship between familiarity and contempt weaker among country listeners and Peruvians than among rap and rock fans, but those country fans and Peruvians just plain have less contempt than other folks! Also, the tall necks (the long upper whiskers) on all four box plots indicate that there are just a few songs for each station that really irk people, while the majority of a station's songs fall in the lower ranges for contemptibility.
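For anyone else fighting with Excel: a box plot is just five numbers per station (minimum, first quartile, median, third quartile, maximum), and they're easy to get without a chart wizard. Here's a minimal Python sketch. The function name `five_number_summary` and the contempt values are invented for illustration, and the "inclusive" quartile method is my assumption; other tools may interpolate quartiles a little differently.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    # n=4 splits the data into quartiles; "inclusive" treats the data
    # as the whole population rather than a sample from a larger one.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical contempt proportions for one station's songs
contempt = [0.00, 0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.11]

low, q1, median, q3, high = five_number_summary(contempt)
print(f"min={low}, Q1={q1}, median={median}, Q3={q3}, max={high}")

# A "tall neck": the upper whisker is much longer than the lower one
print("upper whisker:", high - q3, "lower whisker:", q1 - low)
```

When the upper whisker dwarfs the lower one like this, it means most songs cluster at low contempt and a handful stick way out at the top.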
 
Short post this time, guys. In conclusion: no matter what your disgruntled elementary school math teacher told you, box plots ARE a useful way to display certain kinds of information. Even if Excel is a douche about making them.
