Thursday, December 26, 2013

Correlation and Causation (It's Kind of a Big Deal)

My sister sent me a link to an article titled "Earn more money, when you have more sex, study says (seriously!)" Being a math teacher herself, she had some doubts about the article's claims. She linked me to the study on which the article was based and passed along her thoughts on its methods:
Is it me, or are the guy's statistical interpretations (from the journal article) not all correct? Sketchy claims include: 

"For both sexes, in Panel I, we observe that a one standard deviation increase in sexual activity increases hourly wages by 3.2%, other things being equal." 

"The importance of the sexual activity variable can also be assessed by the fact that if we regress a single wage equation without the sexual activity variable, the R^2 is 0.821, while if we consider the sexual activity variable (as in Table 5), the R^2 is 0.842. In other words, the wage estimation becomes more precise if we consider the sexual activity variable." 

Um, R^2 always goes up when you add an additional variable, that's why we use adjusted R^2?

Not to mention the implied causation claim. I am not surprised at the correlation since there's a documented positive correlation between being married and having higher income, but really "Earn more money when you have more sex"?
My sister hit the nail on the head with the R-squared thing. To quote the stats textbook I keep in my cubicle for the purpose of making my coworkers think I'm smart, "in practice, the best model found by the R-squared criterion will rarely be the model with the largest R-squared." Adding additional explanatory variables to a regression equation will never decrease R-squared, and will almost always increase it, whether or not those variables are truly significant. Adjusted R-squared takes into account the number of explanatory terms already present in the model, and is often a better way to determine which variables are worth retaining in a regression equation-- it helps temper the impulse to cram in as many explanatory variables as possible. I would not interpret an increased R-squared as conclusive evidence that the sexual activity variable is worth retaining in the final regression equation.
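To make the adjusted R-squared point concrete, here's a minimal Python sketch of the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of explanatory variables. The function name and the sample numbers are made up for illustration; they are not taken from the study.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-squared: charges a penalty for each explanatory variable."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical numbers (not from the study): 30 observations, and adding a
# fourth predictor nudges plain R-squared from 0.500 up to 0.505.
before = adjusted_r_squared(0.500, n_obs=30, n_predictors=3)
after = adjusted_r_squared(0.505, n_obs=30, n_predictors=4)

print(f"adjusted R-squared with 3 predictors: {before:.4f}")
print(f"adjusted R-squared with 4 predictors: {after:.4f}")
print("did the extra variable earn its keep?", after > before)
```

In this made-up case the plain R-squared goes up while the adjusted value goes down, which is exactly the situation where a bigger R-squared shouldn't be read as "the new variable matters."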

The causation claim raised my suspicions as well. It's very difficult, often impossible, to prove causation in an observational study like this one-- there are a million confounding variables out there. "Correlation doesn't prove causation" has become the go-to comeback favored by science fans the world over. It's an easy shot to take at almost any of these attention-grabbing, "studies show" headlines. It's so easy it could feel like a cheat code for sounding smart, if it weren't so often true and applicable.

This excellent needlepoint design is sadly sold out on Etsy.

Friday, September 27, 2013

Homestuck Revisited

I made a mistake.

A huge, HUGE rookie mistake.

Remember this drawing from the Homestuck Post?



It was more correct than I was about the probability of a female-female conversation occurring in a group of size N containing F females.
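For reference, here's a rough Python sketch of one simple way to model that probability. It assumes a single conversation between two people picked uniformly at random from the group, which may not be the model the Homestuck post actually used; the function name is hypothetical.

```python
from math import comb

def prob_female_female_pair(group_size, females):
    """Chance that a uniformly random pair from the group is two females.

    Assumption: one conversation, between two distinct people chosen at random.
    """
    if group_size < 2:
        raise ValueError("a conversation needs at least two people")
    if not 0 <= females <= group_size:
        raise ValueError("females must be between 0 and group_size")
    # Female-female pairs divided by all possible pairs.
    return comb(females, 2) / comb(group_size, 2)

# Example: a group of N = 4 containing F = 2 females.
print(prob_female_female_pair(4, 2))
```

With N = 4 and F = 2 there is one female-female pair out of six possible pairs.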

Friday, August 23, 2013

Was Hot Fuzz a Documentary?

An unconscionably long time ago, Abhi posted a link on my Facebook wall to a study about the comparative dangers of living in the city versus living out in the country, with the instructions "BLOOOOG! GOOOOOO!" I meant to write about it a lot sooner, but, you know. Annual family reunion/beach vacation, so like, nothing productive happening THAT week. Now that my vacation's over and my sunburn's faded enough that I can wear shirts again, it's high time for some bloggin'!



The National Geographic article is split into five "surprising" facts about country living versus city living, "surprising" mostly because they run counter to some conventional wisdom about safety. Following their lead, I'm going to use this study's findings to address three key quotes from the cinematic masterpiece on the relative dangers of small town life, Hot Fuzz.

SANDFORD'S FINEST

Friday, July 19, 2013

The Lottery Reloaded

So my company offers clients mathematical analysis of trends and probabilities. We also have an office lottery pool. Make of that what you will.




Tuesday, July 2, 2013

The Homestuck Bechdel Analysis

Before we begin this post, I feel it prudent to make a confession. It's probably not even a confession, since it's likely that everyone knows it already-- more like an admission of guilt regarding a crime of which I have already been accused, or at least suspected. There is... a web comic. No! Not a web comic, exactly, or at least, not only a web comic. There is a long, multi-arc narrative spanning a wide range of different styles of expression, including traditional web comics, interactive puzzle-solving games, gorgeous animation scored by a multitude of experimental musicians, long-winded chat logs delving into the nature of causality, and equally long-winded chat logs probing the nature of alien reproductive biology. There is a thing called Homestuck, and it is a silly thing that I love dearly.



Friday, May 24, 2013

Texting and Driving and Error and Ethics

Recently, my friend Abhi suggested that I do a "controversial" blog post on texting while driving, specifically about "how dangerous is it actually?" This was in response to a recent American Academy of Pediatrics study that found nearly half of teens reported texting while driving, and that the behavior was very strongly associated with other risky behaviors like drunk driving and infrequent seatbelt use. To quote Abhi's suggestion from Facebook:
I haven't seen the studies, but I'm guessing there's a huge selection bias in incidents reported where texting "led" to an accident, because you condition on searching for texting by seeing the accident. The cases where texting doesn't lead to an accident don't get reported. To take an absurd example to illustrate the point, I bet you'd find a similar correlation for accidents and having previously had a soda. Or Chewing gum. Or *wearing seatbelts*. I'll bet you that in 90% of accidents that occur people were wearing seatbelts *grins*.
I have no doubt that Abhi's intentions were benign, but I found the suggestion unsettling. I responded by saying that I didn't want to encourage texting while driving, and he questioned why I should feel bad about investigating a mathematical claim. Well, it's complicated. I'll try to explain.

Wednesday, May 8, 2013

Alfnak Alfventures

Hey there readers! Quick post about an interesting Skyrim mod I played through recently. You can find "Alfnak Ruin" here, along with the results of the (no longer active) survey that goes with it. It's a pretty straightforward dungeon, nothing spectacular if you're looking for cool Skyrim mods (if you are, that's a TOTALLY different blog post, on probably a totally different blog), but I liked the fact that it was created for the purpose of gathering data. Stats? And Skyrim?! How could I say no?


I predict with 95% confidence that this place is full of SW33T LOOT.

Tuesday, April 30, 2013

Adventures in the Sexy Vacuum: Can We Please Stop Asking Whether Men and Women can be Friends?

Lately, I've been talking with friends, acquaintances and coworkers about the annoyingly persistent question of whether a man and a woman can be "just friends." This all started when YouTube suggested the following video to me. Ten points for spotting the implicit sexism!




If you watch the video, you'll notice several of the factors that contribute to the head-banging, teeth-grinding inanity of this so-called debate. For one thing, the whole question is pretty damn hetero-normative-- pretty sure that the gay trans-men I know don't resent me for not banging them. It's also set up to make women look like idiots. The filmmaker asks leading questions whose answers are unavoidably speculative ("Of those guy friends, do you think any of them secretly like you?") and creates hypothetical situations that change the nature of the relationships in question ("Would Dave hook up with you if you gave him the chance?"). Probably the most succinct example of this video's stupidity lies in the conversation that starts at about the 1:50 mark.

Tuesday, April 16, 2013

Statistics of the Hidden Temple

Nineties nostalgia is so in right now around the blogosphere. My generation hit peak television-watching in the early to mid-nineties. Twenty years later, we're hitting our peak analytical-blogging years. Thus, the ultimate result of this alignment of the planets, the timing of our parents' unprotected sex, and the programming decisions of Nickelodeon executives is an ongoing explosion of blog posts about Legends of the Hidden Temple.


The theme music will be stuck in your head all day. You're welcome!

They range from in-depth scene-by-scene analysis to candid interviews with past contestants to impassioned rants, but there are a few themes uniting them all.


You know what's coming next.

Friday, April 5, 2013

Residuals: What the hell are they, and why are they important?

Lady luck has blessed me recently with some AP Stats students in need of tutoring, and it's been wonderful getting to tutor in my area of expertise! Not that I dislike helping students with geometry proofs or trigonometric calculations; it's just nice to work with my favorite flavor of mathematics for a change. The perennial "when am I going to use this in real life?" question is a lot easier to answer when it's asked about statistics.

One of the questions that my students have been asking is why they should care about residuals. It's a valid question! High school math classes don't typically have time to cover the reasoning behind their subjects with much depth, so the only thing most high school students know about residuals is "it's bad if the residuals have a pattern." They don't know why it's bad, or what exactly the residuals are, and they're often confused as to why this is the one time in their stats class when they want to see a scatterplot with no correlation at all. Truthfully, I didn't really understand much about the importance of residuals until late in my college career.
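Since "what exactly are the residuals" is the sticking point, here's a small Python sketch, with made-up data and hypothetical function names, that computes them directly: a residual is just the observed y minus the y the fitted line predicts. The data are deliberately curved (y = x squared) so that a straight-line fit leaves a visible pattern behind.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys):
    """Observed y minus the y predicted by the fitted line, for each point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up, deliberately curved data: y = x squared.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

curved_residuals = residuals(xs, ys)
print(curved_residuals)
```

Here the residuals come out positive at both ends and negative in the middle. That U-shape is the "pattern" students are warned about: it says the straight line is systematically missing a curve in the data.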

Wednesday, March 27, 2013

Hysterical Friend


or, "How an Absurd Late-Night Text Exchange with a Sexist Idiot managed to Ctrl + C the Infinite Loop of my Grieving Process."

(This one's not about stats.)

Tuesday, March 26, 2013

Bird Election!

Everybody knows that Benjamin Franklin wanted the turkey appointed national bird of the United States, but the more majestic bald eagle won out. Of course, if you've ever seen a bald eagle up close, you might wonder about the whole "majesty" thing.

Thankfully, the bald eagle is no longer endangered-- but now there are so many of them, people in western states with large bald eagle populations consider them pests. In Alaska, you're less likely to find a parking lot full of seagulls picking at discarded french fries, and more likely to find a dozen bald eagles hanging out in the cart corral at Wal-Mart sharing a couple goose carcasses. They have a weird, high-pitched squeaky call, less a "CAW!" than a "KEE-Ee-ee-eEE-ee-eee...," like a nervous teenager stuck in perpetual puberty. Their unique brow-shape gives them the appearance of ferocity and strength, unless you catch them at a weird angle, in which case they look to be deeply concerned about the colonoscopy results you just read to them. They mate by crashing into one another mid-flight and fucking frantically as they plummet to the ground. On closer inspection, the bald eagle seems less like the proud symbol of American democracy, and more like the really handsome guy at the party who seems cool until he takes a sip of his drink and accidentally spills it down his shirt.


"I totally meant to do that."
 
So let's elect a new national bird.

Tuesday, March 19, 2013

Battle of the Sexes: Who Tips Better?

Wow, it's been over a month since I posted anything-- sorry, loyal readers (all three of you)! I've been tied up looking for information on the cost of the equipment needed to counterfeit high-quality US currency, the suspected percentage of currently circulating US currency that's likely counterfeit, and the likelihood of getting caught passing counterfeit bills. Surprisingly, the US Department of the Treasury website is less than forthcoming with that information. Also, I think I'm on a couple government watch lists now. Sorry, Abhi. I'll keep working on answering your query about the expected value of a counterfeit dollar.

Lucky for me, my favorite coffee shop has provided an alternate subject to ponder: who tips better, men or women?

 

Friday, February 15, 2013

Does Familiarity breed Contempt?

Thanks to my new job, I have at my disposal piles of data on different radio stations around the country and their listeners' opinions of their songs. Each week, a station sends us 30 - 40 short song hooks, and we set up a survey where people play the hooks, then rate the song. They tell us first whether they're familiar with that song, then how much they like it, and finally whether or not they're sick of it.

Here are some graphs of the data from four different stations that have completed surveys in the last few weeks. In these graphs, each dot represents a particular song, "familiarity" represents the proportion of respondents familiar with the song, and "contempt" represents the proportion of those familiar who said they were sick of hearing it.

I've also included the regression formula, R-squared, and R values on each of these graphs. See, in a linear regression formula, the number in front of the x is the important one: it tells us whether the correlation is positive or negative. The R value tells us how strong the correlation is: when R is close to 0, the relationship is weak; when R is closer to +1 or -1, the relationship is strong. (To be technical, it's the R-squared value that tells you the proportion of the variation in your data that can be explained by the linear relationship rather than random error. The more variation you can explain, the better your model!) The y-intercept is important for positioning the line, but otherwise useless for analysis. It's just telling you the projected contemptibility of a theoretical song with 0% familiarity, which doesn't mean anything. Don't take y-intercepts too seriously in regression analysis. They're just out to confuse you.
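If you'd like to see where those numbers come from, here's a minimal Python sketch that computes the slope, y-intercept, R, and R-squared for a simple linear regression. The familiarity and contempt values below are invented for illustration (the real station data is confidential), and the function name is hypothetical.

```python
from math import sqrt

def simple_linear_fit(xs, ys):
    """Return (slope, intercept, r, r_squared) for a one-predictor regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # the number in front of the x
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)          # sign and strength of the correlation
    return slope, intercept, r, r * r  # r squared: share of variation explained

# Invented example: each pair is one song's (familiarity, contempt) proportions.
familiarity = [0.2, 0.4, 0.6, 0.8, 1.0]
contempt = [0.05, 0.10, 0.20, 0.25, 0.40]

slope, intercept, r, r_squared = simple_linear_fit(familiarity, contempt)
print(f"contempt = {slope:.3f} * familiarity + {intercept:.3f}")
print(f"R = {r:.3f}, R-squared = {r_squared:.3f}")
```

Note that the slope and R always share a sign, since both are built from the same sum of cross-products.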

Wednesday, February 13, 2013

Gingers, Genes, and the Binomial Function

Happy Birthday, Ansley! For your birthday, I'm going to answer your question! Ansley asks: "I would like to know how probable it is that my soulless kind (aka redheads) are actually going to die out in the current century."

Would you like to help Ansley stave off the extinction of her people?

Wednesday, February 6, 2013

Tuesday, February 5, 2013

Should I play the lottery?



It's easy to assume that those who buy lottery tickets have no understanding of mathematics, or at least no understanding of probability. These idiots are just letting the allure of a giant jackpot overpower their understanding of the odds. Why would anyone buy a ticket if they knew their chances of winning were infinitesimal?

Well, as it turns out, the gut instinct that motivates lottery players-- "Sure, I have almost no chance of winning, but if I do win, I could be a millionaire!"-- has a little mathematical support. One of the most useful formulas for everyday decision-making is the Expected Value Formula, pictured below:

E(X) = x1 * P(x1) + x2 * P(x2) + ... + xn * P(xn)


In English, the idea is that the expected value of any random variable X (for example, the amount of money you might win from the lottery) can be calculated by multiplying each possible value of X (you win $5, you win $10, you win $100...) by that possibility's respective probability (there's a 10% chance of winning $5, a 3% chance of winning $10, a 1% chance of winning $100...) and adding all those products together. Even if the chance of winning the jackpot is small, a big enough jackpot can outweigh the cost of participation.



Obviously, it's a little complicated to explain in the abstract, so let's use a simple example.
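As a quick warm-up, here's a minimal Python sketch of the calculation using the illustrative prizes mentioned above. The 86% "win nothing" outcome and the $2 ticket price are assumptions added to round out the example; they don't describe any real lottery, and the function name is hypothetical.

```python
def expected_value(outcomes):
    """outcomes: list of (value, probability) pairs covering every possibility."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Multiply each value by its probability, then add up the products.
    return sum(value * p for value, p in outcomes)

# Prizes from the paragraph above, plus an assumed 86% chance of winning $0.
prizes = [(5, 0.10), (10, 0.03), (100, 0.01), (0, 0.86)]
ticket_price = 2.00  # assumed price, for illustration only

expected_winnings = expected_value(prizes)
expected_net = expected_winnings - ticket_price

print(f"expected winnings per ticket: ${expected_winnings:.2f}")
print(f"expected net per ticket:      ${expected_net:.2f}")
```

The expected winnings here work out to 0.50 + 0.30 + 1.00 = $1.80 per ticket, so at an assumed $2 price each ticket loses about twenty cents on average.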

Monday, January 21, 2013

Inaugural Post

Hey, everybody! I've got a new statistics blog, and this is the first of hopefully many (but probably few) fascinating (but probably not) posts about the mysteries of statistics!


I know it's a bad plan to poison the well against myself right from the start, but I feel it prudent to point out that I only have a bachelor's degree in statistics, and I don't consider myself a mathematical genius by any means. My investigations will therefore be somewhat straightforward and pedestrian, and probably won't uncover any truly amazing relationships or startling results-- I'm no Nate Silver, just a curious soul with a graphing calculator and a standard normal table.

 What I need from you, my friendly reader(s?), are ideas for posts! Are there any questions you want answered, that might be answerable with the power of statistics? If you can point me toward where I might find some data, I'll happily scurry off to analyze it for you and get you an answer! Not necessarily a nice answer, and not necessarily a correct answer, but an answer nonetheless.

 Currently, these are the ideas I'm working on:

 - Should I play the NC Education Lottery? I can use the expected value formula to find out! We'll examine what kinds of lottery playing give you the best chances of winning the most money-- or, more likely, how to play the lottery so that you don't lose money (spoiler alert: the secret is to not play the lottery).

 - Exactly how weird is it to be gay? If I can get my hands on the percentage breakdown of the Kinsey gray scale data, I should be able to fit human sexuality to a (totally oversimplified) numeric scale and apply an appropriate probability distribution. It might turn out that colloquially "deviant" behavior actually falls nicely within +/- one standard deviation of the mean!

 - When listening to music on the radio, does familiarity breed contempt? Do fans of different kinds of music get tired of songs at different speeds? (Preliminary analysis suggests that country fans never get tired of anything, ever.) I've just got a new job working for a market research firm testing new music for radio stations, so I've got loads of fun data for this one! I'll just have to scrub it of any identifying characteristics so as to preserve client confidentiality.

 Send me ideas, and I'll try to keep your interest!