Thursday, December 26, 2013

Correlation and Causation (It's Kind of a Big Deal)

My sister sent me a link to an article titled "Earn more money, when you have more sex, study says (seriously!)" Being a math teacher herself, she had some doubts about the article's claims. She linked me to the study on which the article was based and passed along her thoughts on its methods:
Is it me, or are the guy's statistical interpretations (from the journal article) not all correct? Sketchy claims include: 

"For both sexes, in Panel I, we observe that a one standard deviation increase in sexual activity increases hourly wages by 3.2%, other things being equal." 

"The importance of the sexual activity variable can also be assessed by the fact that if we regress a single wage equation without the sexual activity variable, the R^2 is 0.821, while if we consider the sexual activity variable (as in Table 5), the R^2 is 0.842. In other words, the wage estimation becomes more precise if we consider the sexual activity variable." 

Um, R^2 always goes up when you add an additional variable, that's why we use adjusted R^2?

Not to mention the implied causation claim. I am not surprised at the correlation since there's a documented positive correlation between being married and having higher income, but really "Earn more money when you have more sex"?
My sister hit the nail on the head with the R-squared thing. To quote the stats textbook I keep in my cubicle for the purpose of making my coworkers think I'm smart, "in practice, the best model found by the R-squared criterion will rarely be the model with the largest R-squared." Adding explanatory variables to a regression equation can never decrease R-squared, and in practice it almost always increases it, whether or not those variables are truly significant. Adjusted R-squared penalizes the model for every explanatory term it contains, so a new variable only raises it if that variable improves the fit by more than a useless variable would be expected to. That makes it a better way to decide which variables are worth retaining in a regression equation, and it helps temper the impulse to cram in as many explanatory variables as possible. I would not interpret an increased R-squared as conclusive evidence that the sexual activity variable is worth retaining in the final regression equation.
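To see the R-squared problem in action, here's a minimal simulation with made-up data (nothing from the Greek study). The outcome depends on one real variable, and then we tack on five columns of pure random noise. The sample size, the seed, and the number of junk variables are arbitrary choices for illustration. R-squared is guaranteed not to go down when the junk is added; adjusted R-squared charges a penalty for the five extra terms and has no such guarantee.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least squares fit of y on X (plus an intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)     # y genuinely depends on x

r2_before = r_squared(x, y)
junk = rng.normal(size=(n, 5))       # five pure-noise "explanatory" variables
r2_after = r_squared(np.column_stack([x, junk]), y)

print(f"R^2:          {r2_before:.3f} -> {r2_after:.3f}")
print(f"adjusted R^2: {adjusted_r_squared(r2_before, n, 1):.3f} -> "
      f"{adjusted_r_squared(r2_after, n, 6):.3f}")
```

Try different seeds: plain R-squared creeps up every single time, even though the added columns are meaningless by construction, while adjusted R-squared bounces around and frequently goes down.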

The causation claim raised my suspicions as well. It's very difficult, often impossible, to prove causation in an observational study like this one-- there are a million confounding variables out there. "Correlation doesn't prove causation" has become the go-to comeback favored by science fans the world over. It's an easy shot to take at almost any of these attention-grabbing, "studies show" headlines. It's so easy it could feel like a cheat code for sounding smart, if it weren't so often true and applicable.

This excellent needlepoint design is sadly sold out on Etsy.



Dr. Lee, one of my statistics professors in college, liked to show her students data on the average life expectancy in different countries and graph it against the number of televisions per household in each country. She would express surprise at the strongly linear, statistically significant relationship between the two variables, and then implore her students to start a charity foundation to buy televisions for families in poor countries. "Don't we want people to live longer? Let's send them TVs!"

Dr. Lee's example is hardly the only humorous case of unexpectedly strong correlations seeming to imply ridiculous causal relationships. The internet is full of them.

My favorite one so far.
So, the obvious place to start critiquing this "sex causes money!" study is to examine whether its central claim of causation is just a case of misinterpreted correlation. I wanted to give the study's authors the benefit of the doubt-- perhaps they merely identified a strong positive correlation between wages and the frequency of sexual activity, and the laymen over at NBC reported the findings incorrectly? It's not like you can make a great headline out of "New study finds that a large number of variables, including sex and money, are interconnected in ways that merit further research."

But that doesn't seem to have been the case here. Even the title of the study, "The Effect of Sexual Activity on Wages," claims a causal relationship between the variables in question. The study's authors employ "a range of over-identification tests, robustness checks, and falsification tests [to] bolster the case for a causal interpretation of the relation under consideration," namely, that having more sex leads to a larger paycheck. The entire paper seems to be an attempt to suss out a causal relationship between sex and wages, and the authors definitely understand that trends alone won't prove their point. I can't say that their arguments convinced me, but they certainly weren't lazy with their math.

There are still a few glaring issues that undermine the credibility of the study. One fact omitted from the NBC article is that the data for this study was gathered between January and December of 2008... in Greece. But these observations should still be applicable to a wider population, right? I mean, it's not like 2008 was an unusual or remarkable year for the Greek economy or anything.

A sampling of images that come up when I search for "the Greek economy circa 2008."

In addition to sudden economic collapse, there are lots of other things that affect a person's wages. The challenge in this study is how to deal with "unobserved heterogeneity," or how to account for all the various factors that could explain why a person's wages are what they are, to be sure we're not missing something important. To that end, the study includes 31--count 'em! 31!--explanatory variables in its initial regression equation.

The frequency with which the study's participants get freaky was quantified as a ranked categorical variable, and as with the numeric rankings of friendship-attraction discussed in my previous post, an understanding of specifically what each ranking means is important for interpreting the results of the study. Here, sex frequency is a whole-number variable ranging from 0 to 6, with the following descriptions for each level:

0: No sex ever
1: Sex once or twice a year
2: Sex once or twice a month
3: Sex two or three times a month
4: Sex once a week
5: Sex two or three times a week
6: Sex more than four times a week

Not the most evenly-spaced categories, but good enough, I suppose. I wish they had included more information about the way they asked the question-- were they asking people for their average rate of sexual activity over a specific period of time, like in the last month, or the last year? Or were they asking for an instantaneous calculation of individual sexual frequency at the present moment, like a sex-derivative? These things can change rapidly. The sample average for this study was about a 4, "[suggesting] that adult individuals approximately have weekly sex." You know how they know? Because it's Wednesday. And Wednesday is the day that I usually share this music video.


Sweet weekly love.
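To put some numbers on "not the most evenly-spaced," here's a rough sketch that converts each category into an approximate number of times per year. The midpoints are my own assumption (for example, "two or three times a month" becomes 2.5 times a month), and category 6 is open-ended, so 5 times a week is just a placeholder.

```python
# Approximate times per year for each category, using the midpoint of each
# stated range. These conversions are assumptions, not values from the study.
approx_per_year = {
    0: 0,
    1: 1.5,          # once or twice a year
    2: 1.5 * 12,     # once or twice a month
    3: 2.5 * 12,     # two or three times a month
    4: 1 * 52,       # once a week
    5: 2.5 * 52,     # two or three times a week
    6: 5 * 52,       # more than four times a week (placeholder guess)
}

# How much extra activity a one-category jump represents at each step
gaps = {k: approx_per_year[k] - approx_per_year[k - 1] for k in range(1, 7)}

for k, gap in gaps.items():
    print(f"category {k - 1} -> {k}: about {gap:g} more times per year")
```

Under these assumptions, moving up one category means anywhere from a couple of extra encounters a year to over a hundred, which is worth keeping in mind whenever a result is quoted per category or per standard deviation.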

Obviously, not all 31 variables are going to be significantly related to wages, or to one another. After using stepwise regression to drop the insignificant variables and control for the significant ones, the study concludes that "sexual activity has the lowest positive impact on wage determination [of the retained explanatory variables], but it is still a statistically significant variable." So, according to the study, having more sex does positively impact your earnings. It just doesn't impact your wages quite as strongly as having more years of work experience, being older, being straight, being white, or being male. Bad news for young black lesbians. :(

The data present some interesting clusters of correlated variables that bolster my hunch about sex and wages being related, but not necessarily causally linked. Variables showing a positive, significant correlation with sexual activity include being a dude, being married, and being an extrovert. Makes sense, mostly. Variables showing a significant negative correlation with sexual activity include having a chronic illness like diabetes or cancer, having a disability, and being an immigrant. Most of those make sense, but it's sad that immigrants don't have more sex. You move to an unfamiliar country to better your life, you face discrimination and difficulty finding work, and on top of all that, you can't even get laid! Sadness all around.

Now, a lot of the factors that are significantly related to sexual frequency are also significantly related to wages: you make more money if you're a dude, or married, or an extrovert, and you make less money if you're disabled, an immigrant, or someone stricken with cancer or diabetes. That fact alone makes me question whether we can truly point to sex as the cause of increased or decreased wages. If extroverts have more sex and extroverts make more money, my first assumption isn't going to be that the sex is causing the money, or that the money is causing the sex. It feels more reasonable to assume that extroversion contributes to both increased wages and increased sexual activity. Similarly, the link between a person's sex life and their earning power seems less important than the effect a disability or chronic illness has on both. Lurking variables abound here.
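Here's a minimal simulation of that lurking-variable story, using made-up data rather than anything from the study. A standardized "extroversion" score drives both sexual frequency and log wages, and by construction sex has zero effect on wages. The coefficients (0.8 and 0.5), the sample size, and the seed are arbitrary assumptions. Regressing wages on sex alone still turns up a clearly positive slope; putting extroversion into the regression as well pulls the sex coefficient back toward zero. (Swap in "national wealth," "TVs per household," and "life expectancy," and you've got Dr. Lee's charity pitch.)

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2_000

# Hypothetical lurking variable: a standardized extroversion score
extroversion = rng.normal(size=n)

# Extroversion drives BOTH outcomes; sex has no direct effect on wages here
sex_freq = 0.8 * extroversion + rng.normal(size=n)
log_wage = 0.5 * extroversion + rng.normal(size=n)

def ols_slopes(y, *predictors):
    """OLS coefficients for the given predictors (the intercept is dropped)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

naive = ols_slopes(log_wage, sex_freq)[0]                  # ignores the confounder
adjusted = ols_slopes(log_wage, sex_freq, extroversion)[0]  # controls for it

print(f"sex coefficient, wages ~ sex:                {naive:.3f}")
print(f"sex coefficient, wages ~ sex + extroversion: {adjusted:.3f}")
```

The catch, of course, is that this only works because we know the lurking variable and measured it. Real studies can only control for the confounders they thought to ask about, which is exactly the "unobserved heterogeneity" problem.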

Definitely the best image that popped up when I searched "lurking variable."
The relationship between sex and wages isn't terribly strong, either. Certainly not strong enough to pay for all of Carrie Bradshaw's shoes, as the NBC article suggests. From the Greek study: "For both sexes... we observe that a one standard deviation increase in sexual activity increases hourly wages by 3.2%." The standard deviation of the sex variable is about 1, so an increase of one standard deviation means a jump from one category to the next highest one. The average hourly wage in the study sample is 7.97 euros, so a 3.2% increase bumps that up to 8.23 euros per hour. In plain English (and US dollars), this study suggests that if the average person goes from having sex once a week to two or three times a week, they'll go from earning $10.77 an hour to $11.12 an hour. A 35-cent raise is nothing to sneeze at, but it ain't gonna make you a millionaire.
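Here's that back-of-the-envelope arithmetic as a quick sketch. The 7.97-euro mean wage and the 3.2% effect come from the study; the exchange rate is an assumption, backed out from the $10.77 figure above (roughly 1.35 dollars per euro).

```python
MEAN_WAGE_EUR = 7.97         # sample mean hourly wage reported in the study
EFFECT_PER_SD = 0.032        # 3.2% wage increase per one-SD jump in sexual activity
EUR_TO_USD = 10.77 / 7.97    # assumed rate, implied by the dollar figure above

new_wage_eur = MEAN_WAGE_EUR * (1 + EFFECT_PER_SD)
old_wage_usd = MEAN_WAGE_EUR * EUR_TO_USD
new_wage_usd = new_wage_eur * EUR_TO_USD

print(f"{MEAN_WAGE_EUR:.2f} EUR -> {new_wage_eur:.2f} EUR per hour")
print(f"${old_wage_usd:.2f} -> ${new_wage_usd:.2f} per hour "
      f"(raise of ${new_wage_usd - old_wage_usd:.2f})")
```

Carrying the unrounded euro figure through the conversion gives about $11.11 an hour and a raise of about 34 cents; converting the already-rounded 8.23 euros instead gives $11.12 and 35 cents. Either way, the conclusion is the same: not exactly life-changing money.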

Ultimately, the study points to sexual activity as a barometer for general wellbeing, with which I agree. I'm not convinced that sex has a direct, causal impact on wages, but I'll concede that it's definitely one of many factors that are positively correlated with one another and with wages. I think it's likely that sexual activity is a tangible way to measure underlying mental or social aspects of a person's health, which in turn affect wages--sex as an indicator of general health and wage-earning ability, rather than the cause itself. I have no arguments with the study's final suggestions: "In terms of policy implications, access to effective, broadly-based sexual health education could be an important contributing factor to the health and well-being of people."

So, despite the claims of the NBC headline, it doesn't seem like most people can reap the benefits of a larger paycheck solely by working more sex into their daily schedule. Sex is just one part of a much larger, more complex, interconnected picture, one that's more subtle than "more sex => more money". Unless you're a porn star. Then, yeah. The sex-money relationship is pretty monotonic there. But we can't all be the Greatest Pirate Hunter in the World.

Official favorite film of X-Bar P-Hat?
