Friday, May 24, 2013

Texting and Driving and Error and Ethics

Recently, my friend Abhi suggested that I do a "controversial" blog post on texting while driving, specifically about "how dangerous is it actually?" This was in response to a recent American Academy of Pediatrics study that found nearly half of teens reported texting while driving, and that the behavior was very strongly associated with other risky behaviors like drunk driving and infrequent seatbelt use. To quote Abhi's suggestion from Facebook:
I haven't seen the studies, but I'm guessing there's a huge selection bias in incidents reported where texting "led" to an accident, because you condition on searching for texting by seeing the accident. The cases where texting doesn't lead to an accident don't get reported. To take an absurd example to illustrate the point, I bet you'd find a similar correlation for accidents and having previously had a soda. Or Chewing gum. Or *wearing seatbelts*. I'll bet you that in 90% of accidents that occur people were wearing seatbelts *grins*.
I have no doubt that Abhi's intentions were benign, but I found the suggestion unsettling. I responded by saying that I didn't want to encourage texting while driving, and he questioned why I should feel bad about investigating a mathematical claim. Well, it's complicated. I'll try to explain.

First, the results of this study neither prove nor suggest that texting while driving (hereafter abbreviated TWD) causes accidents. They don't prove that TWD causes drunk driving, or causes kids to eschew seatbelts. All this study does is highlight a relationship between TWD and other driving behaviors previously identified as risky. (Really, go skim through the study, it's quite readable and informative, as well as free to the public.)

I say 'relationship' rather than 'correlation' because correlation, strictly speaking, measures association between two numeric variables, whereas the variables in question are grouped into categories. The writers of the study analyzed the relationship between the dichotomous variables TWD/no TWD and drinking/no drinking, as well as polychotomous variables recording how many days in the past month teens reported TWD. A few of the trends they found: the proportion of teens who engage in TWD increases with age, and a teen who self-reports TWD is more than five times as likely to self-report drinking and driving. Also, teens who reported more days of TWD over the past month were more likely to report other risky driving behaviors.
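For readers curious what a test of association between two dichotomous variables looks like, here's a minimal sketch of Pearson's chi-square statistic on a 2x2 table. The counts below are invented purely for illustration-- they are not from the study.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the hypothesis of no association
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Rows: TWD yes/no; columns: drink-and-drive yes/no (made-up counts)
observed = [[40, 60],
            [10, 190]]
print(chi_square_2x2(observed))
```

A statistic far above the chi-square critical value (about 3.84 at the 0.05 level with one degree of freedom) would indicate an association between the two behaviors-- though, as with the study itself, not a causal one.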

It is a mathematical fact that nothing in this study suggests that texting while driving is dangerous. In fact, because TWD is so widespread and positively associated with driving behaviors that are known to be deadly, it is mathematically possible that no causal relationship exists between TWD and deadly car accidents. It could be that TWD is just a habit of naturally dangerous drivers; a symptom rather than the disease itself. I think this is what Abhi was getting at with his suggestion.

However, I believe that to make that claim would be mathematically dubious, and furthermore, I feel that it would be morally wrong.

Mathematics first. In his examples, Abhi focuses on only one kind of proportion: the likelihood that, given a driver has experienced a crash, that driver engaged in some sort of behavior before the crash. He posits that the percentage of people who were texting before they crashed might be similar to the percentage of people wearing seatbelts before they crashed. Thus, if we observe that 90% of crashes involved TWD and that 90% of crashes involved seatbelt use, we would have to conclude that TWD and seatbelts are equally dangerous-- or, more likely, conclude that they are equally unrelated to car crashes. Ha ha, 24-hour news! Your scare-mongering tactics targeting parents of teenagers are mathematically bogus! Way to stick it to the MAN, statistical analysis.

Except that comparing those two percentages is irrelevant to the question of whether TWD (or seatbelt use) is likely to cause a crash. Instead, we should compare the percentage of those who crashed after engaging in the behavior in question with the overall percentage of drivers engaging in that behavior. To use seatbelts as an example, we could set up null and alternative hypotheses similar to these:

H0: The proportion of seatbelt wearers among drivers who die in crashes is less than or equal to the proportion of seatbelt wearers among drivers overall.
Ha: The proportion of seatbelt wearers among drivers who die in crashes is greater than the proportion of seatbelt wearers among drivers overall.
Using probability terms, we're interested in comparing the likelihood of seatbelt use given death-by-crash with the overall likelihood of seatbelt use.
If we fail to reject our null hypothesis, that means that the proportion of dead bodies wearing seatbelts is similar to (or less than) the proportion of live bodies wearing seatbelts. If we observe that 90% of crashes involved people wearing seatbelts and that 90% of people overall wear seatbelts while driving, we would fail to reject our null hypothesis. Seatbelt use is not disproportionately represented among the population of drivers who die in crashes, given what we know about seatbelt use among drivers overall.

Let's look at TWD through the same lens.

H0: The proportion of texting drivers among drivers who die in crashes is less than or equal to the proportion of texting drivers overall.
Ha: The proportion of texting drivers among drivers who die in crashes is greater than the proportion of texting drivers overall.
Again, in probability terms, we're interested in the comparison of TWD rates among those who died in crashes versus TWD rates among drivers overall.

Abhi pointed out that "cases where texting doesn't lead to an accident don't get reported," and it's true that finding the probability of TWD and not getting into a crash is pretty complicated. However, the study to which he pointed in his suggestion does give us a decent estimate for the overall proportion of drivers who engage in TWD, at least among teenagers. They found that the proportion is about 46%.

Let's use their percentage in our example. If we found that TWD occurred in 90% of fatal teenage crashes, while 46% of teenage drivers report TWD, then we would probably reject our null hypothesis in favor of our alternative, as long as our sample sizes were large enough to provide statistically significant results. TWD would be overrepresented among corpses given what we know about the overall incidence of TWD.
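As a sketch of what that comparison might look like, here's a one-proportion z-test using the study's 46% as the null rate. The 90% figure and the sample size of 100 fatal crashes are made-up numbers, as in the example above.

```python
import math

def one_proportion_z(p_hat, p0, n):
    """z statistic for testing H0: p <= p0 against Ha: p > p0."""
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null
    return (p_hat - p0) / se

# Hypothetical: 90% TWD observed among n = 100 fatal teenage crashes,
# tested against the study's overall teen TWD rate of 46%.
z = one_proportion_z(0.90, 0.46, 100)
print(round(z, 2))
```

A z statistic this large sits far beyond any conventional cutoff (1.645 at the 0.05 level for a one-sided test), which is what "TWD would be overrepresented among corpses" looks like in the arithmetic.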

In this example, TWD and seatbelt use could have the exact same likelihood of having happened before a crash, but one would be of much greater significance than the other.

Of course, these numbers are mostly made-up. I don't know the prevalence of seatbelt use among the general public or among those who die in car crashes, and even the study linked above acknowledges that other studies have found teen TWD percentages as low as 25% and as high as 75%. The numbers on TWD are hard to pin down, and that complicates any analysis of TWD's risks.

Different studies have found circumstantial evidence of a link between TWD and car crashes, and each one has its drawbacks. This one found a relationship between TWD and traffic citations, though it was based on self-reported data and the citations in question weren't necessarily for TWD.

This one put teenagers in a driving simulator and found that TWD has a pretty big impact on driver attentiveness to the road, but they only tested 20 drivers, and we technically can't conclude that the demonstrated level of decreased attentiveness is enough to cause fatal crashes.

This one is a big meta-study reporting "large and comparable effects on poor driving performance for texting (r = .572) and alcohol use (r = .539)," but again, these are only correlations, not definitive causal relationships.

Probably the most interesting study I came across was one that observed commercial truck drivers over long periods of time: the study had the benefit of being able to observe specific driver behavior during crashes or near-crashes, rather than relying on self-reporting. They found that while talking on a cell phone wasn't necessarily related to dangerous driving maneuvers, other phone-related tasks did appear to increase the risk of a crash-- tasks like dialing, reaching, and texting. But even this study doesn't necessarily prove that TWD is dangerous for the average driver: there's a big difference between driving a semi and driving a sedan, and the relationships in a study on truck drivers might not hold true for the broader population.

The only way we could find a real, causal link between TWD and fatal crashes would be to conduct a controlled experiment. We'd send a control group out on the highway with instructions to not text, and send an experimental group on the road with instructions to text their hearts out. After a while, we'd follow up on mortality rates with both groups. If we can be sure that the only difference between the two groups was whether or not they engaged in TWD, we can finally figure out if TWD really does kill people!

Hopefully, the grossly unethical nature of the situation described above will be apparent to all readers. We can't conduct such a study. Observational evidence isn't anywhere near as reliable as experimental evidence, but when the observational evidence seems to suggest "this activity kills people," we absolutely cannot subject humans to that risk. Even if it's small. So, we're stuck trying to piece together the truth about TWD through driving simulators and imperfect observational studies. Mathematically speaking, we can't say that TWD causes car crashes.

Now, as to Abhi's follow-up question on why I should feel bad about offering such objectively true mathematical analysis, I have to delve into the subject of morals. It's not entirely unrelated to statistics. I'm reading a book right now called Naked Statistics, and the author has some very reasonable things to say about stepping back from the complex mathematics of the discipline to look at the real-life consequences of what you're analyzing. The chapter "Common Regression Mistakes" begins with the following paragraph:
Here is one of the most important things to remember when doing research that involves regression analysis: Try not to kill anyone. You can even put a little Post-it note on your computer monitor: "Do not kill people with your research." Because some very smart people have inadvertently violated that rule.
He goes on to describe the preliminary findings of the famous Nurses' Health Study conducted by Harvard Medical School, which suggested that hormone replacement therapy for older women could prevent heart attacks. So lots of women took hormones, until further research suggested that hormone replacement therapy might actually increase the risk of heart disease, as well as cancer and stroke. And that's a well-researched, enormous longitudinal study conducted by a large team of professional statisticians. They made mistakes that cost people their lives. No statistical analysis is perfect.

Me? I'm all right with statistical analysis, but I'm no genius. I'm just an amateur analyst amusing herself in her spare time. I'd say I'm much more likely to come to incorrect conclusions than the folks at Harvard. That's why I prefer to stick to trivial subjects like playing the lottery or analyzing platonic friendships-- even if I'm wrong, the worst that could happen to you if you take my advice is that you waste $20 on lottery tickets or wind up in a weird-feelings friendship. I know I'm going to make mistakes, so I avoid analyzing subjects where mistakes could lead to serious trouble.

I know the risk is very small, but I don't want to take the chance that someone might read "there is no mathematical reason to assume that texting while driving causes car crashes" on my blog and then go crash their car while texting. I don't want to be responsible, even only tangentially, for someone's death. It's irresponsible for me to make such an assertion when the consequences of my inaccuracy are so extreme.

Really, the only statistical analysis on TWD that matters is the analysis of what a Type 1 or Type 2 error means in the context of real life. A Type 1 error, also referred to as a false positive, means that we rejected the null hypothesis even though it was true. A Type 2 error, also called a false negative, means that we failed to reject a null hypothesis that was, in fact, false. We can design experiments to reduce the chance of one type of error at the cost of an increased risk of the other; deciding which risk to minimize depends on the context of the problem. In a trial by jury, a false positive (sending an innocent man to jail) is worse than a false negative (failing to convict a guilty man), so our court system is set up to favor the defendant and, hopefully, minimize the risk of false positives. A medical test for HIV, on the other hand, should try to minimize the risk of false negatives, since mistakenly telling a healthy person they have HIV is much less dangerous than telling an HIV-positive person that they're not sick. Follow-up tests will set the record straight for our false-positive individual; our false negative individual is unlikely to get any follow-up tests, and may die as a result.
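To make that tradeoff concrete, here's a small simulation of the one-sided proportion test from earlier. Everything besides the study's 46% null rate is invented: a hypothetical "true" TWD rate of 60% among fatal crashes, samples of 100 crashes, and two common significance levels.

```python
import math
import random

random.seed(0)

def reject(p_hat, p0, n, z_crit):
    """One-sided test: reject H0 (p <= p0) when the z statistic exceeds z_crit."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z > z_crit

def rejection_rate(true_p, p0=0.46, n=100, trials=2000, z_crit=1.645):
    rejections = 0
    for _ in range(trials):
        # Simulate one sample of n crashes with the given "true" TWD rate
        p_hat = sum(random.random() < true_p for _ in range(n)) / n
        rejections += reject(p_hat, p0, n, z_crit)
    return rejections / trials

# H0 true (true_p == p0): the rejection rate is the Type 1 error rate.
type1 = rejection_rate(true_p=0.46)
# H0 false (true_p > p0): 1 minus the rejection rate is the Type 2 error rate.
type2_loose  = 1 - rejection_rate(true_p=0.60, z_crit=1.645)  # alpha = 0.05
type2_strict = 1 - rejection_rate(true_p=0.60, z_crit=2.326)  # alpha = 0.01
print(type1, type2_loose, type2_strict)
```

With these made-up numbers, the stricter cutoff produces fewer false alarms but noticeably more Type 2 errors-- exactly the tradeoff that has to be weighed against real-life consequences.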

Here are the errors that can occur when we try to determine whether TWD is dangerous or not.

A Type 1 error: we conclude that TWD is overrepresented in fatal crashes when it really isn't. Consequence: drivers give up texting behind the wheel for nothing.
A Type 2 error: we conclude that TWD is not overrepresented in fatal crashes when it really is. Consequence: drivers keep texting behind the wheel, and some of them die.

So. Texting while driving. How dangerous is it, really? Difficult to say.

But the smart money's on the driver who doesn't.
