X-Bar, P-Hat

Friday, March 27, 2015

Functions of the Heart

I think I might have solved my love life with math. Maybe. Or it could be residual disorientation from the daylight saving time change. Hard to tell.

Friendship is Statistic

Happy almost-Valentine's Day! I turned Facebook into charts because I have a nonstandard definition of "fun."

The colorful word blizzard above came from Wolfram Alpha's Facebook report feature, which will take the personal data you've scattered to the digital winds and assemble it in concise, structured forms, so you can see what you look like to an advertisement-tailoring algorithm.

I can't find the link to it now (read: I am super lazy), but I was reading a report about the large number of white people who have mostly (or exclusively) white friends. The number was around 90%, I think-- that is, the friend-circles of white people are made up of 90% other white people, among people living in the United States. The percentages weren't as high for members of other races (e.g., black people tend to have more white friends than white people have black friends, and so on). I wondered how I stack up.

Conveniently, I don't have a ton of Facebook friends, so it's not a difficult task for me to put them all in a spreadsheet and mark down everybody's race (as best I can). Turns out... damn. My friends are pretty dang white.

I have 177 Facebook friends*, of whom 156 (88%) are white, 8 (4.5%) are east Asian, 6 (3.4%) are south Asian, 4 (2.3%) are Hispanic, and 3 (1.7%) are black. These numbers don't reflect the racial makeup of the place I live: my wider metropolitan area is only about 57% white, and 29% of the population is black. While I am not FB friends with everyone I know in real life, I'm using my FB friend list as a reasonably representative sample of my wider social circle, for the purposes of this investigation-- the results of which don't reflect too well on me.

It's obviously weird to befriend somebody for the sole purpose of checking off a box on some "at least X friends of every race" list, but that doesn't mean that spending the overwhelming majority of my time around white people is a good thing. I feel like one of the best ways to stamp out your own unconscious biases is to surround yourself with people from races and backgrounds different from yours, as much as you can. The flood of information coming in every day from real-life interactions will hopefully overwhelm your internalized stereotypes, and re-wire your brain to accept lots of different people as normal without instinctively distancing yourself from the "other." I am not doing a very good job of that right now.

To get a closer look at the people I interact with regularly, I winnowed down my list of friends to a subset I nicknamed "face" friends: a list of 63 people whose faces I had physically seen within the past year. Here's how they break down:

Even whiter than before. 92% White, 3% Black, 3% South Asian, and 2% Hispanic. In retrospect, it makes sense: I went to a very diverse high school, but I haven't seen many of my high school friends in the past year. The college I went to had a student body white enough to have once been on the KKK's list of recommended universities (an honor of which the school was none too proud), so the folks from whom I could choose my college friends were mostly white. And until very recently, everybody on the small staff of my office was white. The general trend of my life has been to spend less and less time around people who don't look like me. Not a great trend : (

Counting up my number of friends, I wondered, do I have enough friends? Too many? How popular am I, compared to my friends? Excel does not like histograms, but if you arrange your data just so, you can trick it into making a bar chart that acts like a histogram. Here's what the distribution of popularity looks like, among my friends:

Number of friends is understandably skewed right: there's a hard minimum, but no real maximum to speak of. Among my friends, the average number of friends is 508. I'm waaaaay below that, though not technically a full standard deviation below the mean. Then again, a couple very popular friends of mine are pulling that average up, what with all their theatre- and activism-related networking and general likeability (those scoundrels). For a data set this skewed, the median is probably a better estimate of central tendency. The smallest number of friends any of my friends has is 10 (my uncle Jim), the first quartile goes up to 202 friends, the median is at 391 friends, the third quartile is 650 friends, and the absolute maximum is 2,661 friends (holy cow, Carlie, how can you even remember so many names?). Using my friends as a sample set, I have a pretty low number of friends-- I'm only in the bottom 25%.
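If you'd rather not trick Excel, the histogram arrangement boils down to counting how many values land in each fixed-width bin. Here's a rough Python sketch of that idea. The function name and the 250-friend bin width are my own choices, and the sample list holds only the handful of figures quoted above (the minimum, my own count, the quartiles, and the maximum), not the full data set.

```python
from collections import Counter

def histogram_bins(values, width):
    """Count how many values fall in each [start, start + width) bin."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Only the summary figures quoted in the post, NOT the full friend list.
sample = [10, 177, 202, 391, 650, 2661]
bins = histogram_bins(sample, 250)

# Print a crude text histogram, one row per non-empty bin.
for start, count in bins.items():
    print(f"{start:>5}-{start + 249:<5} {'#' * count}")
```

With the real list of friend counts in place of `sample`, the long right tail shows up as a few lonely bins far past the crowded ones.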

What I don't understand is how my dad is so much more popular on FB than I am. I'm a dang millennial, but he can post a single photo that attracts more likes than I have friends. Aren't dads not supposed to know how to do the Face Book? I guess he is just a super cool DJ with lots of super cool rockin' friends, and I am a lowly mathematician, whose best friends are all imaginary numbers. I am such a disappointment.

(Just kidding, Dad, I know you love me. And even if you were disappointed that I did not turn out as cool as you did, I would understand.)

* Excluding my dear friends Marius Pontmercy and Billy Budd, who were just obscure enough to have survived the fictional-character-FB-page purge that spoiled all our fun back in high school.

Friday, October 31, 2014

Infrequently Asked Questions

Not exactly a stats post. Just a thinkin' about stuff and writin' about stuff post. These are some questions I've either been asked at some point, or that I ask myself every night as I fall asleep, fearful that the darkness has no answers, more fearful that it will speak answers I cannot bear to hear. Think of it as a Frequently Asked Questions post, using a liberal interpretation of the terms "frequently" and "asked."

Ten Card Draw

Sort of a sequel-post here, with more of the tarot math I was playing around with last time. Some of the references I used for that post discussed the idea that if you properly shuffle a deck of 52 playing cards, there's a decent chance that you've created a unique order of cards, never before seen in the history of shuffled decks. I wondered, what are the chances that any given tarot reading is unique in the history of tarot readings?

The spread pictured above is the Celtic cross spread, one of the more popular tarot spreads out there. Different people have different preferences for what each position signifies-- hell, there are only about three matches between the diagram above and the layout as I learned it-- but it's always ten cards, and the meaning of the spread depends upon the order the cards are drawn as well as the orientation in which they're laid down.

Given those parameters, we can calculate the total number of unique Celtic cross spreads:

78 x 2 x 77 x 2 x 76 x 2 x 75 x 2 x 74 x 2 x 73 x 2 x 72 x 2 x 71 x 2 x 70 x 2 x 69 x 2 =

4,675,765,217,094,107,136,000

The point of all those 2s in there is to represent that each time a card is laid down, there are two possibilities for the way it faces. As you can see, there are a buttload of possible outcomes for a Celtic cross spread. Using standard US nomenclature for really big numbers, we can say that there are more than 4.6 sextillion possible Celtic cross spreads.

... the major motivation behind this blog post is that I calculated a number that gives me an excuse to say the word "sextillion." It is, scientifically speaking, the funniest number-related word.
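For anyone who'd rather not multiply twenty numbers by hand, here's a minimal Python sketch of the count above. It assumes Python 3.8 or later for `math.perm`, and the variable names are just mine.

```python
import math

DECK_SIZE = 78     # cards in a tarot deck
SPREAD_SIZE = 10   # cards in a Celtic cross spread

# Ordered draws of 10 distinct cards from a deck of 78 ...
ordered_draws = math.perm(DECK_SIZE, SPREAD_SIZE)

# ... times 2 orientations (upright or reversed) for each of the 10 cards.
celtic_cross_spreads = ordered_draws * 2 ** SPREAD_SIZE

print(f"{celtic_cross_spreads:,}")
```

Python integers don't overflow, so this gives the exact 22-digit figure rather than a rounded one.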


A Saga-themed tarot deck would be AMAZING, btw.

Now we come to the second part of the question: what's the chance that, given how many possible Celtic cross spreads there are, no two Celtic cross spreads in the history of tarot readings have been identical? To answer this, we can use the math of the Birthday Paradox. If you haven't heard of it, the Birthday Paradox is the name given to the fact that you don't need as large a group of people as you might think before you start getting a pretty good chance that at least two people in that group share a birthday. If you have 23 people in a room together, there's about a fifty-fifty chance that there's a shared birthday among them. The linked explanation of the Birthday Paradox is better than any I could give, so I'll just let y'all educate yourselves there if you want to know more about it.

What's useful for our purposes is the shortcut formula near the end of that post: if you've got a pool of a given number of things, and you have an equal chance of drawing any one of the things, how many times do you need to draw from that pool before your chances of having drawn the same thing twice are about fifty-fifty? The precise math is complex, but we can get a decent estimate by taking the square root of the size of our pool-- or, a little more specifically, 1.177 times the square root of our pool. There are 365 possible birthdays out there (excluding leap years), and 1.177 times the square root of 365 is about 22.49, very close to the actual 23-person figure.
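To see how close the shortcut lands, here's a small Python sketch that puts the 1.177-times-square-root estimate next to the exact birthday calculation. The function names are my own, and it assumes every birthday is equally likely, same as the shortcut does.

```python
import math

def shared_birthday_probability(people, days=365):
    """Exact chance that at least two of `people` share a birthday,
    assuming all `days` birthdays are equally likely."""
    all_distinct = 1.0
    for k in range(people):
        all_distinct *= (days - k) / days
    return 1.0 - all_distinct

def even_odds_estimate(pool_size):
    """Shortcut: roughly 1.177 * sqrt(pool) draws for a 50% chance of a repeat."""
    return 1.177 * math.sqrt(pool_size)

print(round(even_odds_estimate(365), 2))          # shortcut estimate
print(round(shared_birthday_probability(22), 3))  # just under even odds
print(round(shared_birthday_probability(23), 3))  # just over even odds
```

The exact probability crosses one half between 22 and 23 people, right where the shortcut says it should.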

We have a pool of Celtic cross configurations of a known (if enormous) size. Roughly how many tarot readings would need to occur to reach an even chance of at least one repeat?

The answer: You'd need more than 80 billion Celtic cross tarot readings before the chances of a repeat reading reach fifty percent. Specifically, you'd need to do 80,482,750,652 tarot readings, or 11.3 for every human alive on Earth today. That is a big number.
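Here's a hedged Python sketch of that estimate. The pool size is the spread count from earlier in the post. The `repeat_probability` helper is my own addition: it uses the standard approximation 1 - exp(-n^2 / (2 * pool)) to sanity-check that the threshold really sits near fifty-fifty.

```python
import math

# Possible Celtic cross spreads, from the count earlier in the post.
POOL = 4_675_765_217_094_107_136_000

def readings_for_even_odds(pool_size):
    """Shortcut estimate: about 1.177 * sqrt(pool) draws for a 50% chance of a repeat."""
    return 1.177 * math.sqrt(pool_size)

def repeat_probability(readings, pool_size):
    """Approximate chance of at least one repeat among `readings` draws."""
    return -math.expm1(-(readings ** 2) / (2 * pool_size))

threshold = readings_for_even_odds(POOL)
print(f"{threshold:,.0f} readings")
print(f"chance of a repeat at that point: {repeat_probability(threshold, POOL):.3f}")
```

Floating-point square roots are only good to about sixteen significant digits, so the last few digits of the printed threshold shouldn't be taken too seriously.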

I'm not exactly sure how to determine the total number of tarot readings that have ever occurred. If the average person has had a dozen Celtic cross readings in their lifetime, then we've probably reached 80 billion tarot readings total. But have they? I imagine the variance for that dataset is pretty big-- lots and lots of people who've never had a reading, versus enthusiasts who might have had hundreds. There's probably a way to estimate that, but it would take more effort than I'm willing to put in.

In any case, if every person on earth sat down and did ten consecutive Celtic cross spreads right now, there's a better-than-even chance that every single one of those readings would be unique. That's a staggering enough thought that I'm willing to say there's a good chance that your ten-card Celtic cross reading, while still bullshit, is your very own, never-before-seen, personal bullshit. And isn't that something?

Wednesday, September 24, 2014

Four of Wands, Page of Math

Like many people, I have an embarrassing hobby.

So I'm a statistician, right? A general fan of science and facts and stuff. A stickler for evidence. The kind of person whose mantra is correlation does not necessarily imply causation, except when the causal relationship in question isn't based on quantitative data and therefore can't really be described using the term "correlation," in which case the dubious argument is better criticized by referencing the logical fallacy "post hoc ergo propter hoc." (Because I'm also the kind of person who is a pain-in-the-ass stickler for using the word "correlation" correctly.)

So, it follows that I wouldn't be into any sort of mystical divination practices, since their fakeness is fairly obvious. Even my homegirl Hermione rolls her eyes at them. She trusts the judgment of a telepathic singing hat, but not someone's subjective interpretation of a pile of tea leaves, because even literal witches know that divination is bullshit.

That is my embarrassing hobby: I read tarot cards. I don't believe they have any predictive power, but I know how to do the spreads and interpret them as though they do. The only excuses I have for why I read tarot cards are

You can convince drunk people that you're magic
They're pretty
It's kind of like Rorschach blots, you know? The meaning that you impose on the ambiguity can highlight thoughts, feelings, and motivations that might otherwise be difficult to identify
They're really pretty
Did I mention how pretty they are

they are really very pretty

My friend Lauren wants to go back to school to finish up her degree, and a while back, she asked if I could read her tarot cards with regard to her prospects. We shuffled and re-shuffled (a LOT) and I drew three cards for her: a reversed ace of wands (stagnation or lack of passion for a new opportunity), a nine of swords (mounting anxiety and insomnia), and a reversed five of swords (unavoidable catastrophic failure). Not even the most liberal interpretation could spin anything positive out of that hand.

So I blew it off and shuffled again, because Lauren is awesome and deserves a fortune-telling session that predicts piles of cash and hookers (lookin' at you, ten of pentacles). And the new cards were a reversed high priestess (impatience, wasted potential), a five of pentacles (poverty and bad luck), and the goddamned devil (which is pretty much as bad as it sounds).

This ridiculous bullshit continued for about seven more drawings, separated by increasingly intense shuffling sessions, including the foolproof throw-all-the-cards-in-the-air-and-swear-loudly method. Time and again, Lauren got cards that weren't just irrelevant or neutral, but seemed to be actively shitting on her dreams. More than once, she shouted at me, "you're a statistician-- what are the odds of this?"

Challenge accepted!

At first I thought it would be pretty easy to answer Lauren's question. How many ways can there be to draw three cards from a deck? While a card's position in its spread usually affects its interpretation, we weren't applying any past-present-future or problem-advice-outcome meanings to the three card spreads we were doing, so order doesn't matter for our purposes. There are 78 cards in a tarot deck, so there are 76,076 possible hands of three.

Now we just have to determine how many of those spreads would be unfavorable for Lauren's education! This part turned out to be trickier.

Since orientation affects interpretation, each of the 78 cards has 2 possible outcomes, for 156 total. I categorized each of these 156 outcomes as positive, negative, or irrelevant with regard to Lauren's education, because ambiguity is for chumps. Of those, there are 43 good possibilities (28%), 64 bad possibilities (41%), and 49 possibilities (31%) that have no bearing on Lauren's question (get outta here two of cups, we weren't asking about crushes). I defined an unfavorable spread as any combination of three cards that includes no positive results and at least one negative result: either all negatives, two negatives and an irrelevant, or one negative and two irrelevant.

And that's where I ran into a bit of a problem. See, I planned to calculate all the different ways one could draw any of those three hands using plain old n-choose-k, where I'd be looking for how many ways to choose 3 negative cards from the pool of 64... but there aren't exactly 64 negative cards. It's impossible for me to first draw an upright eight of swords (feeling trapped by circumstance) and then draw a reversed eight of swords (feeling trapped by circumstance, but like, even worse). But I've included both the upright and reversed versions of the card in my tally of negative outcomes. Essentially, I've backed myself into a mathematical corner where I've artificially doubled the size of the deck: I'm doing calculations based on 156 outcomes, rather than 78 outcomes with two variations each.

To be truly rigorous, I should tally up how many cards switch from positive to negative, from irrelevant to positive, from negative to irrelevant, etc, versus how many cards maintain their general meaning when reversed, and work those numbers into much longer calculations of conditional probability.

Yeah, I based my calculations on the imaginary 156-card deck, where each orientation of each card counts as its own separate card. It's fortune-telling, for crying out loud, I'm not going to worry too much about rigor.
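For the record, the more rigorous version doesn't have to be done by hand. Below is a rough Python sketch that brute-forces every hand from a real deck, where each card carries one meaning upright and one reversed. It assumes each card is equally likely to land upright or reversed, which the post doesn't actually claim. The function name and the four-card `toy_deck` are made up purely for illustration. The real 78-card categorization isn't listed here, so this only shows the method.

```python
from itertools import combinations, product

def unfavorable_probability(deck, hand_size=3):
    """Chance that a hand has no positive outcome and at least one negative one.

    `deck` is a list of (upright, reversed) category pairs, each category
    being "pos", "neg", or "irr". Every hand of distinct cards and every
    upright/reversed combination is treated as equally likely.
    """
    unfavorable = 0
    total = 0
    for cards in combinations(deck, hand_size):
        for orientations in product((0, 1), repeat=hand_size):
            outcomes = [card[o] for card, o in zip(cards, orientations)]
            total += 1
            if "pos" not in outcomes and "neg" in outcomes:
                unfavorable += 1
    return unfavorable / total

# Hypothetical four-card deck, NOT real tarot categorizations.
toy_deck = [("neg", "neg"), ("pos", "neg"), ("irr", "irr"), ("pos", "irr")]
print(unfavorable_probability(toy_deck))
```

For a full deck that's 76,076 hands times 8 orientation combinations, which a laptop can chew through in a few seconds.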

If we allow the fudge-factor of a theoretical 156-card deck to deal with the problem of card orientation, there are 620,620 possible combinations of three cards that one can draw from a tarot deck. There are 41,664 ways to get three negative cards, 98,784 ways to get two negative cards and one irrelevant card, and 75,264 ways to get two irrelevant cards and one negative card.

All together, that's 215,712 crappy hands, out of 620,620 possible hands. There's about a 35% chance that any individual tarot reading I do for Lauren's educational prospects will be negative. If we multiply things out to reflect the fact that Lauren got eight successive crappy readings, we get a probability of just over 0.02%. So... huh. Dang Lauren, sure looks like the universe has it in for you.

Technically, the universe has it in for all of us. Enjoy your inevitable disintegration, everything! <3 Entropy
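For anyone who wants to check my work on the 156-card fudge, here's a minimal Python sketch of the tallies above. It uses `math.comb` (Python 3.8 or later) and the outcome counts from the post. The variable names are mine.

```python
import math

# Outcome tallies from the post's imaginary 156-card deck.
NEGATIVE, IRRELEVANT, TOTAL = 64, 49, 156

three_negative = math.comb(NEGATIVE, 3)
two_negative_one_irrelevant = math.comb(NEGATIVE, 2) * IRRELEVANT
one_negative_two_irrelevant = NEGATIVE * math.comb(IRRELEVANT, 2)

crappy_hands = (three_negative
                + two_negative_one_irrelevant
                + one_negative_two_irrelevant)
all_hands = math.comb(TOTAL, 3)

p_crappy = crappy_hands / all_hands
# Treats the eight readings as independent, which all that shuffling should justify.
p_eight_in_a_row = p_crappy ** 8

print(crappy_hands, "crappy hands out of", all_hands)
print(f"one reading: {p_crappy:.1%}, eight in a row: {p_eight_in_a_row:.3%}")
```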

Friday, March 28, 2014

Public Radio Pledge Drive Probability

North Carolina Public Radio Pledge Drive Season is the worst time to miss a call from an unfamiliar number. I have missed several such calls in the past few weeks, and I am convinced that each one was Eric Hodge. He was calling to inform me, via vague, leading questions about famous landmarks and culturally-specific foods, that I'd won a drawing for one of the fantastic getaways they always advertise. One of these days, I'll be right. But which one of these days?

It's a little tough to figure out how likely I am to win any particular WUNC Trip Drawing. My probability of winning is always nonzero, since I'm a Sustainer, which means I'm a) better than everyone else and b) automatically entered into every Pledge Drive drawing. The amount of funding that North Carolina Public Radio receives from listener contributions is public knowledge, but the precise number of contributors isn't listed anywhere that I could find. Even if I could figure out how many Sustainers there are, the number of one-time-gifters in the pool changes with every drawing.

There are a couple ways I can estimate how many people listen to WUNC, which is a good start for figuring out how many people are competing with me for that weekend getaway in France. This Nielsen report from December of last year indicates that 90% of Americans listen to radio each week, and the Raleigh-Durham region has a population of about two million, so the local radio-listening audience is probably somewhere around 1,800,000. WUNC's weekly cume is a little under 17% of the market, so there are probably at least 300,000 people tuning in each week. Maybe. Man, I hope I'm estimating radio listenership the right way, since it's sort of my job that I get paid for!

So, let's be optimistic and assume that maybe one out of every twenty listeners donates to the station. I've got no clue if that estimate is wildly high or wildly low or wildly spot-on, never having worked for an organization that depends on donations, but it feels like a vaguely educated guess. That's 15,000 donors total. It probably fluctuates, but let's say, for argument's sake, that I have a 1 in 15,000 chance in any given drawing of winning a trip to Rome. There are three WUNC Pledge Drives each year, and each one seems to have at least five trip drawings, so I have fifteen chances each year to win.

How long would I have to wait before I could be at least 90% sure of winning at least one trip drawing? Put another way, how many times do I have to enter a drawing so that my chance of losing every single last one of them dwindles to 10% or less? My chance of losing any individual drawing is 14999/15000, and my chance of losing all the drawings I enter is 14999/15000 raised to the power of the number of drawings in which I participate. So let's solve this equation!

The sooner I get LaTeX on my new computer, the better

Using the definition of a logarithm and the change of base formula, both of which I totally remembered and did not need to look up just now, because I am a smart mathematician who never ever forgets really basic important facts, and being chronically rusty on logarithms definitely isn't a significant source of anxiety for me, we conclude that the number of drawings I'd need to enter in order to have a 90% chance of winning at least one of them is... 34,538 drawings, or over 2,300 years' worth of pledge drives. Lucky for me, my loyalty to quality radio programming is as undying as the sun. Which means, strictly speaking, not actually undying. But undying enough to last for 2,300 years.
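For the logarithm-averse, here's a minimal Python sketch of the same calculation: find the smallest whole number n where (14999/15000) raised to the n drops to 0.10 or below. The 15,000-donor pool and the fifteen drawings a year are the guesses from this post, not real WUNC figures, and the variable names are mine.

```python
import math

DONORS = 15_000          # the post's guess at the size of the donor pool
DRAWINGS_PER_YEAR = 15   # three pledge drives times five trip drawings
TARGET_LOSS = 0.10       # acceptable chance of never winning anything

p_lose_one = (DONORS - 1) / DONORS

# Solve p_lose_one ** n <= TARGET_LOSS for the smallest whole n.
drawings_needed = math.ceil(math.log(TARGET_LOSS) / math.log(p_lose_one))
years_needed = drawings_needed / DRAWINGS_PER_YEAR

print(drawings_needed, "drawings, or about", round(years_needed), "years")
```

Swapping in a different donor estimate scales the answer almost linearly: halve the pool and the wait is roughly halved too.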

Sunday, March 23, 2014

This one isn't about statistics, but is instead about my mom, and her death

I feel very sad, very often, and very little remedies the sadness.

I've started writing this thing so many times, and discarded so many drafts. If I took all the words I've written and deleted since December 7th 2013 and put them together, I'd win NaNoWriMo. Sometimes what I wrote was eloquent, but most of the time it was impenetrable garbage that rambled on way past the point where it stopped making sense. Sometimes what I wrote was angry, and mostly it was angry at people who had nothing to do with what I was angry about. Sometimes it had a lot of science in it. Sometimes it used a lot of metaphors. Sometimes it included a lot of fandom references. Sometimes it had pictures, and sometimes they were pictures that I drew (poorly). Sometimes I was bitter. Sometimes I wrote while I was sober, sometimes I wrote while I was drunk, and sometimes I wrote while I was crying and couldn't stop.

Mostly, all the things I wrote were just different ways of saying the same thing: My mom died. I feel very sad, very often, and there are very few things that remedy the sadness. Writing isn't one of them.