Wednesday, July 02, 2014

Pearson vs. Spearman: A Tale of Two Correlations

One of the most rudimentary yet most valuable statistics you can calculate for two data sets is their correlation. Two widely used correlation methods are the Pearson method (which is what we normally think of when we think "correlation coefficient") and Spearman's rank correlation coefficient. Which is which? When would you use one rather than the other?

The Pearson method is based on the idea that if Measurement 1 tracks Measurement 2 (whether directly or inversely), you can get some idea of how "linked" they are by calculating Pearson's r (the correlation coefficient), which is a quantity derived from the products of the differences between each M1 and its average and each M2 and its average, duly normalized. The exact formula is here. Rather than talk about the math, I want to talk about the intuitive interpretation. The crucial point is that Pearson's r will be a real value between minus-one and plus-one. A value near plus-one means the two measurements rise and fall together, and a value near zero means there's no linear relationship to speak of. A value near minus-one means the data are negatively correlated, like CEO performance and pay. Okay, that was a lame example. How about age and beauty? No. Wait. That's kind of lame too. How about the mass of a car and its gas mileage? One goes up, the other goes down. That's negative correlation.
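For anyone who does want a peek at the math, the "products of the differences, duly normalized" description corresponds to the standard textbook form below. The symbols are notation chosen for this sketch, not anything from the post: x_i stands for each M1 value, y_i for each M2 value, and the bars denote averages.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is the "products of the differences" part; the denominator is the normalization that keeps r between minus-one and plus-one.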

The statistical significance of a correlation depends on the magnitude of the correlation and the number of data points used in its computation. To get an idea of how that works, play around with this calculator. Basically, a low correlation value can still be highly significant if there are enough data points. That's the main idea.
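To see the effect of sample size without a calculator, here is a minimal Python sketch. It assumes the Fisher z-transformation with a normal approximation, which is only a rough stand-in for the exact t-based test (especially for small samples), and the function name approx_p_value is made up for this illustration.

```python
import math

def approx_p_value(r, n):
    """Approximate two-sided p-value for a correlation r computed from n points.

    Assumption: Fisher z-transformation plus a normal approximation.
    This is rough for small n; an exact test would use the t distribution.
    """
    if n < 4:
        raise ValueError("need at least 4 data points")
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)      # approximately standard normal under "no correlation"
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability

# Same modest correlation, very different sample sizes.
for n in (10, 100, 1000):
    print(n, approx_p_value(0.2, n))
```

A correlation of 0.2 is nowhere near significant with 10 points, but with 1,000 points the approximate p-value drops far below .001.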

Spearman's rank coefficient is similar to Pearson in producing a value from -1 to +1, but you would use Spearman (instead of Pearson) when the rank order of the data is important in some way. Let's consider a couple of examples. A hundred people take a standardized test (like the SAT or GRE), producing 100 English scores and 100 Math scores. You want to know if one is correlated with the other. Pearson's method is the natural choice.

But say you hold a wine-tasting party and you have guests rate ten wines on a decimal scale from zero to ten. You want to know how the judges' scores correlate with the wines' prices. Is the best-tasting wine the most expensive wine? Is the second-best-tasting wine the second-most-expensive? Etc. This is a situation calling for Spearman rather than Pearson. It's perfectly okay to use Pearson here, but you might not be as satisfied with the result. 

In the Spearman test, you would sort the judges' scores to obtain a rank ordering of wines by the "taste test," and rank the wines by price in the same way. Then you would calculate the coefficient using the Spearman formula, which amounts to Pearson's r computed on the ranks rather than on the raw values. Here's the intuitive explanation: Say the most expensive wine costs 200 times more than the cheapest wine. Is it reasonable to expect that the most expensive wine will taste 200 times better than the cheapest wine? Will people score the best wine '10' and the worst wine '0.05'? Probably not. What you're interested in is whether the taste rankings track price in an orderly way. That's what Spearman is designed to find out.

If the taste scores are [10,9.8,8,7.8,7.7,7,6,5,4,2] and the wine prices are [200,44,32,24,22,17,15,12,8,4], the Spearman coefficient (calculator here) will be 1.0, because the taste ranking exactly matches the price ranking, whereas Pearson's r (calculator here) will be about 0.612, because the taste scores didn't vary in magnitude the same way that the prices did. But say the most expensive wine comes in 4th place for taste. In other words, the second array is [44,32,24,200,22,17,15,12,8,4] but the first array is unchanged. Now Spearman gives 0.927 whereas Pearson gives 0.333. Even with only ten wines, the Spearman score achieves statistical significance (p<.001) whereas Pearson does not.
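If you'd rather check these numbers yourself than trust an online calculator, here is a small Python sketch using only the standard library. It treats Spearman as Pearson's r applied to ranks, with tied values sharing their average rank. The function names and the variable name prices_reordered are invented for this illustration; the data are the two wine examples above.

```python
import math

def pearson(xs, ys):
    """Pearson's r: sum of products of deviations, normalized."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

taste = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
prices = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]
prices_reordered = [44, 32, 24, 200, 22, 17, 15, 12, 8, 4]  # priciest wine ranks 4th for taste

for label, p in (("original", prices), ("reordered", prices_reordered)):
    print(label, "Spearman:", round(spearman(taste, p), 3),
          "Pearson:", round(pearson(taste, p), 3))
```

Notice that moving the 200 from first to fourth place barely dents Spearman, since only a few ranks shift, while Pearson drops sharply because that one large price dominates the raw-value calculation.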

There are plenty of caveats behind each method, which you can and should read up on at Wikipedia or elsewhere. But the main intuition is that if rank order is important for your data, consider Spearman. Otherwise Pearson will probably suffice.