Spearman's Rank Correlation Coefficient

A rank correlation is any of several statistics that measure an ordinal association—the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.

Suppose we have \(n\) paired observations of two variables \(X\) and \(Y\): \(X=\{x_1, x_2, x_3, \dots, x_n\}\) and \(Y=\{y_1, y_2, y_3, \dots, y_n\}\).

If \(Rank_X\) and \(Rank_Y\) denote the ranks of the data points in \(X\) and \(Y\) respectively, then Spearman's rank correlation coefficient, \(r_s\), is the Pearson correlation coefficient of \(Rank_X\) and \(Rank_Y\).

Example

  • \(X=\{0.2, 1.3, 0.2, 1.1, 1.4, 1.5\}\)
  • \(Y=\{1.9, 2.2, 3.1, 1.2, 2.2, 2.2\}\)
$$ \begin{bmatrix} X: & 0.2 & 1.3 & 0.2 & 1.1 & 1.4 & 1.5 \\ Rank_X: & 1 & 3 & 1 & 2 & 4 & 5 \end{bmatrix} $$

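So \(Rank_X = \{1, 3, 1, 2, 4, 5\}\): the values are ranked in ascending order, the two tied values of 0.2 share rank 1, and the next distinct value, 1.1, takes rank 2.

Similarly, \(Rank_Y=\{2,3,4,1,3,3\}\), where the three tied values of 2.2 share rank 3.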

\(r_s\) equals the Pearson correlation coefficient of \(Rank_X\) and \(Rank_Y\), which gives \(r_s \approx 0.158114\).
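The calculation above can be reproduced with a minimal Python sketch. The helper names `dense_rank` and `pearson` are illustrative and not taken from any library. The ranking rule is an assumption inferred from the example: tied values share a rank and the next distinct value gets the next integer.

```python
from math import sqrt

def dense_rank(values):
    """Rank in ascending order; ties share a rank and ranks have no gaps.

    This mirrors the ranking used in the worked example (an assumption
    inferred from the example's numbers).
    """
    order = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

X = [0.2, 1.3, 0.2, 1.1, 1.4, 1.5]
Y = [1.9, 2.2, 3.1, 1.2, 2.2, 2.2]

rank_x = dense_rank(X)  # [1, 3, 1, 2, 4, 5]
rank_y = dense_rank(Y)  # [2, 3, 4, 1, 3, 3]

r_s = pearson(rank_x, rank_y)
print(round(r_s, 6))  # 0.158114
```

Note that many statistics packages, for example `scipy.stats.spearmanr`, assign tied values the average of the positions they occupy rather than a shared integer rank, so they may return a different value for data with ties such as this.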

Special case: \(X\) and \(Y\) contain no duplicates

$$r_s=1-\frac{6\sum d_i^2}{n(n^2-1)}$$

Here \(d_i\) is the difference between the ranks of the \(i\)-th observation in \(Rank_X\) and \(Rank_Y\), and \(n\) is the number of observations.
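As a quick check, the sketch below compares the shortcut formula with the Pearson correlation of the ranks on a small tie-free data set. The data and the helper names (`rank`, `spearman_no_ties`, `pearson`) are made up for illustration, and the code assumes the inputs contain no duplicate values.

```python
from math import sqrt

def rank(values):
    """1-based ascending ranks; assumes all values are distinct."""
    order = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [order[v] for v in values]

def spearman_no_ties(x, y):
    """Shortcut formula: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical tie-free data
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]

shortcut = spearman_no_ties(x, y)       # sum of d_i^2 is 4, so 1 - 24/120
direct = pearson(rank(x), rank(y))      # Pearson correlation of the ranks

print(shortcut, direct)  # both are approximately 0.8
```

With ties present, the two computations generally disagree, which is why the shortcut is restricted to this special case.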