
Correlation (Statistical)

For two variables $x$ and $y$,

\begin{displaymath}
\mathop{\rm cor}\nolimits (x,y) \equiv {{\rm cov}(x,y)\over \sigma_x\sigma_y},
\end{displaymath} (1)

where $\sigma_x$ denotes the Standard Deviation of $x$ and $\mathop{\rm cov}\nolimits (x,y)$ is the Covariance of these two variables. For the general case of variables $x_i$ and $x_j$, where $i,j=1$, 2, ..., $n$,
\begin{displaymath}
\mathop{\rm cor}\nolimits (x_i,x_j) = {{\rm cov}(x_i,x_j)\over \sqrt{V_{ii}V_{jj}}},
\end{displaymath} (2)

where $V_{ii}$ are elements of the Covariance Matrix. In general, a correlation gives the strength of the relationship between variables; a short numerical sketch of the definition is given below.
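As a concrete illustration of eqs. (1) and (2), the following is a minimal sketch assuming NumPy (the data values are arbitrary and chosen only for illustration). It evaluates the ratio in eq. (1) directly and checks the result against NumPy's corrcoef:

\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Sample covariance and standard deviations; ddof=1 is used for both,
# so the normalizations cancel consistently in the ratio of eq. (1).
cov_xy = np.cov(x, y, ddof=1)[0, 1]
cor_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(cor_xy)                   # eq. (1), computed directly
print(np.corrcoef(x, y)[0, 1])  # library value; the two agree
\end{verbatim}

Note that corrcoef itself is insensitive to the choice of ddof, since the normalization factors cancel in the ratio.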
The variance of any quantity is always Nonnegative by definition, so
\begin{displaymath}
\mathop{\rm var}\nolimits \left({{x\over\sigma_x} + {y\over\sigma_y}}\right)\geq 0.
\end{displaymath} (3)

From a property of Variances, the sum can be expanded
\begin{displaymath}
\mathop{\rm var}\nolimits \left({x\over\sigma_x}\right)+\mathop{\rm var}\nolimits \left({y\over\sigma_y}\right)+2\mathop{\rm cov}\nolimits \left({{x\over\sigma_x}, {y\over\sigma_y}}\right)\geq 0
\end{displaymath} (4)


\begin{displaymath}
{1\over{\sigma_x}^2} \mathop{\rm var}\nolimits (x)+{1\over{\sigma_y}^2} \mathop{\rm var}\nolimits (y)+{2\over\sigma_x\sigma_y}\mathop{\rm cov}\nolimits (x,y) \geq 0
\end{displaymath} (5)


\begin{displaymath}
1 + 1 + {2\over\sigma_x\sigma_y}\mathop{\rm cov}\nolimits (x,y) = 2 + {2\over\sigma_x\sigma_y}\mathop{\rm cov}\nolimits (x,y) \geq 0.
\end{displaymath} (6)

Therefore,
\begin{displaymath}
\mathop{\rm cor}\nolimits (x,y) = {\mathop{\rm cov}\nolimits (x,y)\over\sigma_x\sigma_y} \geq -1.
\end{displaymath} (7)
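Before the mirrored argument below, a small numerical sketch (again assuming NumPy, with illustrative data) confirms the chain (3)-(6): the Variance on the left of eq. (3) equals $2+2\mathop{\rm cor}\nolimits (x,y)$ and is therefore Nonnegative:

\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

sx, sy = np.std(x, ddof=1), np.std(y, ddof=1)
lhs = np.var(x / sx + y / sy, ddof=1)  # left side of eq. (3)
rhs = 2 + 2 * np.corrcoef(x, y)[0, 1]  # eq. (6) after simplifying
assert np.isclose(lhs, rhs) and lhs >= 0
\end{verbatim}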

Similarly,
\begin{displaymath}
\mathop{\rm var}\nolimits \left({{x\over\sigma_x} - {y\over\sigma_y}}\right)\geq 0
\end{displaymath} (8)


\begin{displaymath}
\mathop{\rm var}\nolimits \left({x\over\sigma_x}\right)+\mathop{\rm var}\nolimits \left({-{y\over\sigma_y}}\right)+2\mathop{\rm cov}\nolimits \left({{x\over\sigma_x}, -{y\over\sigma_y}}\right)\geq 0
\end{displaymath} (9)


\begin{displaymath}
{1\over{\sigma_x}^2} \mathop{\rm var}\nolimits (x) + {1\over{\sigma_y}^2} \mathop{\rm var}\nolimits (y) - {2\over\sigma_x\sigma_y} \mathop{\rm cov}\nolimits (x,y) \geq 0
\end{displaymath} (10)


\begin{displaymath}
1 + 1 - {2\over\sigma_x\sigma_y} \mathop{\rm cov}\nolimits (x,y) = 2 - {2\over\sigma_x\sigma_y} \mathop{\rm cov}\nolimits (x,y) \geq 0.
\end{displaymath} (11)

Therefore,
\begin{displaymath}
\mathop{\rm cor}\nolimits (x,y) = {\mathop{\rm cov}\nolimits (x,y)\over\sigma_x\sigma_y} \leq 1,
\end{displaymath} (12)

so $-1 \leq \mathop{\rm cor}\nolimits (x,y) \leq 1$; a quick empirical check of this bound is sketched below.
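In this sketch (assuming NumPy; the trial count and sample size are arbitrary), the sample correlation of randomly generated data never leaves $[-1,1]$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=20)
    y = rng.normal(size=20)
    r = np.corrcoef(x, y)[0, 1]
    assert -1.0 <= r <= 1.0  # the bound derived in eqs. (7) and (12)
\end{verbatim}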
For a linear combination of two variables,
\begin{displaymath}
\eqalign{
\mathop{\rm var}\nolimits (y-bx) & = \mathop{\rm var}\nolimits (y)+\mathop{\rm var}\nolimits (-bx)+2\mathop{\rm cov}\nolimits (y,-bx) \cr
& = \mathop{\rm var}\nolimits (y)+b^2\mathop{\rm var}\nolimits (x)-2b\mathop{\rm cov}\nolimits (x,y) \cr
& = {\sigma_y}^2+b^2{\sigma_x}^2-2b\mathop{\rm cov}\nolimits (x,y). \cr}
\end{displaymath} (13)
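Eq. (13) can also be verified numerically; in this sketch (assuming NumPy) the slope $b$ and the data are arbitrary illustrative values:

\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
b = 0.7

lhs = np.var(y - b * x, ddof=1)
rhs = (np.var(y, ddof=1) + b**2 * np.var(x, ddof=1)
       - 2 * b * np.cov(x, y, ddof=1)[0, 1])
assert np.isclose(lhs, rhs)  # eq. (13) holds for the sample statistics
\end{verbatim}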

Examine the cases where $\mathop{\rm cor}\nolimits (x,y) = \pm 1$,
\begin{displaymath}
\mathop{\rm cor}\nolimits (x,y) \equiv {\mathop{\rm cov}\nolimits (x,y)\over\sigma_x\sigma_y} = \pm 1
\end{displaymath} (14)


\begin{displaymath}
\mathop{\rm var}\nolimits (y-bx) = b^2{\sigma_x}^2+{\sigma_y}^2\mp 2b\sigma_x\sigma_y = (b\sigma_x\mp\sigma_y)^2.
\end{displaymath} (15)

The Variance is zero for $b\equiv\pm {\sigma_y/\sigma_x}$, and a zero Variance requires that the argument of the Variance be a constant. Therefore, $y-bx = a$, so $y = a+bx$. If $\mathop{\rm cor}\nolimits (x,y) = \pm 1$, then $y$ is either perfectly correlated ($b>0$) or perfectly anticorrelated ($b<0$) with $x$, as the sketch below illustrates.
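A short sketch of this conclusion (assuming NumPy; $a$ and $b$ are arbitrary illustrative constants): data related exactly by $y = a+bx$ give a correlation of $\pm 1$, and $y-bx$ has zero Variance.

\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
for a, b in [(2.0, 3.0), (2.0, -3.0)]:
    y = a + b * x
    print(np.corrcoef(x, y)[0, 1])    # +1 for b > 0, -1 for b < 0
    print(np.var(y - b * x, ddof=1))  # 0: y - b*x equals the constant a
\end{verbatim}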

See also Covariance, Covariance Matrix, Variance


