Let's look at some special cases. For instance, how does X vary with itself? What is
Cov(X, X) = EXX - EX EX
Cov(X, X) = EX² - (EX)² = Var(X)
This is the same as the variance of X.
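If you like to see things numerically, here is a minimal simulation sketch (my own illustration using NumPy, which the course does not assume; the distribution and seed are arbitrary) checking that the sample version of Cov(X, X) matches the sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)

cov_xx = np.mean(x * x) - np.mean(x) * np.mean(x)   # EXX - EX EX
var_x = np.mean(x**2) - np.mean(x)**2               # EX² - (EX)²

print(cov_xx, var_x, np.var(x))   # all agree, and all are near 9 (= 3²)
```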
Here is an example of one use of covariance. Say you want to find the variance of X + Y. Remember, you can't just split this into Var(X) + Var(Y). With expectations you could, because expectation is linear: E(X + Y) = EX + EY. Variances do not work that way. You can, however, add an adjustment that accounts for how the two variables behave jointly.
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
You take the variance of each one individually, plus the adjustment for how the two behave together. So, if you need to find Var(X + Y), this is the way to do it.
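As a quick sanity check, here is a sketch (again my own illustration; Y is deliberately built from X so that the covariance is nonzero) confirming the identity on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # dependent on x by construction

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov_xy
print(lhs, rhs)   # agree up to floating-point error
```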
Consequence #1
Let's think about a consequence of this. What is Var(X - Y)?
Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y)
We could also write this as:
Var(X + (-Y)) = Var(X) + Var(-1 × Y) + 2Cov(X, -Y)
We can simplify Var(-1 × Y) by pulling the constant out; remember that a constant comes out of a variance squared:
Var(X + (-Y)) = Var(X) + (-1)²Var(Y) + 2Cov(X, -Y)
Var(X + (-Y)) = Var(X) + Var(Y) + 2Cov(X, -Y)
It turns out that Cov(X, -Y) is the same as -Cov(X, Y), which is where the minus sign in Var(X - Y) comes from:
Cov(X, -Y) = -Cov(X, Y)
Why is this so? Let's use the formula and see where it goes.
Cov(X, -Y) = E{X (-Y)} - EXE(-Y)
We pull the -1 out of each term:
Cov(X, -Y) = -1(EXY - EXEY)
Which is the same as
Cov(X, -Y) = -Cov(X, Y)
Working with the expectations should be pretty easy: the moment you rewrite everything in terms of expectations, things get easier, because expectations are linear and constants pull straight out.
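Here is a sketch (an illustration with an arbitrary dependent pair) that checks both results from this consequence at once: the sign flip of the covariance and the Var(X - Y) formula:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = 0.7 * x + rng.normal(size=100_000)

def cov(a, b):
    # Cov(A, B) = E[AB] - E[A]E[B], sample version
    return np.mean(a * b) - np.mean(a) * np.mean(b)

print(cov(x, -y), -cov(x, y))   # equal: Cov(X, -Y) = -Cov(X, Y)
print(np.var(x - y), np.var(x) + np.var(y) - 2 * cov(x, y))   # equal
```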
Consequence #2
If X and Y are independent, then the covariance of X and Y is 0.
Cov(X, Y) = 0
Since Cov(X, Y) = 0, we get
Var(X + Y) = Var(X) + Var(Y)
The 2Cov(X, Y) term is gone because of the independence.
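A sketch of the same thing numerically, with X and Y drawn independently (the particular distributions here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = rng.exponential(size=100_000)   # generated independently of x

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)                                  # approximately 0
print(np.var(x + y), np.var(x) + np.var(y))    # approximately equal
```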
Much of the information we talk about in this class is useful if you go into actuarial work or more advanced probability theory. Covariance, on the other hand, is more directly applicable to real life: covariance is directly related to correlation, and we read about correlations in the news all the time.
Covariance and Correlation
First, let's begin by defining correlation.
Definition: The correlation of X and Y, denoted by the Greek letter ρ (rho), is the covariance of X and Y divided by the square root of the product of the variances:
ρ = Cov(X, Y) / √(Var(X)Var(Y))
Another way of stating this is
ρ = Cov(X, Y) / (σxσy)
that is, the covariance divided by the product of the standard deviations. Notation that we sometimes use for this is Corr(X, Y).
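Here is a sketch that computes ρ straight from this definition and compares it with NumPy's built-in corrcoef (the particular variables are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)
y = -0.4 * x + rng.normal(size=100_000)

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
rho = cov_xy / (np.std(x) * np.std(y))   # covariance over product of std devs

print(rho, np.corrcoef(x, y)[0, 1])   # the two values agree
```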
Some Facts:
- ρ lives within the interval [-1, 1]
- It doesn't matter what kinds of values you have for X and Y; they could be huge, and the correlation will still be between -1 and 1.
- ρ is unitless
- This means that the correlation is not affected by the units of measurement, i.e., you can multiply X by any positive constant and ρ stays the same (multiplying by a negative constant only flips the sign), as the sketch below shows.
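These facts are easy to see numerically. In this sketch (my own illustration), ρ is unchanged when X is multiplied by 1000 and only changes sign when X is negated:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)
y = x + rng.normal(size=100_000)

print(np.corrcoef(x, y)[0, 1])           # some value rho in [-1, 1]
print(np.corrcoef(1000 * x, y)[0, 1])    # same rho: units don't matter
print(np.corrcoef(-x, y)[0, 1])          # -rho: a negative constant flips the sign
```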
Proof of Fact #1
We will start off using a trick: write down something that we know is non-negative. A squared quantity can't be negative, so its expectation can't be negative either:

E[{(Y - μY) - ρ(σy/σx)(X - μX)}²] ≥ 0

We can write out all of the calculations that are called for and get

E[(Y - μY)²] - 2ρ(σy/σx)E[(X - μX)(Y - μY)] + ρ²(σy²/σx²)E[(X - μX)²] ≥ 0

What do we know about the expectation in the first term? It is the variance of Y, namely σy². The expectation in the middle term is the covariance of X and Y, so we can write it as ρσxσy, and the expectation in the last term is σx², so the σx² cancel out to get

σy² - 2ρ²σy² + ρ²σy² ≥ 0

We can cancel out terms and gather up others, and this leaves us with

σy²(1 - ρ²) ≥ 0
So, what do we know about σy²? It is ≥ 0. If this is the case, then (1 - ρ²) had better be ≥ 0 as well, because their product is ≥ 0.
What does this imply? ρ² has to be ≤ 1.
Hence, ρ ∈ [-1, 1]: the correlation will always be between -1 and 1.
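If you want to convince yourself of the algebra, this sketch evaluates both sides of the key identity, E[{(Y - μY) - ρ(σy/σx)(X - μX)}²] = σy²(1 - ρ²), on simulated data:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=100_000)

rho = np.corrcoef(x, y)[0, 1]
sx, sy = np.std(x), np.std(y)

lhs = np.mean(((y - y.mean()) - rho * (sy / sx) * (x - x.mean()))**2)
rhs = sy**2 * (1 - rho**2)
print(lhs, rhs)   # equal; both are >= 0, which is what forces ρ² <= 1
```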
The General Notion of Correlation
The general notion to understand is that ρ is a measure of the linear relationship between X and Y. The word linear is very important. Why? Because of the way the word correlation gets used. In everyday English, when we say that two things are not correlated, we usually mean they are not related at all. However, in the precise usage of the term, saying two things are not correlated means only that they are not linearly related. They could be related in all sorts of other funky ways, even very strongly related, just not linearly.
You might hear people say, "These two things are not correlated. Hence, we think that one doesn't affect the other at all." There are examples of this in the news all the time; it is the problem of mixing up correlation with relationship.
Now you know something very important: ρ is a measure of the linear relationship, NOT a measure of the relationship in general.
To reiterate what we discussed earlier: X and Y being independent implies that ρ = 0. If they are independent of each other, then there can be no linear relationship at all.
But this does not work the other way around: ρ = 0 (and hence Cov(X, Y) = 0) does not imply that X and Y are independent.
There are lots of very smart people that make this very basic mistake, and this is to confuse these two terms.
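Here is the classic counterexample (a standard textbook construction, not specific to this course): let X be symmetric around 0 and Y = X². Then Y is completely determined by X, yet ρ = 0:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100_000)   # symmetric around 0
y = x**2                       # totally dependent on x, but not linearly

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)                     # approximately 0
print(np.corrcoef(x, y)[0, 1])    # approximately 0, despite total dependence
```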
Let's take a look at this in a few different contexts.
a) The extreme version, where ρ = 1. This means that X and Y are perfectly, positively, linearly related: if you tell me what X is, I can tell you exactly what Y is, and as X gets larger, Y also increases. Graphically, the points fall exactly on a line with positive slope.
b) The opposite extreme, ρ = -1, is the case where X and Y are perfectly, negatively, linearly related. Graphically, the points fall exactly on a line with negative slope.
c) ρ = 0 is the case where X and Y have no linear relationship at all.
[Reading: see pages 210 and 211 in the text (especially if you are taking STAT 415) for information about least squares, which is directly related to regression.]
d) ρ > 0 is the case where X and Y are positively and linearly related, but not perfectly. You notice that the relationship is not perfect; there is some randomness, so I cannot tell you exactly what Y will be for a given X. This is not a deterministic relationship like the extreme examples above.
e) ρ < 0 is the case where X and Y are negatively and linearly related, again not perfectly.
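One way to see all of these cases at once is to manufacture pairs with any target correlation. A standard construction (sketched here with standard normals; the target values are arbitrary) sets Y = ρX + √(1 - ρ²)·Z with Z independent of X:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
x = rng.normal(size=n)
z = rng.normal(size=n)           # independent of x

for rho in (1.0, -1.0, 0.0, 0.7, -0.7):
    y = rho * x + np.sqrt(1 - rho**2) * z   # has correlation rho with x
    print(rho, np.corrcoef(x, y)[0, 1])     # sample value matches each target
```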