Lesson 18: Correlation and Agreement

Overview

Many biostatistical analyses are conducted to study the relationship between two continuous or ordinal scale variables within a group of patients.

Purposes of these analyses include:

1. assessing correlation between the two variables, i.e., identifying whether values of one variable tend to be higher (or possibly lower) for higher values of the other variable;
2. assessing the amount of agreement between the values of the two variables, i.e., comparing alternative ways of measuring or assessing the same response;
3. assessing the ability of one variable to predict values of the other variable, i.e., formulating predictive models via regression analyses.

This lesson focuses only on correlation and agreement (purposes 1 and 2 in the list above).

Objectives

Upon completion of this lesson, you should be able to:

Recognize appropriate use of Pearson correlation, Spearman correlation, Kendall’s tau-b and Cohen’s Kappa statistics.
Use a SAS program to produce confidence intervals for correlation coefficients and interpret the results.
Adapt a SAS program to produce the correlation coefficients, their confidence intervals and Kendall’s tau-b.
Recognize situations that call for the use of a statistic measuring concordance.
Distinguish between a concordance correlation coefficient and a Kappa statistic based on the type of data used for each.
Interpret a concordance correlation coefficient and a Kappa statistic.

18.1 - Pearson Correlation Coefficient

Correlation is a general method of analysis useful when studying possible association between two continuous or ordinal scale variables. Several measures of correlation exist. The appropriate type for a particular situation depends on the distribution and measurement scale of the data. Three measures of correlation are commonly applied in biostatistics and these will be discussed below.

Suppose that we have two variables of interest, denoted as X and Y, and suppose that we have a bivariate sample of size n:

\(\left(X_{1} , Y_{1} \right), \left(X_{2} , Y_{2} \right), \dots , \left(X_{n} , Y_{n} \right)\)

and we define the following statistics:

\(\bar{X}=\dfrac{1}{n}\sum_{i=1}^{n}X_i , S_{XX}=\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\)

\(\bar{Y}=\dfrac{1}{n}\sum_{i=1}^{n}Y_i , S_{YY}=\dfrac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})^2\)

\(S_{XY}=\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})\)

The statistics above are, respectively, the sample mean of X, the sample variance of X, the sample mean of Y, the sample variance of Y, and the sample covariance between X and Y. These should be very familiar to you.

The sample Pearson correlation coefficient (also called the sample product-moment correlation coefficient) for measuring the association between variables X and Y is given by the following formula:

\(r_p=\dfrac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}\)

The sample Pearson correlation coefficient, \(r_{p}\) , is the point estimate of the population Pearson correlation coefficient

\(\rho_p=\dfrac{\sigma_{XY}}{\sqrt{\sigma_{XX}\sigma_{YY}}}\)

The Pearson correlation coefficient measures the degree of linear relationship between X and Y, and it satisfies \(-1 \le r_{p} \le +1\). It is a "unitless" quantity, i.e., the units of measurement of X and Y cancel when the coefficient is constructed. A value of +1 reflects perfect positive correlation and a value of -1 reflects perfect negative correlation.
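
To connect the formula to software output, here is a minimal sketch using a made-up five-subject data set (the data set name toy and its values are hypothetical, chosen only so the arithmetic is easy). The COV option of PROC CORR prints the sample variances and covariance next to the correlation, so the ratio can be checked by hand.

```sas
/* Hypothetical toy data, for illustration only */
data toy;
input x y;
datalines;
1 2
2 1
3 4
4 3
5 5
;
run;

/* COV adds the variance-covariance matrix (S_XX, S_YY, S_XY) to the output */
proc corr data=toy pearson cov;
var x y;
title "Pearson Correlation From Its Building Blocks";
run;
```

For these values a hand calculation gives \(S_{XX} = S_{YY} = 2.5\) and \(S_{XY} = 2\), so \(r_p = 2/\sqrt{2.5 \times 2.5} = 0.8\).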

For the Pearson correlation coefficient, we assume that both X and Y are measured on a continuous scale and that each is approximately normally distributed.

The Pearson correlation coefficient is invariant to location and scale transformations. This means that if every \(X_{i}\) is transformed to

\(X_{i}^{*} = aX_{i} + b\)

and every \(Y_{i}\) is transformed to

\(Y_{i}^{*} = cY_{i} + d\)

where \(a > 0\), \(b\), \(c > 0\), and \(d\) are constants, then the correlation between X and Y is the same as the correlation between \(X^{*}\) and \(Y^{*}\).
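
The invariance property can be checked numerically. The sketch below uses a hypothetical toy data set and arbitrary constants (a = 12, b = 5, c = 0.01, d = -3, all chosen here for illustration) and asks PROC CORR for every combination of original and transformed variables.

```sas
/* Hypothetical toy data with linearly transformed copies of x and y */
data toy_transformed;
input x y;
x_star = 12*x + 5;     /* a = 12 > 0, b = 5   */
y_star = 0.01*y - 3;   /* c = 0.01 > 0, d = -3 */
datalines;
1 2
2 1
3 4
4 3
5 5
;
run;

/* All four correlations in the 2 x 2 output table should be identical */
proc corr data=toy_transformed pearson;
var x x_star;
with y y_star;
title "Invariance to Location and Scale Changes";
run;
```

Changing the sign of only one of the two scale constants would flip the sign of the correlation, which is why both are required to be positive.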

With SAS, PROC CORR is used to calculate \(r_{p}\). The output from PROC CORR includes summary statistics for both variables and the computed value of \(r_{p}\). The output also contains a p-value corresponding to the test of:

\(H_{0} : \rho_{p} = 0\) versus \(H_{1} : \rho_{p} \ne 0\)

It should be noted that this statistical test generally is not very useful, and the associated p-value, therefore, should not be emphasized. What is more important is to construct a confidence interval.

The sampling distribution of Pearson's \(r_{p}\) is not normal. To obtain confidence limits for \(\rho_{p}\) based on a standard normal distribution, we transform \(r_{p}\) using Fisher's Z transformation to get a quantity, \(z_{p}\), that has an approximately normal distribution. Here is what is involved in the transformation.

Fisher's Z transformation is defined as

\(z_p=\dfrac{1}{2}\log_e\left( \dfrac{1+r_p}{1-r_p} \right) \sim N\left( \zeta_p , \text{sd}=\dfrac{1}{\sqrt{n-3}} \right)\)

where

\(\zeta_p=\dfrac{1}{2}\log_e\left( \dfrac{1+\rho_p}{1-\rho_p} \right)\)

Using this result, an approximate \(100(1 - \alpha)\%\) confidence interval for \(\zeta_{p}\) is given by \([z_{p, \frac{\alpha}{2}} , z_{p, 1-\frac{\alpha}{2}}]\), where

\(z_{p , \alpha/2}=z_p-\left( t_{n-3 , 1-\alpha/2}/\sqrt{n-3} \right) , z_{p , 1-\alpha/2}=z_p+\left( t_{n-3 , 1-\alpha/2}/\sqrt{n-3} \right)\)

What we really want, however, is an approximate \(100(1 - \alpha)\%\) confidence interval for \(\rho_{p}\). Back-transforming the limits for \(\zeta_{p}\) gives the interval \([r_{p, \frac{\alpha}{2}} , r_{p, 1- \frac{\alpha}{2}}]\), where

\(r_{p , \alpha/2}=\dfrac{\exp(2z_{p , \alpha/2})-1}{\exp(2z_{p , \alpha/2})+1} , \quad r_{p , 1-\alpha/2}=\dfrac{\exp(2z_{p , 1-\alpha/2})-1}{\exp(2z_{p , 1-\alpha/2})+1}\)

Again, you do not have to do this by hand. PROC CORR in SAS will do this for you but it is important to have an idea of what is going on.
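
To see the steps in one place, the transformation and back-transformation can be sketched in a short DATA step. This is an illustration only, not one of the lesson's programs: the data set and variable names are made up, and r = 0.7921 with n = 18 are the values reported for the body fat example in Section 18.4. The limits are computed twice, once with the t quantile that appears in the formulas above and once with a standard normal quantile, because software may use either. Note that \((\exp(2z)-1)/(\exp(2z)+1)\) is the hyperbolic tangent of z, so the TANH function performs the back-transformation.

```sas
/* Fisher Z confidence limits from a correlation estimate and a sample size */
data fisher_ci;
r = 0.7921;      /* sample Pearson correlation (body fat example) */
n = 18;          /* sample size */
alpha = 0.05;

z  = 0.5*log((1 + r)/(1 - r));   /* Fisher Z transformation */
se = 1/sqrt(n - 3);              /* approximate standard deviation of z */

/* Limits using the t quantile, as in the formulas above */
crit_t  = tinv(1 - alpha/2, n - 3);
lower_t = tanh(z - crit_t*se);
upper_t = tanh(z + crit_t*se);

/* Limits using the standard normal quantile */
crit_z  = probit(1 - alpha/2);
lower_z = tanh(z - crit_z*se);
upper_z = tanh(z + crit_z*se);
run;

proc print data=fisher_ci noobs;
var r n z se lower_t upper_t lower_z upper_z;
format z se lower_t upper_t lower_z upper_z 8.4;
title "Fisher Z Confidence Limits Computed Step by Step";
run;
```

With these inputs the normal-quantile limits should come out close to the interval (0.5161, 0.9191) reported in Section 18.4, while the t-quantile limits are somewhat wider.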

18.2 - Spearman Correlation Coefficient

The Spearman rank correlation coefficient, \(r_s\), is a nonparametric measure of correlation based on data ranks. It is obtained by ranking the values of the two variables (X and Y) and calculating the Pearson \(r_p\) on the resulting ranks rather than on the data themselves. Again, PROC CORR will do all of these actual calculations for you.

The Spearman rank correlation coefficient has properties similar to those of the Pearson correlation coefficient, although the Spearman rank correlation coefficient quantifies the degree of linear association between the ranks of X and the ranks of Y. Also, \(r_s\) does not estimate a natural population parameter (unlike Pearson's \(r_p\) which estimates \(\rho_p\) ).

An advantage of the Spearman rank correlation coefficient is that the X and Y values can be continuous or ordinal, and approximate normal distributions for X and Y are not required. Similar to the Pearson \(r_p\), Fisher's Z transformation can be applied to the Spearman \(r_s\) to get a statistic, \(z_s\), that has an asymptotic normal distribution for calculating an asymptotic confidence interval. Again, PROC CORR will do this as well.
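
The "Pearson on the ranks" description can be verified directly. The sketch below uses hypothetical data in which y is the cube of x, so the relationship is monotone but not linear; the data set and variable names are made up for illustration. PROC RANK creates the ranks (average ranks for ties, which is its default), and the Pearson correlation of those ranks is compared with the SPEARMAN option of PROC CORR.

```sas
/* Hypothetical monotone but nonlinear data: y = x cubed */
data toy_monotone;
input x y;
datalines;
1 1
2 8
3 27
4 64
5 125
;
run;

/* Replace each value by its rank, using average ranks for ties */
proc rank data=toy_monotone out=toy_ranks ties=mean;
var x y;
ranks rank_x rank_y;
run;

/* Pearson correlation computed on the ranks */
proc corr data=toy_ranks pearson;
var rank_x;
with rank_y;
title "Pearson Correlation of the Ranks";
run;

/* Spearman correlation computed on the original values */
proc corr data=toy_monotone spearman pearson;
var x;
with y;
title "Spearman and Pearson Correlations of the Raw Data";
run;
```

Because the ranks of x and y are identical here, both rank-based results equal 1, whereas the Pearson correlation of the raw values is below 1 since the relationship is not linear.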

18.3 - Kendall Tau-b Correlation Coefficient

The Kendall tau-b correlation coefficient, \(\tau_b\), is a nonparametric measure of association based on the number of concordances and discordances in paired observations.

Two observations \(\left(X_i , Y_i \right)\) and \(\left(X_j , Y_j \right)\) are concordant if they are in the same order with respect to each variable. That is, if

\(X_i < X_j\) and \(Y_i < Y_j\) , or if
\(X_i > X_j\) and \(Y_i > Y_j\)

They are discordant if they are in the reverse ordering for X and Y, or the values are arranged in opposite directions. That is, if

\(X_i < X_j\) and \(Y_i > Y_j\) , or if
\(X_i > X_j\) and \(Y_i < Y_j\)

The two observations are tied if \(X_i = X_j\) and/or \(Y_i = Y_j\) .

The total number of pairs that can be constructed for a sample size of n is

\(N=\binom{n}{2}=\dfrac{1}{2}n(n-1)\)

N can be decomposed into these five quantities:

\(N = P + Q + X_0 + Y_0 + (XY)_0\)

where P is the number of concordant pairs, Q is the number of discordant pairs, \(X_0\) is the number of pairs tied only on the X variable, \(Y_0\) is the number of pairs tied only on the Y variable, and \(\left(XY\right)_0\) is the number of pairs tied on both X and Y.

The Kendall tau-b for measuring order association between variables X and Y is given by the following formula:

\(t_b=\dfrac{P-Q}{\sqrt{(P+Q+X_0)(P+Q+Y_0)}}\)

The value of \(t_b\) is scaled so that it ranges between -1 and +1. Unlike the Spearman \(r_s\), it does estimate a population parameter:

\(t_b \text{ is the sample estimate of } \tau_b = Pr[\text{concordance}] - Pr[\text{discordance}]\)

The Kendall tau-b has properties similar to the properties of the Spearman \(r_s\). Because the sample estimate, \(t_b\) , does estimate a population parameter, \(\tau_b\) , many statisticians prefer the Kendall tau-b to the Spearman rank correlation coefficient.
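
As a small worked illustration of the pair counting (hypothetical data, not taken from the lesson), consider n = 5 observations:

\((1,1),\ (2,3),\ (3,2),\ (4,4),\ (4,5)\)

There are \(N=\binom{5}{2}=10\) pairs. Eight of them are concordant, one is discordant (the pair \((2,3)\) and \((3,2)\)), and one is tied only on X (the pair \((4,4)\) and \((4,5)\)), so

\(P = 8, \quad Q = 1, \quad X_0 = 1, \quad Y_0 = 0, \quad (XY)_0 = 0\)

\(t_b=\dfrac{8-1}{\sqrt{(8+1+1)(8+1+0)}}=\dfrac{7}{\sqrt{90}}\approx 0.74\)

The tie on X enlarges only the first factor in the denominator, which is how the tau-b version adjusts for ties.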

18.4 - Example - Correlation Coefficients

SAS® Example

Uses PROC CORR to calculate point and interval estimates of the Pearson, Spearman, and Kendall correlation coefficients

(19.1_correlation.sas): Age and percentage body fat were measured in 18 adults. SAS PROC CORR provides estimates of the Pearson, Spearman, and Kendall correlation coefficients. It also calculates Fisher's Z transformation for the Pearson and Spearman correlation coefficients in order to get 95% confidence intervals.

*******************************************************************************
*  This program indicates how to construct a bivariate scatterplot with an    *
*  overlay of the least squares regression line.                              *
*                                                                             *
*  This program also provides an example for calculating point and            *
*  interval estimates of the Pearson, Spearman, and Kendall correlation       *
*  coefficients.                                                              *
*******************************************************************************;

data bodyfat;
input subject age bodyfat_perc;
cards;
01 23  9.5
02 23 27.9
03 27  7.8
04 27 17.8
05 39 31.4
06 41 25.9
07 45 27.4
08 49 25.2
09 50 31.1
10 53 34.7
11 53 42.0
12 54 29.1
13 56 32.5
14 57 30.3
15 58 33.0
16 58 33.8
17 60 41.1
18 61 34.5
;
run;

proc gplot data=bodyfat;
plot bodyfat_perc*age/vaxis=axis1 haxis=axis2 nolegend frame;
axis1 label=(a=90 '% Body Fat') minor=none;
axis2 label=('Age') minor=none;
symbol1 value=star color=black interpol=r;
title "Scatterplot";
run;

proc corr data=bodyfat Pearson Spearman Kendall Fisher(biasadj=no);
var age;
with bodyfat_perc;
title "Correlation Coefficients";
run;

The resulting estimates for this example are 0.7921, 0.7539, and 0.5762 for the Pearson, Spearman, and Kendall correlation coefficients, respectively. The Kendall tau-b correlation typically is smaller in magnitude than the Pearson and Spearman correlation coefficients.

The 95% confidence intervals are (0.5161, 0.9191) and (0.4429, 0.9029) for the Pearson and Spearman correlation coefficients, respectively. Because the Kendall correlation typically is applied to binary or ordinal data, its 95% confidence interval can be calculated via SAS PROC FREQ (this is not shown in the SAS program above).
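
One possible way to request that interval is sketched below. It is not part of the lesson's program and assumes the bodyfat data set created by the program above already exists. The MEASURES option asks PROC FREQ for measures of association, including Kendall's tau-b, CL adds asymptotic confidence limits, and NOPRINT on the TABLES statement suppresses the large cross-tabulation that two nearly continuous variables would otherwise produce.

```sas
/* Assumes the bodyfat data set from the program above has been created */
proc freq data=bodyfat;
tables age*bodyfat_perc / measures cl noprint;
title "Kendall Tau-b with Asymptotic Confidence Limits";
run;
```

The tau-b point estimate in this output should agree with the value from PROC CORR, since both procedures compute the same statistic.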

18.5 - Use and Misuse of Correlation Coefficients

Correlation is a widely-used analysis tool that sometimes is applied inappropriately. Some caveats regarding the use of correlation methods follow.

The correlation methods discussed in this chapter should be used only with independent data; they should not be applied to repeated measures data where the data are not independent. For example, it would not be appropriate to use these measures of correlation to describe the relationship between Week 4 and Week 8 blood pressures in the same patients.
Caution should be used in interpreting results of correlation analysis when large numbers of variables have been examined, resulting in a large number of correlation coefficients.
The correlation of two variables that both have been recorded repeatedly over time can be misleading and spurious. Time trends should be removed from such data before attempting to measure correlation.
To extend correlation results to a given population, the subjects under study must form a representative (i.e., random) sample from that population. The Pearson correlation coefficient can be very sensitive to outlying observations and all correlation coefficients are susceptible to sample selection biases.
Care should be taken when attempting to correlate two variables where one is a part and one represents the total. For example, we would expect to find a positive correlation between height at age ten and adult height because the second quantity "contains" the first quantity.
Correlation should not be used to study the relation between an initial measurement, X, and the change in that measurement over time, Y - X. X will be correlated with Y - X due to the regression to the mean phenomenon.
Small correlation values do not necessarily indicate that two variables are unassociated. For example, Pearson's \(r_p\) will underestimate the association between two variables that show a quadratic relationship. Scatterplots should always be examined.
Correlation does not imply causation. If a strong correlation is observed between two variables A and B, there are several possible explanations:
1. A influences B
2. B influences A
3. A and B are influenced by one or more additional variables
4. the relationship observed between A and B was a chance error.
"Regular" correlation coefficients are often published when the researcher really intends to compare two methods of measuring the same quantity with respect to their agreement. This is a misguided analysis because correlation measures only the degree of association; it does not measure agreement. The next section of this lesson will present a measure of agreement.

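The caveat above that small correlation values do not rule out an association can be illustrated with a short sketch. The data are hypothetical: y is exactly the square of x, with x placed symmetrically around zero, so y is completely determined by x and yet the linear correlation vanishes.

```sas
/* Hypothetical data: y is an exact quadratic function of x */
data quad;
do x = -3 to 3;
   y = x**2;
   output;
end;
run;

/* The Pearson correlation is zero even though y depends entirely on x */
proc corr data=quad pearson;
var x;
with y;
title "Pearson Correlation for a Quadratic Relationship";
run;
```

A scatterplot of these seven points shows the parabola immediately, which is the reason scatterplots should always accompany correlation coefficients.
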
18.6 - Concordance Correlation Coefficient for Measuring Agreement

How well do two diagnostic measurements agree? Many times continuous units of measurement are used in the diagnostic test. We may not be interested in correlation or linear relationship between the two measures, but in a measure of agreement.

The concordance correlation coefficient, \(r_c\) , for measuring agreement between continuous variables X and Y (both approximately normally distributed), is calculated as follows:

\(r_c=\dfrac{2S_{XY}}{S_{XX}+S_{YY}+(\bar{X}-\bar{Y})^2}\)

Similar to the other correlation coefficients, the concordance correlation satisfies \(-1 \le r_c \le +1\). A value of \(r_c = +1\) corresponds to perfect agreement. A value of \(r_c = - 1\) corresponds to perfect negative agreement, and a value of \(r_c = 0\) corresponds to no agreement. The sample estimate, \(r_c\) , is an estimate of the population concordance correlation coefficient:

\(\rho_c=\dfrac{2\sigma_{XY}}{\sigma_{XX}+\sigma_{YY}+(\mu_{X}-\mu_{Y})^2}\)
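
A tiny worked case (hypothetical values, chosen only for easy arithmetic) shows how the concordance correlation differs from the Pearson correlation. Suppose X takes the values 1, 2, 3 and Y takes the values 2, 3, 4, so that every Y is exactly one unit larger than its X. Then

\(\bar{X}=2, \quad \bar{Y}=3, \quad S_{XX}=S_{YY}=S_{XY}=1\)

\(r_p=\dfrac{1}{\sqrt{1 \times 1}}=1, \qquad r_c=\dfrac{2(1)}{1+1+(2-3)^2}=\dfrac{2}{3}\approx 0.67\)

The points lie perfectly on a straight line, so \(r_p = 1\), but the line is not the line of identity; the squared difference in means in the denominator penalizes the systematic shift and pulls \(r_c\) below 1.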

Let's look at an example that will help to make this concept clearer.

SAS® Example

Provides an IML module for calculating point and interval estimates of the Pearson correlation coefficient and the concordance correlation coefficient

(19.2_agreement_concordanc.sas) :

*******************************************************************************
*  This program indicates how to construct a bivariate scatterplot with an    *
*  overlay of the line of identity.                                           *
*                                                                             *
*  This program also provides an IML module for calculating point and         *
*  interval estimates of the Pearson correlation coefficient and the          *
*  concordance correlation coefficient.                                       *
*******************************************************************************;

data dice_baseline;
input subject cort_auc1 cort_auc2;
cards;
61001    5.28170    5.37851
61002    5.58796    5.33628
61003    6.47607    6.59770
61005    6.36019    6.39746
61007    5.81121    5.82528
61008    6.03036    6.21147
61009    5.84549    6.10434
61014    6.80349    6.90689
61015    6.28977    6.27369
61023    5.88446    5.94352
61024    5.79701    6.04876
61026    5.01302    4.72154
61027    6.48824    6.37891
61028    5.30862    5.53405
61033    5.70905    5.77803
61034    5.98545    5.77613
61035    6.19924    6.20880
61036    6.00639    5.98313
61037    6.27793    6.44342
61038    6.57390    6.62936
61039    5.69639    5.43509
61042    5.96588    6.01282
61044    6.04803    6.11529
61046    6.39423    6.03876
62002    7.44584    7.58421
62003    5.90813    5.87230
62004    6.05483    5.94695
62005    5.65735    5.64983
62006    6.44815    6.44280
62007    6.28611    6.45374
62009    6.40863    6.14994
62012    5.62564    5.58142
62013    6.68375    6.58815
62014    5.76951    5.84802
62015    5.94383    5.95489
62016    5.66024    5.73711
62017    4.77492    4.57465
62018    5.60468    5.43495
62019    6.82819    6.86652
62020    5.18986    5.05725
62021    6.48810    6.59655
62022    6.08867    5.76965
62023    5.91400    5.89672
62024    5.58217    5.57651
62026    6.32857    6.48921
62027    7.67703    7.76541
62028    5.92411    5.66689
62029    6.15313    6.16558
62030    5.20392    5.42481
62032    6.43962    6.46171
62033    6.20661    6.28542
63001    6.04767    5.82528
63002    6.46923    6.51728
63003    5.68370    5.79701
63004    5.11719    5.47363
63005    6.10993    6.13541
63006    4.91744    5.06968
63008    5.35972    5.56605
63010    6.73016    6.76285
63011    5.93700    5.94092
63012    6.07716    5.92548
63014    6.58185    6.52781
63015    5.84317    5.85030
63016    5.98144    6.22389
63017    5.77452    5.81662
63018    5.46142    5.84898
63021    5.44920    5.43688
63022    5.96519    6.01302
63026    6.29258    6.42339
63027    6.86608    6.92806
63028    5.47875    5.72634
63030    6.16190    6.16608
63032    5.66707    5.97114
63033    5.80634    5.63640
63034    6.37256    6.24416
63035    5.65755    5.98070
64001    6.07596    6.06010
64002    6.57898    6.54552
64003    6.60733    6.89724
64004    5.69611    5.82963
64005    6.60331    6.62972
64007    5.89963    5.83195
64008    6.19731    6.07044
64009    5.88875    6.03427
64010    5.64912    5.46135
64011    6.99962    7.10324
64012    6.61282    6.68520
64015    6.60477    6.76468
64016    5.35161    5.55307
64017    6.63249    6.77868
64020    5.37717    5.19760
64023    5.58781    5.87044
64025    5.74499    5.77090
64027    5.92655    6.09265
64028    5.01060    5.17112
64030    7.12400    7.17368
64031    5.79909    5.37603
64032    5.75609    5.85174
64033    6.79275    6.78255
64035    6.02198    6.03915
64036    5.43960    5.82229
64037    5.20163    5.12928
65001    6.18838    6.66593
65002    6.13860    6.26224
65003    6.98807    6.97658
65005    5.54628    5.50547
65006    4.47249    4.59256
65007    5.04034    5.07775
65008    5.42025    5.46227
65009    5.26772    5.41463
65011    6.43019    6.38438
65013    6.56323    6.46607
65014    5.06134    4.70619
65016    6.32666    6.31564
65017    5.90235    6.05890
65018    6.05800    5.99467
65020    5.96388    6.01204
65021    5.57324    5.61324
65022    6.21017    6.15262
65025    6.01934    6.00318
65026    5.52082    5.54575
65027    5.89237    5.67469
65028    6.24592    6.37106
65031    6.31524    6.44334
65032    6.19602    6.29576
65033    6.07305    6.07966
65037    5.60960    5.38492
65038    5.39806    5.18466
65040    5.95464    6.20802
66003    5.96880    5.86128
66004    5.89707    5.69116
66005    6.27067    6.35294
66006    5.39744    5.23236
66010    6.64408    6.64990
66011    6.29430    6.32724
66012    5.22507    5.28754
66013    5.15840    5.05580
66014    5.85385    5.54974
66018    6.49611    5.87795
66019    5.43285    5.50871
66021    6.32416    6.31612
66023    5.77253    5.74469
66024    5.89920    5.95774
;
run;

proc means data=dice_baseline noprint;
var cort_auc1 cort_auc2;
output out=dice_baselinemin min=var1 var2;
run;

proc means data=dice_baseline noprint;
var cort_auc1 cort_auc2;
output out=dice_baselinemax max=var1 var2;
run;

data dice_baselineall;
set dice_baselinemin dice_baselinemax;
drop _type_ _freq_;
if _N_=1 then var1=floor(min(var1,var2));
if _N_=1 then var2=floor(min(var1,var2));
if _N_=2 then var1=ceil(max(var1,var2));
if _N_=2 then var2=ceil(max(var1,var2));
run;

data dice_baselineall;
set dice_baseline dice_baselineall;
run;

proc gplot uniform data=dice_baselineall;
plot cort_auc1*cort_auc2 var1*var2/overlay vaxis=axis1 haxis=axis2 nolegend frame;
axis1 label=(a=90 'Cortisol Every Hour') minor=none;
axis2 label=('Cortisol Every Two Hours') minor=none;
symbol1 value=star color=black interpol=none;
symbol2 value=none color=black interpol=join;
title "Concordance Correlation Coefficient";
run;

proc iml;
*******************************************************************************
*  Enter the appropriate SAS data set name in the use statement and enter the *
*  appropriate variable names in the read statements.                         *
*******************************************************************************;
use dice_baseline;
read all var {cort_auc1} into var1;
read all var {cort_auc2} into var2;
*******************************************************************************
*  The IML module, labeled concorr, starts next.                              *
*******************************************************************************;
start concorr;
nonmiss=loc(var1#var2^=.);
var1=var1[nonmiss];
var2=var2[nonmiss];
free nonmiss;
n=nrow(var1);
mu1=sum(var1)/n;
mu1=round(mu1,0.0001);
mu2=sum(var2)/n;
mu2=round(mu2,0.0001);
sigma11=ssq(var1-mu1)/(n-1);
sigma11=round(sigma11,0.0001);
sigma22=ssq(var2-mu2)/(n-1);
sigma22=round(sigma22,0.0001);
sigma12=sum((var1-mu1)#(var2-mu2))/(n-1);
sigma12=round(sigma12,0.0001);
lshift=(mu1-mu2)/((sigma11#sigma22)##0.25);
rho=sigma12/sqrt(sigma11#sigma22);
rho=round(rho,0.0001);
z=log((1+rho)/(1-rho))/2;
se_z=1/sqrt(n-3);
t=tinv(0.975,n-3);
z_low=z-(se_z#t);
z_upp=z+(se_z#t);
rho_low=(exp(2#z_low)-1)/(exp(2#z_low)+1);
rho_low=round(rho_low,0.0001);
rho_upp=(exp(2#z_upp)-1)/(exp(2#z_upp)+1);
rho_upp=round(rho_upp,0.0001);
crho=(2#sigma12)/((sigma11+sigma22)+((mu1-mu2)##2));
crho=round(crho,0.0001);
z=log((1+crho)/(1-crho))/2;
if sigma12^=0 then do;
   t1=((1-(rho##2))#(crho##2))/((1-(crho##2))#(rho##2));
   t2=(2#(crho##3)#(1-crho)#(lshift##2))/(rho#((1-(crho##2))##2));
   t3=((crho##4)#(lshift##4))/(2#(rho##2)#((1-(crho##2))##2));
   se_z=sqrt((t1+t2-t3)/(n-2));
end;
else se_z=sqrt(2#sigma11#sigma22)/((sigma11+sigma22+((mu1-mu2)##2))#(n-2));
t=tinv(0.975,n-2);
z_low=z-(se_z#t);
z_upp=z+(se_z#t);
crho_low=(exp(2#z_low)-1)/(exp(2#z_low)+1);
crho_low=round(crho_low,0.0001);
crho_upp=(exp(2#z_upp)-1)/(exp(2#z_upp)+1);
crho_upp=round(crho_upp,0.0001);
Results=n//mu1//mu2//sigma11//sigma22//sigma12//rho_low//rho//rho_upp//
        crho_low//crho//crho_upp;
r_name={'SampleSize' 'Mean_1' 'Mean_2' 'Variance_1' 'Variance_2' 'Covariance' 'Corr LowerCL'
       'Corr' 'Corr UpperCL' 'ConcCorr LowerCL' 'ConcCorr' 'ConcCorr UpperCL'};
print 'The Estimated Correlation and Concordance Correlation (and 95% Confidence Limits)';
print Results [rowname=r_name];
finish concorr;
*******************************************************************************
*  The IML module, labeled concorr, is finished.                              *
*******************************************************************************;
run concorr;

The ACRN DICE trial was discussed earlier in this course. In that trial, participants underwent hourly blood draws between 08:00 PM and 08:00 AM once a week in order to determine the cortisol area-under-the-curve (AUC). The participants hated this! They complained about the sleep disruption every hour when the nurses came by to draw blood, so the ACRN wanted to determine for future studies if the cortisol AUC calculated on measurements every two hours was in good agreement with the cortisol AUC calculated on hourly measurements. The baseline data were used to investigate how well these two measurements agreed. If there is good agreement, the protocol could be changed to take blood every two hours.

Note for this SAS program - Run the program to view the output. This is higher level SAS than you are expected to program yourself in this course, but some of you may find the programming of interest.

The SAS program yielded \(r_c = 0.95\) and a 95% confidence interval = (0.93, 0.96). The ACRN judged this to be excellent agreement, so it will use two-hourly measurements in future studies.

What about binary or ordinal data? Cohen's Kappa Statistic will handle this...

18.7 - Cohen's Kappa Statistic for Measuring Agreement

Cohen's kappa statistic, \(\kappa\) , is a measure of agreement between categorical variables X and Y. For example, kappa can be used to compare the ability of different raters to classify subjects into one of several groups. Kappa also can be used to assess the agreement between alternative methods of categorical assessment when new techniques are under study.

Kappa is calculated from the observed and expected frequencies on the diagonal of a square contingency table. Suppose that there are n subjects on whom X and Y are measured, and suppose that there are g distinct categorical outcomes for both X and Y. Let \(f_{ij}\) denote the number of subjects with the \(i^{th}\) categorical response for variable X and the \(j^{th}\) categorical response for variable Y.

Then the frequencies can be arranged in the following g × g table:

	Y = 1	Y = 2	...	Y = g
X = 1	\(f_{11}\)	\(f_{12}\)	...	\(f_{1g}\)
X = 2	\(f_{21}\)	\(f_{22}\)	...	\(f_{2g}\)
\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\ddots\)	\(\vdots\)
X = g	\(f_{g1}\)	\(f_{g2}\)	...	\(f_{gg}\)

The observed proportional agreement between X and Y is defined as:

\(p_0=\dfrac{1}{n}\sum_{i=1}^{g}f_{ii}\)

and the expected agreement by chance is:

\(p_e=\dfrac{1}{n^2}\sum_{i=1}^{g}f_{i+}f_{+i}\)

where \(f_{i+}\) is the total for the \(i^{th}\) row and \(f_{+i}\) is the total for the \(i^{th}\) column. The kappa statistic is:

\(\hat{\kappa}=\dfrac{p_0-p_e}{1-p_e}\)

Cohen's kappa statistic is an estimate of the population coefficient:

\(\kappa=\dfrac{Pr[X=Y]-Pr[X=Y|X \text{ and }Y \text{ independent}]}{1-Pr[X=Y|X \text{ and }Y \text{ independent}]}\)

Generally, \(0 ≤ \kappa ≤ 1\), although negative values do occur on occasion. Cohen's kappa is ideally suited for nominal (non-ordinal) categories. Weighted kappa can be calculated for tables with ordinal categories.

SAS® Example

(19.3_agreement_Cohen.sas) : Two radiologists rated 85 patients with respect to liver lesions. The ratings were designated on an ordinal scale as:

0 ='Normal' 1 ='Benign' 2 ='Suspected' 3 ='Cancer'

SAS PROC FREQ provides an option for constructing Cohen's kappa and weighted kappa statistics.

*******************************************************************************
*  This program indicates how to calculate Cohen's kappa statistic for        *
*  evaluating the level of agreement between two variables.                   *
*******************************************************************************;

proc format;
value raterfmt 0='Normal' 1='Benign' 2='Suspected' 3='Cancer';
run;

data radiology;
input rater1 rater2 count;
format rater1 rater2 raterfmt.;
cards;
0 0 21
0 1 12
0 2  0
0 3  0
1 0  4
1 1 17
1 2  1
1 3  0
2 0  3
2 1  9
2 2 15
2 3  2
3 0  0
3 1  0
3 2  0
3 3  1
;
run;

proc freq data=radiology;
tables rater1*rater2/agree;
weight count;
test kappa;
exact kappa;
title "Cohen's Kappa Coefficients";
run;

The weighted kappa coefficient is 0.57 and the asymptotic 95% confidence interval is (0.44, 0.70). This indicates that the amount of agreement between the two radiologists is modest (and not as strong as the researchers had hoped it would be).
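
For comparison, the simple (unweighted) kappa can be worked out by hand from the counts in the data step above, using the formulas for \(p_0\), \(p_e\), and \(\hat{\kappa}\) given earlier. The row totals for rater 1 are 33, 22, 29, and 1, and the column totals for rater 2 are 28, 38, 16, and 3, with n = 85:

\(p_0=\dfrac{21+17+15+1}{85}=\dfrac{54}{85}\approx 0.635\)

\(p_e=\dfrac{(33)(28)+(22)(38)+(29)(16)+(1)(3)}{85^2}=\dfrac{2227}{7225}\approx 0.308\)

\(\hat{\kappa}=\dfrac{0.635-0.308}{1-0.308}\approx 0.47\)

The simple kappa counts every disagreement equally, whereas the weighted kappa gives partial credit to ratings that are close on the ordinal scale; that most of the disagreements in this table fall in adjacent categories is consistent with the weighted value (0.57) being the larger of the two.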

Note! Updated programs for examples 19.2 and 19.3 are in the folder for this lesson. Take a look.

18.8 - Summary

In this lesson, among other things, we learned how to:

recognize appropriate use of Pearson correlation, Spearman correlation, Kendall’s tau-b and Cohen’s Kappa statistics.
use a SAS program to produce confidence intervals for correlation coefficients and interpret the results.
adapt a SAS program to produce the correlation coefficients, their confidence intervals and Kendall’s tau-b.
recognize situations that call for the use of a statistic measuring concordance.
distinguish between a concordance correlation coefficient and a Kappa statistic based on the type of data used for each.
interpret a concordance correlation coefficient and a Kappa statistic.
