18.7 - Cohen's Kappa Statistic for Measuring Agreement
Cohen's kappa statistic, \(\kappa\), is a measure of agreement between categorical variables X and Y. For example, kappa can be used to compare the ability of different raters to classify subjects into one of several groups. Kappa also can be used to assess the agreement between alternative methods of categorical assessment when new techniques are under study.
Kappa is calculated from the observed and expected frequencies on the diagonal of a square contingency table. Suppose that there are n subjects on whom X and Y are measured, and suppose that there are g distinct categorical outcomes for both X and Y. Let \(f_{ij}\) denote the number of subjects with the \(i^{th}\) categorical response for variable X and the \(j^{th}\) categorical response for variable Y.
Then the frequencies can be arranged in the following g × g table:
| | Y = 1 | Y = 2 | ... | Y = g |
| X = 1 | \(f_{11}\) | \(f_{12}\) | ... | \(f_{1g}\) |
| X = 2 | \(f_{21}\) | \(f_{22}\) | ... | \(f_{2g}\) |
| ... | ... | ... | ... | ... |
| X = g | \(f_{g1}\) | \(f_{g2}\) | ... | \(f_{gg}\) |
The observed proportional agreement between X and Y is defined as:
\(p_0=\dfrac{1}{n}\sum_{i=1}^{g}f_{ii}\)
and the expected agreement by chance is:
\(p_e=\dfrac{1}{n^2}\sum_{i=1}^{g}f_{i+}f_{+i}\)
where \(f_{i+}\) is the total for the \(i^{th}\) row and \(f_{+i}\) is the total for the \(i^{th}\) column. The kappa statistic is:
\(\hat{\kappa}=\dfrac{p_0-p_e}{1-p_e}\)
Cohen's kappa statistic is an estimate of the population coefficient:
\(\kappa=\dfrac{Pr[X=Y]-Pr[X=Y|X \text{ and }Y \text{ independent}]}{1-Pr[X=Y|X \text{ and }Y \text{ independent}]}\)
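Generally, \(0 \le \kappa \le 1\), where \(\kappa = 1\) indicates perfect agreement and \(\kappa = 0\) indicates agreement no better than chance. Negative values do occur on occasion, when the observed agreement falls below the level expected by chance. Cohen's kappa is ideally suited for nominal (non-ordinal) categories. Weighted kappa, which gives partial credit for near-agreement, can be calculated for tables with ordinal categories.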
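The formulas for \(p_0\), \(p_e\), and \(\hat{\kappa}\) above can be sketched in a few lines of Python. This is only an illustration: the function name `cohen_kappa` and the 2 × 2 table are hypothetical (the counts are made up and are not data from this lesson), and the code assumes \(p_e < 1\) so that the denominator is nonzero.

```python
def cohen_kappa(table):
    """Return (p0, pe, kappa) for a square g x g table of counts f[i][j]."""
    g = len(table)
    if any(len(row) != g for row in table):
        raise ValueError("table must be square (g x g)")
    n = sum(sum(row) for row in table)                                    # total subjects
    row_totals = [sum(row) for row in table]                              # f_{i+}
    col_totals = [sum(table[i][j] for i in range(g)) for j in range(g)]   # f_{+j}
    p0 = sum(table[i][i] for i in range(g)) / n                           # observed agreement
    pe = sum(row_totals[i] * col_totals[i] for i in range(g)) / n ** 2    # chance agreement
    return p0, pe, (p0 - pe) / (1 - pe)


# Hypothetical 2 x 2 table (made-up counts, for illustration only)
example = [[40, 10],
           [5, 45]]
p0, pe, kappa = cohen_kappa(example)
print(f"p0 = {p0:.3f}, pe = {pe:.3f}, kappa = {kappa:.3f}")
# prints: p0 = 0.850, pe = 0.500, kappa = 0.700
```

Here 85 of 100 subjects fall on the diagonal, and the marginal totals imply that half of the subjects would agree by chance alone, so kappa rescales the observed 0.85 to 0.70.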
SAS Example
(19.3_agreement_Cohen.sas): Two radiologists rated 85 patients with respect to liver lesions. The ratings were designated on an ordinal scale as:
0 = 'Normal', 1 = 'Benign', 2 = 'Suspected', 3 = 'Cancer'
SAS PROC FREQ provides an option (AGREE, on the TABLES statement) for computing Cohen's kappa and weighted kappa statistics.
*******************************************************************************
* This program indicates how to calculate Cohen's kappa statistic for *
* evaluating the level of agreement between two variables. *
*******************************************************************************;
proc format;
  value raterfmt 0='Normal' 1='Benign' 2='Suspected' 3='Cancer';
run;

data radiology;
  input rater1 rater2 count;
  format rater1 rater2 raterfmt.;
  cards;
0 0 21
0 1 12
0 2 0
0 3 0
1 0 4
1 1 17
1 2 1
1 3 0
2 0 3
2 1 9
2 2 15
2 3 2
3 0 0
3 1 0
3 2 0
3 3 1
;
run;
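proc freq data=radiology;
  tables rater1*rater2 / agree;   /* AGREE requests simple and weighted kappa */
  weight count;                   /* each data line is a cell count, not one subject */
  test kappa;                     /* asymptotic test for simple kappa */
  exact kappa;                    /* exact test for simple kappa */
  title "Cohen's Kappa Coefficients";
run;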
The weighted kappa coefficient is 0.57, with an asymptotic 95% confidence interval of (0.44, 0.70). This indicates that the agreement between the two radiologists is modest (and not as strong as the researchers had hoped it would be).
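As a cross-check on the reported weighted kappa, the point estimates can be recomputed by hand from the 4 × 4 table of counts in the data step. The sketch below is not part of the SAS program. It assumes linear (Cicchetti-Allison) agreement weights \(w_{ij}=1-|i-j|/(g-1)\), which is the default weighting in PROC FREQ; the function name `weighted_kappa` is hypothetical. The confidence interval is not reproduced here, since it requires the asymptotic variance.

```python
# Counts from the radiology data step: rows = rater1, columns = rater2,
# categories ordered 0='Normal', 1='Benign', 2='Suspected', 3='Cancer'.
counts = [
    [21, 12,  0, 0],
    [ 4, 17,  1, 0],
    [ 3,  9, 15, 2],
    [ 0,  0,  0, 1],
]


def weighted_kappa(table, weights):
    """Kappa with agreement weights w[i][j]; identity weights give simple kappa."""
    g = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(r) for r in table]                                   # f_{i+}
    cols = [sum(table[i][j] for i in range(g)) for j in range(g)]    # f_{+j}
    cells = [(i, j) for i in range(g) for j in range(g)]
    po = sum(weights[i][j] * table[i][j] for i, j in cells) / n              # weighted observed
    pe = sum(weights[i][j] * rows[i] * cols[j] for i, j in cells) / n ** 2   # weighted expected
    return (po - pe) / (1 - pe)


g = len(counts)
# Identity weights: credit only for exact agreement (simple kappa).
identity = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
# Assumed linear weights: partial credit that shrinks with the distance |i - j|.
linear = [[1 - abs(i - j) / (g - 1) for j in range(g)] for i in range(g)]

simple = weighted_kappa(counts, identity)
weighted = weighted_kappa(counts, linear)
print(f"simple kappa   = {simple:.2f}")    # prints: simple kappa   = 0.47
print(f"weighted kappa = {weighted:.2f}")  # prints: weighted kappa = 0.57
```

The weighted value exceeds the simple one because most disagreements between the two radiologists are between adjacent categories, which the linear weights count as partial agreement.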