7.4 - Receiver Operating Characteristic Curve (ROC)

A Receiver Operating Characteristic (ROC) curve is a standard technique for summarizing classifier performance over a range of trade-offs between true positive (TP) and false positive (FP) rates (Swets, 1988). The ROC curve plots sensitivity (the ability of the model to predict an event correctly) against 1 - specificity for all possible cut-off classification probability values \(\pi_0\).

For logistic regression, we can create a \(2\times 2\) classification table that cross-classifies the predicted response, \(\hat{y}=0\) or 1, against the true value, \(y = 0\) or 1. Whether the model predicts \(\hat{y}=1\) depends on some cut-off probability, \(\pi_0\). For example, \(\hat{y}=1\) if \(\hat{\pi}_i>\pi_0\) and \(\hat{y}=0\) if \(\hat{\pi}_i \leq \pi_0\). The most common choice is \(\pi_0 = 0.5\). Then \(\text{sensitivity}=P(\hat{y}=1|y=1)\) and \(\text{specificity}=P(\hat{y}=0|y=0)\).
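The cut-off rule and the two conditional probabilities can be sketched in R as follows. This is a minimal illustration only: the vectors `pi_hat` and `y` are made-up values, and the helper `classify` is a hypothetical name, not part of any program referenced in these notes.

```r
# Hypothetical fitted probabilities and observed 0/1 responses (illustrative only)
pi_hat <- c(0.91, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.10)
y      <- c(1,    1,    0,    1,    1,    0,    0,    0)

# Classify with cut-off pi0: y_hat = 1 if pi_hat > pi0, otherwise 0
classify <- function(pi_hat, pi0 = 0.5) as.integer(pi_hat > pi0)

y_hat <- classify(pi_hat, pi0 = 0.5)

# 2 x 2 classification table: rows = predicted, columns = observed
tab <- table(predicted = factor(y_hat, levels = c(1, 0)),
             observed  = factor(y,     levels = c(1, 0)))
print(tab)

# Estimates of P(y_hat = 1 | y = 1) and P(y_hat = 0 | y = 0)
sensitivity <- sum(y_hat == 1 & y == 1) / sum(y == 1)
specificity <- sum(y_hat == 0 & y == 0) / sum(y == 0)
```

With these made-up values, three of the four events and three of the four non-events are classified correctly, so both quantities equal 0.75. Changing `pi0` changes the table, and therefore both quantities.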

The ROC curve is more informative than the classification table since it summarizes the predictive power for all possible \(\pi_0\).

The position of the ROC on the graph reflects the accuracy of the diagnostic test. It covers all possible thresholds (cut-off points). The ROC of random guessing lies on the diagonal line. The ROC of a perfect diagnostic technique is a point at the upper left corner of the graph, where the TP proportion is 1.0 and the FP proportion is 0.
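To make the idea of covering all thresholds concrete, here is a small R sketch that sweeps the cut-off over a grid and records one (1 - specificity, sensitivity) point per cut-off. The data are made up, and `roc_points` is a hypothetical helper written for this illustration.

```r
# Hypothetical fitted probabilities and observed 0/1 responses (illustrative only)
pi_hat <- c(0.91, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.10)
y      <- c(1,    1,    0,    1,    1,    0,    0,    0)

# For each cut-off pi0, predict 1 when pi_hat > pi0 and record
# sensitivity and the false positive proportion (1 - specificity)
roc_points <- function(pi_hat, y, cutoffs = seq(0, 1, by = 0.05)) {
  sens <- sapply(cutoffs, function(pi0) sum(pi_hat > pi0 & y == 1) / sum(y == 1))
  fpr  <- sapply(cutoffs, function(pi0) sum(pi_hat > pi0 & y == 0) / sum(y == 0))
  data.frame(cutoff = cutoffs, fpr = fpr, sens = sens)
}

roc <- roc_points(pi_hat, y)

# Plot the curve with the diagonal that corresponds to random guessing
plot(roc$fpr, roc$sens, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "1 - Specificity", ylab = "Sensitivity", main = "ROC plot")
abline(0, 1, lty = 2)
```

A cut-off of 0 classifies every observation as an event, giving the point (1, 1); a cut-off of 1 classifies none as an event, giving (0, 0). The curve runs between these two corners, and the closer it passes to the upper left corner, the better the classifier.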

The Area Under the Curve (AUC), also referred to as the index of accuracy (A) or the concordance index (\(c\) in SAS), is an accepted traditional performance metric for a ROC curve. The higher the area under the curve, the better the predictive power of the model. For example, \(c = 0.8\) can be interpreted to mean that a randomly selected individual from the positive group has a test value larger than that of a randomly chosen individual from the negative group 80 percent of the time.
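One common way to write this interpretation as a formula is given below. The notation is introduced here for illustration and is not from the original text: \(\hat{\pi}_{+}\) and \(\hat{\pi}_{-}\) denote the predicted probabilities of a randomly chosen event and a randomly chosen non-event, and \(n_c\), \(n_d\), \(n_t\) are the numbers of concordant, discordant, and tied (event, non-event) pairs.

\[
c = P(\hat{\pi}_{+} > \hat{\pi}_{-}) + \tfrac{1}{2}\,P(\hat{\pi}_{+} = \hat{\pi}_{-}) = \frac{n_c + \tfrac{1}{2}\,n_t}{n_c + n_d + n_t}
\]

When there are no ties, the second term vanishes and \(c\) is simply the proportion of pairs in which the event has the larger predicted probability.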

The following is taken from the SAS program assay.sas.

options nocenter nodate nonumber linesize=72;
data assay; 
input logconc y n; 
cards; 
2.68  10  31 
2.76  17  30 
2.82  12  31 
2.90   7  27 
3.02  23  26 
3.04  22  30 
3.13  29  31 
3.20  29  30 
3.21  23  30 
; 
run; 
                                                                                 
proc logistic data=assay; 
  model y/n= logconc / scale=pearson outroc=roc1; 
  output out=out1 xbeta=xb reschi=reschi; 
  run; 
                                                                                 
axis1 label=('Linear predictor');
axis2 label=('Pearson Residual');
proc gplot data=out1; 
  title 'Residual plot'; 
  plot reschi * xb / haxis=axis1 vaxis=axis2; 
run; 
symbol1 i=join v=none c=blue;
proc gplot data=roc1;
  title 'ROC plot';
  plot  _sensit_*_1mspec_=1 / vaxis=0 to 1 by .1 cframe=ligr ;
run;

Here is the resulting ROC graph.

[Figure: ROC Curve for Model. Sensitivity (vertical axis, 0.00 to 1.00) versus 1 - Specificity (horizontal axis, 0.00 to 1.00). Area Under the Curve = 0.7462]

The area under the curve, \(c = 0.746\), indicates good predictive power of the model.

 

Association of Predicted Probabilities and Observed Responses

Percent Concordant     70.6    Somers' D    0.492
Percent Discordant     21.4    Gamma        0.535
Percent Tied            8.0    Tau-a        0.226
Pairs                 16168    c            0.746
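The entries of this association table can be reproduced from the grouped assay data by counting (event, non-event) pairs. The R sketch below is an illustration, not part of assay.sas or assay.R. It assumes that the fitted slope for logconc is positive, so that ranking groups by logconc is the same as ranking them by predicted probability. All variable names other than logconc, y, and n are chosen here.

```r
# Assay data from the SAS program above: y events out of n trials per group
logconc <- c(2.68, 2.76, 2.82, 2.90, 3.02, 3.04, 3.13, 3.20, 3.21)
y <- c(10, 17, 12, 7, 23, 22, 29, 29, 23)
n <- c(31, 30, 31, 27, 26, 30, 31, 30, 30)
m <- n - y                              # non-events per group

# Assumption: positive fitted slope, so logconc orders the predicted probabilities
score <- logconc

# Compare every group of events (rows) with every group of non-events (columns)
higher <- outer(score, score, ">")      # event group scores above non-event group
equal  <- outer(score, score, "==")     # same group, hence same predicted probability
w      <- outer(y, m)                   # number of (event, non-event) pairs per cell

n_pairs    <- sum(w)
concordant <- sum(w * higher)
tied       <- sum(w * equal)
discordant <- n_pairs - concordant - tied

N <- sum(n)                             # total number of observations
c_index    <- (concordant + 0.5 * tied) / n_pairs
somers_d   <- (concordant - discordant) / n_pairs
gamma_stat <- (concordant - discordant) / (concordant + discordant)
tau_a      <- (concordant - discordant) / (0.5 * N * (N - 1))

round(c(pct_concordant = 100 * concordant / n_pairs,
        pct_discordant = 100 * discordant / n_pairs,
        pct_tied       = 100 * tied / n_pairs,
        c = c_index, somers_d = somers_d, gamma = gamma_stat, tau_a = tau_a), 3)
```

There are 172 events and 94 non-events, hence 172 × 94 = 16168 pairs, of which 11418 are concordant, 3458 discordant, and 1292 tied. After rounding, the resulting percentages and indices agree with the table above.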

The ctable option in the model statement prints classification tables for various cut-off points. Each row of this output is a classification table for the specified Prob Level, \(\pi_0\).

 

Classification Table

       Correct       Incorrect     Percentages
 Prob          Non-          Non-           Sensi-  Speci-    Pos    Neg
Level  Event  Event  Event  Event  Correct  tivity  ficity   Pred   Pred
0.280    172      0     94      0     64.7   100.0     0.0   64.7      .
0.300    162     21     73     10     68.8    94.2    22.3   68.9   67.7
0.320    162     21     73     10     68.8    94.2    22.3   68.9   67.7
0.340    162     21     73     10     68.8    94.2    22.3   68.9   67.7
0.360    162     21     73     10     68.8    94.2    22.3   68.9   67.7
0.380    162     21     73     10     68.8    94.2    22.3   68.9   67.7
0.400    145     34     60     27     67.3    84.3    36.2   70.7   55.7
0.420    145     34     60     27     67.3    84.3    36.2   70.7   55.7
0.440    145     34     60     27     67.3    84.3    36.2   70.7   55.7
0.460    145     34     60     27     67.3    84.3    36.2   70.7   55.7
0.480    133     53     41     39     69.9    77.3    56.4   76.4   57.6
0.500    133     53     41     39     69.9    77.3    56.4   76.4   57.6
0.520    133     53     41     39     69.9    77.3    56.4   76.4   57.6
0.540    133     53     41     39     69.9    77.3    56.4   76.4   57.6
0.560    133     53     41     39     69.9    77.3    56.4   76.4   57.6
0.580    133     53     41     39     69.9    77.3    56.4   76.4   57.6
0.600    126     73     21     46     74.8    73.3    77.7   85.7   61.3
0.620    126     73     21     46     74.8    73.3    77.7   85.7   61.3
0.640    126     73     21     46     74.8    73.3    77.7   85.7   61.3
0.660    126     73     21     46     74.8    73.3    77.7   85.7   61.3
0.680    126     73     21     46     74.8    73.3    77.7   85.7   61.3
0.700    126     73     21     46     74.8    73.3    77.7   85.7   61.3
0.720    126     73     21     46     74.8    73.3    77.7   85.7   61.3
0.740    103     76     18     69     67.3    59.9    80.9   85.1   52.4
0.760     81     84     10     91     62.0    47.1    89.4   89.0   48.0
0.780     81     84     10     91     62.0    47.1    89.4   89.0   48.0
0.800     81     84     10     91     62.0    47.1    89.4   89.0   48.0
0.820     81     84     10     91     62.0    47.1    89.4   89.0   48.0
0.840     52     84     10    120     51.1    30.2    89.4   83.9   41.2
0.860     52     86      8    120     51.9    30.2    91.5   86.7   41.7
0.880     52     86      8    120     51.9    30.2    91.5   86.7   41.7
0.900      0     94      0    172     35.3     0.0   100.0      .   35.3
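To see how the percentage columns follow from the four counts in a row, the R sketch below recomputes them for the row with Prob Level 0.600. The variable names are chosen here for illustration. "Incorrect Event" is read as non-events classified as events and "Incorrect Non-Event" as events classified as non-events, which is consistent with the row totals of 172 events and 94 non-events.

```r
# Counts from the row with Prob Level 0.600 of the classification table
correct_event      <- 126   # events classified as events
correct_nonevent   <- 73    # non-events classified as non-events
incorrect_event    <- 21    # non-events classified as events
incorrect_nonevent <- 46    # events classified as non-events

total <- correct_event + correct_nonevent + incorrect_event + incorrect_nonevent

pct <- 100 * c(
  correct     = (correct_event + correct_nonevent) / total,
  sensitivity = correct_event / (correct_event + incorrect_nonevent),
  specificity = correct_nonevent / (correct_nonevent + incorrect_event),
  pos_pred    = correct_event / (correct_event + incorrect_event),
  neg_pred    = correct_nonevent / (correct_nonevent + incorrect_nonevent)
)
round(pct, 1)
```

Rounded to one decimal place, this gives 74.8, 73.3, 77.7, 85.7, and 61.3, matching the Correct, Sensitivity, Specificity, Pos Pred, and Neg Pred entries of that row. Each row of the table contributes one point, (1 - specificity, sensitivity), to the ROC curve.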

Here is part of the R program assay.R that plots the ROC curve.


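#### ROC curve
#### sensitivity vs 1-specificity
lp = result$linear.predictors     # linear predictor from the fitted model
p = exp(lp)/(1+exp(lp))           # fitted probabilities
cbind(yes,no,p)
#### r and n are defined earlier in assay.R; they are assumed here to be
#### the per-group counts of events and non-events
p0 = 0
sens = 1
spec = 0
total = 100
for (i in (1:total)/total)
{
yy = sum(r*(p>=i))                # events classified as events
yn = sum(r*(p<i))                 # events classified as non-events
ny = sum(n*(p>=i))                # non-events classified as events (line reconstructed)
nn = sum(n*(p<i))                 # non-events classified as non-events
#### the excerpt ends above; the lines below are an assumed completion
p0 = c(p0,i)
sens = c(sens, yy/(yy+yn))
spec = c(spec, nn/(nn+ny))
}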
                                

Here is the ROC graph from R output: