10.7 - Example: Swiss Bank Notes

Example 10-6: Swiss Bank Notes

Recall that we have two populations of notes, genuine and counterfeit, and that six measurements were taken on each note:

  • Length
  • Right-Hand Width
  • Left-Hand Width
  • Top Margin
  • Bottom Margin
  • Diagonal

Priors

In this case, it would not be reasonable to assume equal priors for the two types of banknotes. Equal priors would imply that half the banknotes in circulation are counterfeit and half are genuine. That is a very high counterfeit rate, and if things were that bad the Swiss government would probably be bankrupt! Instead, we need unequal priors under which the vast majority of banknotes are thought to be genuine. For this example, let us assume that no more than 1% of bank notes in circulation are counterfeit, and work with priors of 99% genuine and 1% counterfeit. The prior probabilities can then be expressed as:

\(\hat{p}_1 = 0.99\) and \(\hat{p}_2 = 0.01\)

The first step in the analysis is to carry out Bartlett's test to check for homogeneity of the variance-covariance matrices.

Download the text file with the data here: swiss1.txt

Using SAS

To do this we will use the SAS program shown below:

Download the SAS program here: swiss9.sas

View the video explanation of the SAS code.

SAS Notes

SAS can make this decision for you. Let's look at the proc discrim procedure in the SAS program that we just used.

By including pool=test, SAS will decide what kind of discriminant analysis to carry out based on the results of this test.

If the test fails to reject, then SAS will automatically do a linear discriminant analysis. If the test rejects, then SAS will do a quadratic discriminant analysis.

There are two other options here. If we put pool=yes then SAS will conduct a linear discriminant analysis whether it is warranted or not. It will pool the variance-covariance matrices and do a linear discriminant analysis without reporting Bartlett's test.

If pool=no, then SAS will not pool the variance-covariance matrices and will perform a quadratic discriminant analysis.
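The three pool settings described above can be summarized in a short Python sketch. This is only an illustration of the decision logic, not SAS code: the function name and the `significance_level` parameter are hypothetical, and the value 0.10 is an assumed cutoff that should be checked against the SAS documentation for your version.

```python
def choose_discriminant_analysis(pool, p_value=None, significance_level=0.10):
    """Return "linear" or "quadratic" following the pool= logic described above.

    significance_level is an assumed cutoff for the homogeneity test.
    """
    setting = pool.lower()
    if setting == "yes":
        # Always pool the variance-covariance matrices.
        return "linear"
    if setting == "no":
        # Never pool the variance-covariance matrices.
        return "quadratic"
    if setting == "test":
        if p_value is None:
            raise ValueError("pool='test' needs the p-value of the homogeneity test")
        # Rejecting homogeneity leads to the quadratic analysis.
        return "quadratic" if p_value < significance_level else "linear"
    raise ValueError("pool must be 'yes', 'no', or 'test'")


print(choose_discriminant_analysis("test", p_value=0.00005))
print(choose_discriminant_analysis("yes"))
```

A very small p-value under pool=test leads to the quadratic analysis, while pool=yes returns the linear analysis regardless of the test.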

SAS does not actually print out the quadratic discriminant function, but it will use quadratic discriminant analysis to classify sample units into populations.
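For reference, the quadratic score function commonly used for this kind of classification can be written as follows. The notation is an assumption made here for illustration: \(\bar{\mathbf{x}}_i\) and \(\mathbf{S}_i\) denote the sample mean vector and sample variance-covariance matrix of population \(i\), and \(\hat{p}_i\) is its prior probability.

\[ s_i^Q(\mathbf{x}) = -\frac{1}{2}\log\left|\mathbf{S}_i\right| - \frac{1}{2}\left(\mathbf{x} - \bar{\mathbf{x}}_i\right)' \mathbf{S}_i^{-1} \left(\mathbf{x} - \bar{\mathbf{x}}_i\right) + \log \hat{p}_i \]

A sample unit with measurements \(\mathbf{x}\) is assigned to the population with the largest score. Because each population keeps its own \(\mathbf{S}_i\), the score is quadratic rather than linear in \(\mathbf{x}\).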

Using Minitab

View the video below to see how discriminant analysis is performed using the Minitab statistical software application.

Bartlett's test finds a significant difference between the variance-covariance matrices of the genuine and counterfeit bank notes \(\left( L' = 121.90;\ \mathrm{d.f.} = 21;\ p < 0.0001 \right)\). We therefore reject the null hypothesis that the two populations share a common variance-covariance matrix. This suggests that a linear discriminant analysis is not appropriate for these data. A quadratic discriminant analysis is necessary.
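The 21 degrees of freedom can be checked by hand. With p variables and g populations, the test of equal variance-covariance matrices has p(p+1)(g-1)/2 degrees of freedom, the number of distinct entries in a p × p covariance matrix times the number of populations minus one. A minimal Python sketch, with a hypothetical function name:

```python
def bartlett_degrees_of_freedom(num_variables, num_groups):
    """Degrees of freedom for the test of equal variance-covariance matrices."""
    # p(p+1)/2 distinct entries per covariance matrix, times (g - 1) groups.
    return num_variables * (num_variables + 1) * (num_groups - 1) // 2


# Six measurements per note, two populations (genuine and counterfeit).
print(bartlett_degrees_of_freedom(6, 2))  # 21
```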

Example 10-7: Swiss Bank Notes

Let us consider a bank note with the following measurements:

Variable         Measurement
Length           214.9
Left Width       130.1
Right Width      129.9
Bottom Margin    9.0
Top Margin       10.6
Diagonal         140.5

Any number of notes could be entered this way; here we are interested in just one set of measurements. We want to classify this bank note as genuine or counterfeit. The posterior probability that it is counterfeit is only 0.000002526, so the posterior probability that it is genuine is 1 - 0.000002526 = 0.999997474, which is very close to one. We are nearly 100% confident that this is a genuine note and not a counterfeit.
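Posterior probabilities of this kind come from Bayes' rule: each population's prior is multiplied by the density of the observed measurements under that population, and the products are rescaled to sum to one. The Python sketch below illustrates the arithmetic only. The log-density values are hypothetical placeholders, not the values SAS computed for this note, and the function name is an assumption.

```python
import math


def posterior_probabilities(priors, log_densities):
    """Bayes' rule: posterior_i is proportional to prior_i * density_i."""
    log_terms = [math.log(p) + ld for p, ld in zip(priors, log_densities)]
    # Subtract the largest term before exponentiating for numerical stability.
    top = max(log_terms)
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


priors = [0.99, 0.01]                       # genuine, counterfeit
hypothetical_log_densities = [-3.0, -11.0]  # placeholder values, not from the data
genuine, counterfeit = posterior_probabilities(priors, hypothetical_log_densities)
print(genuine, counterfeit)
```

The two posteriors always sum to one, which is why the genuine posterior can be obtained as one minus the counterfeit posterior.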

Next consider the results of cross-validation.

Note! Cross-validation yields estimates of the probability that a randomly selected note is correctly classified.

The resulting confusion table is as follows:

                 Classified As
Truth            Counterfeit   Genuine   Total
Counterfeit      98            2         100
Genuine          1             99        100
Total            99            101       200

Here, we can see that 98 out of 100 counterfeit notes are expected to be correctly classified, while 99 out of 100 genuine notes are expected to be correctly classified. Thus, the misclassification probabilities are estimated to be:

\(\hat{p}(\text{real | fake}) = 0.02 \) and \(\hat{p}(\text{fake | real}) = 0.01 \)
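These two rates follow directly from the rows of the confusion table: the off-diagonal count divided by the row total. The Python sketch below reproduces them and, as an additional illustration not reported in the original output, weights them by the priors of 0.99 and 0.01 to get an overall error rate. The variable and function names are hypothetical.

```python
# Cross-validation counts: outer key is the truth, inner key is the classification.
confusion = {
    "counterfeit": {"counterfeit": 98, "genuine": 2},
    "genuine": {"counterfeit": 1, "genuine": 99},
}


def misclassification_rate(table, truth):
    """Fraction of notes of the given true type assigned to the other type."""
    row = table[truth]
    total = sum(row.values())
    return (total - row[truth]) / total


rate_real_given_fake = misclassification_rate(confusion, "counterfeit")
rate_fake_given_real = misclassification_rate(confusion, "genuine")
print(rate_real_given_fake, rate_fake_given_real)  # 0.02 0.01

# Overall error rate weighted by the priors (an added illustration).
priors = {"genuine": 0.99, "counterfeit": 0.01}
overall = (priors["genuine"] * rate_fake_given_real
           + priors["counterfeit"] * rate_real_given_fake)
print(round(overall, 4))
```

Under these priors the overall error rate is about 0.0101, dominated by the genuine notes because they make up almost all of the notes in circulation.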

The question remains: Are these acceptable misclassification rates?

A decision should be made in advance as to what levels of error would be acceptable. Here again, you need to think about the consequences of making a mistake. If a genuine note is classified as counterfeit, one might put an innocent person in jail. If the opposite error is made, a criminal might go free. What are the costs of these types of errors? And are the above error rates acceptable? You should have some prior notion of what you would consider reasonable before looking at the results.
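One way to make this judgment concrete is to attach a cost to each type of error and compute the expected cost of misclassification: the sum, over the true populations, of prior times error rate times cost. The Python sketch below uses the priors and estimated error rates from this example. The cost values are purely hypothetical and would need to be chosen by the analyst in advance.

```python
def expected_cost(priors, error_rates, costs):
    """Sum over true classes of prior * P(misclassified | class) * cost."""
    return sum(priors[c] * error_rates[c] * costs[c] for c in priors)


priors = {"genuine": 0.99, "counterfeit": 0.01}
# Estimated probability of misclassifying a note of each true type.
error_rates = {"genuine": 0.01, "counterfeit": 0.02}
# Hypothetical costs: missing a counterfeit is taken to be ten times as
# costly as flagging a genuine note. These numbers are assumptions.
hypothetical_costs = {"genuine": 1.0, "counterfeit": 10.0}

ecm = expected_cost(priors, error_rates, hypothetical_costs)
print(round(ecm, 4))
```

With these assumed costs the expected cost is about 0.0119 per note. Changing the costs changes which error dominates, which is why they should be settled before looking at the results.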