7.2.3 - Example: Swiss Banknotes

Example 7-15: Swiss Banknotes

The simultaneous confidence intervals for the Swiss Banknotes data are calculated using the expression below:

\(\bar{x}_{1k}-\bar{x}_{2k} \pm \sqrt{\frac{p(n_1+n_2-2)}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1,\alpha}}\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right) s^2_k}\)

Here both sample sizes are equal to 100, \(n = n_{1} = n_{2} =100\), which simplifies the expressions inside the radicals.
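With equal sample sizes, the general expression reduces as sketched below. This is a direct substitution of \(n_1 = n_2 = n\); the symbols \(s^2_{1k}\) and \(s^2_{2k}\) are notation introduced here for the two sample variances of variable k and do not appear in the original formula.

\(\bar{x}_{1k}-\bar{x}_{2k} \pm \sqrt{\frac{2p(n-1)}{2n-p-1}F_{p,2n-p-1,\alpha}}\sqrt{\frac{2s^2_k}{n}}\)

With \(p = 6\) and \(n = 100\), the F critical value has 6 and 193 degrees of freedom, and the pooled variance is simply the average of the two sample variances, \(s^2_k = (s^2_{1k}+s^2_{2k})/2\).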

Carrying out the calculation for the variable Length gives an interval that runs from -0.044 to 0.336.
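The arithmetic for Length can be laid out step by step. The sketch below assumes the rounded means, variances, and critical value \(F_{6,193,0.05} = 2.14580\) reported in the SAS listing later in this example, so the last digit may differ slightly from a full-precision calculation.

\(s^2_{k} = \frac{99(0.15024)+99(0.12401)}{198} = 0.137125\)

\(\sqrt{\frac{6(198)}{193}(2.14580)} = \sqrt{13.2083} = 3.6343\)

\(\sqrt{\left(\frac{1}{100}+\frac{1}{100}\right)(0.137125)} = \sqrt{0.0027425} = 0.05237\)

\((214.969-214.823) \pm 3.6343 \times 0.05237 = 0.146 \pm 0.1903\)

This gives limits of about -0.044 and 0.336.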

The SAS program below computes the simultaneous confidence intervals for the 6 variables.

Download the SAS program here: swiss11.sas

 

options ls=78;
title "Confidence Intervals - Swiss Bank Notes";

 /* %let allows the p variable to be used throughout the code below
  */

%let p=6;

data swiss;
  infile "D:\Statistics\STAT 505\data\swiss3.csv" firstobs=2 delimiter=',';
  input type $ length left right bottom top diag;
  run;

 /* A new data set named 'real' is created, consisting 
  * of only the real notes. This is used for calculation 
  * of the statistics needed for the last step.
  * Also, where each variable is originally in its own column, 
  * these commands stack the data so that all variable names 
  * are in one column called 'variable', and all response values 
  * are in another column called 'x'.
  */

data real;
  set swiss;
  if type="real";
  variable="length";   x=length; output;
  variable="left";     x=left;   output;
  variable="right";    x=right;  output;
  variable="bottom";   x=bottom; output;
  variable="top";      x=top;    output;
  variable="diagonal"; x=diag;   output;
  keep type variable x;
  run;

proc sort data=real;
  by variable;
  run;

 /* The means procedure calculates and saves the sample size,
  * mean, and variance for each variable. It then saves these results 
  * in a new data set 'pop1', corresponding to the real notes.
  */

proc means data=real noprint;
  by variable;
  id type;
  var x;
  output out=pop1 n=n1 mean=xbar1 var=s21;
  run;

 /* A new data set named 'fake' is created, consisting 
  * of only the fake notes. This is used for calculation 
  * of the statistics needed for the last step.
  * Also, where each variable is originally in its own column, 
  * these commands stack the data so that all variable names 
  * are in one column called 'variable', and all response values 
  * are in another column called 'x'.
  */

data fake;
  set swiss;
  if type="fake";
  variable="length";   x=length; output;
  variable="left";     x=left;   output;
  variable="right";    x=right;  output;
  variable="bottom";   x=bottom; output;
  variable="top";      x=top;    output;
  variable="diagonal"; x=diag;   output;
  keep type variable x;
  run;

proc sort data=fake;
  by variable;
  run;

 /* The means procedure calculates and saves the sample size,
  * mean, and variance for each variable. It then saves these results 
  * in a new data set 'pop2', corresponding to the fake notes.
  */

proc means data=fake noprint;
  by variable;
  id type;
  var x;
  output out=pop2 n=n2 mean=xbar2 var=s22;
  run;


 /* This last step combines the two separate data sets into one
  * and computes the 95% simultaneous confidence interval limits 
  * (losim, upsim) and the 95% Bonferroni limits (lobon, upbon)
  * from the statistics calculated previously. 
  * The variances are pooled from both the real and the fake samples
  * and stored in sp.
  */

data combine;
  merge pop1 pop2;
  by variable;
  f=finv(0.95,&p,n1+n2-&p-1);
  t=tinv(1-0.025/&p,n1+n2-2);
  sp=((n1-1)*s21+(n2-1)*s22)/(n1+n2-2);
  losim=xbar1-xbar2-sqrt(&p*(n1+n2-2)*f*(1/n1+1/n2)*sp/(n1+n2-&p-1));
  upsim=xbar1-xbar2+sqrt(&p*(n1+n2-2)*f*(1/n1+1/n2)*sp/(n1+n2-&p-1));
  lobon=xbar1-xbar2-t*sqrt((1/n1+1/n2)*sp);
  upbon=xbar1-xbar2+t*sqrt((1/n1+1/n2)*sp);
  run;

proc print data=combine;
  run;
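
For reference, the two sets of limits computed in the combine step can be written as below. The first line restates the simultaneous interval given earlier. The second is the Bonferroni interval, written on the assumption that the t critical value is the upper \(\alpha/(2p)\) point with \(n_1+n_2-2\) degrees of freedom, which is what tinv(1-0.025/&p, n1+n2-2) evaluates for \(\alpha = 0.05\). In both lines \(s^2_k\) is the pooled variance that the program stores in sp.

losim, upsim: \(\bar{x}_{1k}-\bar{x}_{2k} \pm \sqrt{\frac{p(n_1+n_2-2)}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1,\alpha}}\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right) s^2_k}\)

lobon, upbon: \(\bar{x}_{1k}-\bar{x}_{2k} \pm t_{n_1+n_2-2,\alpha/(2p)}\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right) s^2_k}\)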

The results can be downloaded here: swiss11.lst.

At this time Minitab does not support this procedure.

Analysis

Confidence Intervals - Swiss Bank Notes

Obs  Variable  type  _TYPE_  _FREQ_   n1    xbar1      s21   n2    xbar2      s22
  1  bottom    fake       0     100  100    8.305  0.41321  100   10.530  1.28131
  2  diagon    fake       0     100  100  141.517  0.19981  100  139.450  0.31121
  3  left      fake       0     100  100  129.943  0.13258  100  130.300  0.06505
  4  length    fake       0     100  100  214.969  0.15024  100  214.823  0.12401
  5  right     fake       0     100  100  129.720  0.12626  100  130.193  0.08894
  6  top       fake       0     100  100   10.168  0.42119  100   11.133  0.40446

Confidence Intervals - Swiss Bank Notes

Obs        f        t       sp     losim     upsim     lobon     upbon
  1  2.14580  2.66503  0.84726  -2.69809  -1.75191  -2.57192  -1.87808
  2  2.14580  2.66503  0.25551   1.80720   2.32680   1.87649   2.25751
  3  2.14580  2.66503  0.09881  -0.51857  -0.19543  -0.47547  -0.23853
  4  2.14580  2.66503  0.13713  -0.04433   0.33633   0.00643   0.28557
  5  2.14580  2.66503  0.10760  -0.64160  -0.30440  -0.59663  -0.34937
  6  2.14580  2.66503  0.41282  -1.29523  -0.63477  -1.20716  -0.72284

The bounds of the simultaneous confidence intervals are given in columns for losim and upsim. Those entries are copied into the table below:

Variable 95% Confidence Interval
Length -0.044, 0.336
Left Width -0.519, -0.195
Right Width -0.642, -0.304
Bottom Margin -2.698, -1.752
Top Margin -1.295, -0.635
Diagonal 1.807, 2.327
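
As a cross-check on the losim/upsim and lobon/upbon columns, the limits can be recomputed from the summary statistics alone. The following is a minimal sketch in Python, not part of the original SAS workflow. It assumes the rounded means and variances printed in the listing, and it takes the F and t critical values from the f and t columns rather than computing them. The names `summary`, `limits`, and `results` are hypothetical.

```python
import math

# Summary statistics copied from the SAS listing:
# name -> (mean of real notes, variance of real notes,
#          mean of fake notes, variance of fake notes)
summary = {
    "bottom":   (8.305,   0.41321, 10.530,  1.28131),
    "diagonal": (141.517, 0.19981, 139.450, 0.31121),
    "left":     (129.943, 0.13258, 130.300, 0.06505),
    "length":   (214.969, 0.15024, 214.823, 0.12401),
    "right":    (129.720, 0.12626, 130.193, 0.08894),
    "top":      (10.168,  0.42119, 11.133,  0.40446),
}

N1 = N2 = 100      # notes per group
P = 6              # number of variables
F_CRIT = 2.14580   # column f in the listing: finv(0.95, 6, 193)
T_CRIT = 2.66503   # column t in the listing: tinv(1 - 0.025/6, 198)


def limits(mean1, var1, mean2, var2):
    """Return ((losim, upsim), (lobon, upbon)) for one variable."""
    diff = mean1 - mean2
    # Pooled variance, as in the sp column of the listing.
    pooled_var = ((N1 - 1) * var1 + (N2 - 1) * var2) / (N1 + N2 - 2)
    std_err = math.sqrt((1 / N1 + 1 / N2) * pooled_var)
    # Multiplier for the simultaneous (F-based) intervals.
    sim_mult = math.sqrt(P * (N1 + N2 - 2) / (N1 + N2 - P - 1) * F_CRIT)
    simultaneous = (diff - sim_mult * std_err, diff + sim_mult * std_err)
    bonferroni = (diff - T_CRIT * std_err, diff + T_CRIT * std_err)
    return simultaneous, bonferroni


results = {name: limits(*values) for name, values in summary.items()}
for name, (sim, bon) in results.items():
    print(f"{name:9s} sim=({sim[0]:8.3f}, {sim[1]:8.3f})  "
          f"bon=({bon[0]:8.3f}, {bon[1]:8.3f})")
```

Because the simultaneous multiplier (about 3.63) is larger than the Bonferroni t critical value (2.665), the Bonferroni limits are narrower for every variable in this example.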

Take care when matching the rows of the SAS output to the table above.

Note! The output is sorted by variable name in alphabetical order, so length, for example, appears on the fourth line, and diagonal appears as diagon. Locate the lower and upper bounds of the simultaneous confidence intervals (losim and upsim) in the SAS output and match them to the entries in the table above. The interval for length, for example, runs from -0.044 to 0.336, in agreement with the hand calculation obtained previously.

When interpreting these intervals we need to see which intervals include 0, which ones fall entirely below 0, and which ones fall entirely above 0.
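
The sorting step described above can be sketched in a few lines of Python. This is only an illustration, assuming the rounded simultaneous limits from the table; the function name `classify` and the dictionary names are hypothetical.

```python
def classify(lower, upper):
    """Classify a (real - fake) interval by its position relative to 0."""
    if upper < 0:
        return "below 0"
    if lower > 0:
        return "above 0"
    return "includes 0"


# Simultaneous 95% intervals copied from the table (real minus fake).
intervals = {
    "Length":        (-0.044, 0.336),
    "Left Width":    (-0.519, -0.195),
    "Right Width":   (-0.642, -0.304),
    "Bottom Margin": (-2.698, -1.752),
    "Top Margin":    (-1.295, -0.635),
    "Diagonal":      (1.807, 2.327),
}

labels = {name: classify(lo, hi) for name, (lo, hi) in intervals.items()}
for name, label in labels.items():
    print(f"{name:14s} {label}")
```

An interval below 0 means the counterfeit mean is larger, since each difference is taken as genuine minus counterfeit.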

The first thing we notice is that the interval for length includes 0. This suggests that we cannot distinguish between the lengths of counterfeit and genuine banknotes. The intervals for both width measurements fall entirely below 0.

Because each interval is calculated as the genuine mean minus the counterfeit mean, an interval below 0 indicates that the counterfeit notes are larger on that variable. We can therefore conclude that the counterfeit notes are wider than the genuine notes on both the left and the right.

Similarly, we can conclude that the top and bottom margins of the counterfeit notes are also too large. Note, however, that the interval for the diagonal measurements falls entirely above 0. This suggests that the diagonal measurements of the counterfeit notes are smaller than those of the genuine notes.

Conclusions

  • Counterfeit notes are too wide on both the left and right margins.
  • The top and bottom margins of the counterfeit notes are too large.
  • The diagonal measurement of the counterfeit notes is smaller than that of the genuine notes.
  • We cannot distinguish between the lengths of counterfeit and genuine banknotes.
