7.2.3 - Example: Swiss Banknotes

Example 7-15: Swiss Banknotes

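For the Swiss Banknotes data, the simultaneous confidence interval for the difference in population means of variable k is given by the expression below. Population 1 is the genuine notes, population 2 is the counterfeit notes, and \(s^2_k\) is the pooled sample variance of variable k: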

\(\bar{x}_{1k}-\bar{x}_{2k} \pm \sqrt{\frac{p(n_1+n_2-2)}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1,\alpha}}\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right) s^2_k}\)

Here we note that both sample sizes equal 100, \(n = n_{1} = n_{2} = 100\), so the terms inside the radicals simplify: \(\frac{p(n_1+n_2-2)}{n_1+n_2-p-1} = \frac{2p(n-1)}{2n-p-1}\) and \(\frac{1}{n_1}+\frac{1}{n_2} = \frac{2}{n}\).

Carrying out the math for the variable Length, we end up with an interval that runs from -0.044 to 0.336.
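
As a worked version of that calculation, the Length statistics from the SAS output further down can be plugged in: \(\bar{x}_{1k} = 214.969\) and \(s^2_1 = 0.15024\) for the genuine notes, \(\bar{x}_{2k} = 214.823\) and \(s^2_2 = 0.12401\) for the counterfeit notes, with \(p = 6\), \(n = 100\), and \(F_{6,193,0.05} = 2.14580\) taken from the f column of that output. The figures are rounded, so the last digit may differ slightly from SAS.

\[
\begin{aligned}
s^2_k &= \frac{(100-1)(0.15024) + (100-1)(0.12401)}{100+100-2} \approx 0.13713\\
\bar{x}_{1k} - \bar{x}_{2k} &= 214.969 - 214.823 = 0.146\\
\sqrt{\frac{2p(n-1)}{2n-p-1}F_{p,2n-p-1,\alpha}} &= \sqrt{\frac{2(6)(99)}{193}\times 2.14580} = \sqrt{\frac{1188}{193}\times 2.14580} \approx 3.634\\
\sqrt{\frac{2}{n}s^2_k} &= \sqrt{0.02 \times 0.13713} \approx 0.05237\\
0.146 \pm 3.634 \times 0.05237 &= 0.146 \pm 0.190 = (-0.044,\ 0.336)
\end{aligned}
\]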

The SAS program below computes the simultaneous confidence intervals for all 6 variables, along with Bonferroni intervals for comparison.

Download the SAS program here: swiss11.sas


options ls=78;
title "Confidence Intervals - Swiss Bank Notes";

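 /* %let defines the macro variable p, the number of variables (6),
  * so that &p can be used throughout the code below.
  */

%let p=6;

data swiss;
  /* change the path below to the location of swiss3.csv on your computer */
  infile "D:\Statistics\STAT 505\data\swiss3.csv" firstobs=2 delimiter=',';
  input type $ length left right bottom top diag;
  run;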

 /* A new data set named 'real' is created, consisting 
  * of only the real notes. This is used for calculation 
  * of the statistics needed for the last step.
  * Also, where each variable is originally in its own column, 
  * these commands stack the data so that all variable names 
  * are in one column called 'variable', and all response values 
  * are in another column called 'x'.
  */

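data real;
  set swiss;
  if type="real";
  /* declare the length of 'variable' before its first assignment;
   * otherwise SAS takes its length (6) from "length" and
   * truncates "diagonal" to "diagon" */
  length variable $ 8;
  variable="length";   x=length; output;
  variable="left";     x=left;   output;
  variable="right";    x=right;  output;
  variable="bottom";   x=bottom; output;
  variable="top";      x=top;    output;
  variable="diagonal"; x=diag;   output;
  keep type variable x;
  run;

proc sort data=real;
  by variable;
  run;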

 /* The means procedure calculates and saves the sample size,
  * mean, and variance for each variable. It then saves these results 
  * in a new data set 'pop1', corresponding to the real notes.
  */

proc means data=real noprint;
  by variable;
  id type;
  var x;
  output out=pop1 n=n1 mean=xbar1 var=s21;
  run;

 /* A new data set named 'fake' is created, consisting 
  * of only the fake notes. This is used for calculation 
  * of the statistics needed for the last step.
  * Also, where each variable is originally in its own column, 
  * these commands stack the data so that all variable names 
  * are in one column called 'variable', and all response values 
  * are in another column called 'x'.
  */

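data fake;
  set swiss;
  if type="fake";
  /* declare the length of 'variable' before its first assignment;
   * otherwise SAS takes its length (6) from "length" and
   * truncates "diagonal" to "diagon" */
  length variable $ 8;
  variable="length";   x=length; output;
  variable="left";     x=left;   output;
  variable="right";    x=right;  output;
  variable="bottom";   x=bottom; output;
  variable="top";      x=top;    output;
  variable="diagonal"; x=diag;   output;
  keep type variable x;
  run;

proc sort data=fake;
  by variable;
  run;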

 /* The means procedure calculates and saves the sample size,
  * mean, and variance for each variable. It then saves these results 
  * in a new data set 'pop2', corresponding to the fake notes.
  */

proc means data=fake noprint;
  by variable;
  id type;
  var x;
  output out=pop2 n=n2 mean=xbar2 var=s22;
  run;

 /* This last step merges the two summary data sets by variable
  * and computes, from the statistics calculated previously,
  * the 95% simultaneous confidence limits (losim, upsim) and,
  * for comparison, the 95% Bonferroni limits (lobon, upbon).
  * The variances are pooled from both the real and the fake samples.
  */

data combine;
  merge pop1 pop2;
  by variable;
  f=finv(0.95,&p,n1+n2-&p-1);            /* F critical value */
  t=tinv(1-0.025/&p,n1+n2-2);            /* Bonferroni t critical value */
  sp=((n1-1)*s21+(n2-1)*s22)/(n1+n2-2);  /* pooled variance */
  losim=xbar1-xbar2-sqrt(&p*(n1+n2-2)*f*(1/n1+1/n2)*sp/(n1+n2-&p-1));
  upsim=xbar1-xbar2+sqrt(&p*(n1+n2-2)*f*(1/n1+1/n2)*sp/(n1+n2-&p-1));
  lobon=xbar1-xbar2-t*sqrt((1/n1+1/n2)*sp);
  upbon=xbar1-xbar2+t*sqrt((1/n1+1/n2)*sp);
  run;

proc print data=combine;
  run;

The downloadable results are listed here: swiss11.lst.

At this time Minitab does not support this procedure.

Analysis

Confidence Intervals - Swiss Bank Notes
Obs	Variable	type	_FREQ_	n1	xbar1	s21	n2	xbar2	s22
1	bottom	fake	100	100	8.305	0.41321	100	10.530	1.28131
2	diagon	fake	100	100	141.517	0.19981	100	139.450	0.31121
3	left	fake	100	100	129.943	0.13258	100	130.300	0.06505
4	length	fake	100	100	214.969	0.15024	100	214.823	0.12401
5	right	fake	100	100	129.720	0.12626	100	130.193	0.08894
6	top	fake	100	100	10.168	0.42119	100	11.133	0.40446

Obs	f	t	sp	losim	upsim	lobon	upbon
1	2.14580	2.66503	0.84726	-2.69809	-1.75191	-2.57192	-1.87808
2	2.14580	2.66503	0.25551	1.80720	2.32680	1.87649	2.25751
3	2.14580	2.66503	0.09881	-0.51857	-0.19543	-0.47547	-0.23853
4	2.14580	2.66503	0.13713	-0.04433	0.33633	0.00643	0.28557
5	2.14580	2.66503	0.10760	-0.64160	-0.30440	-0.59663	-0.34937
6	2.14580	2.66503	0.41282	-1.29523	-0.63477	-1.20716	-0.72284
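
Because Minitab does not support this procedure, it can help to check the arithmetic of the final data combine step by hand. The sketch below is a minimal Python translation of that step, assuming the rounded summary statistics in the first table are accurate enough; it hardcodes the critical values f and t from the second table rather than recomputing them (with SciPy installed, `scipy.stats.f.ppf(0.95, 6, 193)` and `scipy.stats.t.ppf(1 - 0.025/6, 198)` should give the same two values). The names `STATS` and `intervals` are illustrative only and are not part of the course materials.

```python
import math

# Summary statistics copied from the first table of the SAS listing.
# Population 1 = real (genuine) notes, population 2 = fake (counterfeit) notes.
# Each tuple is (n1, xbar1, s21, n2, xbar2, s22); keys follow the
# alphabetical order that PROC SORT produces.
STATS = {
    "bottom":   (100, 8.305, 0.41321, 100, 10.530, 1.28131),
    "diagonal": (100, 141.517, 0.19981, 100, 139.450, 0.31121),
    "left":     (100, 129.943, 0.13258, 100, 130.300, 0.06505),
    "length":   (100, 214.969, 0.15024, 100, 214.823, 0.12401),
    "right":    (100, 129.720, 0.12626, 100, 130.193, 0.08894),
    "top":      (100, 10.168, 0.42119, 100, 11.133, 0.40446),
}

P = 6             # number of variables (&p in the SAS program)
F_CRIT = 2.14580  # f column of the listing: finv(0.95, p, n1+n2-p-1)
T_CRIT = 2.66503  # t column of the listing: tinv(1-0.025/p, n1+n2-2)


def intervals(n1, xbar1, s21, n2, xbar2, s22, p=P, f=F_CRIT, t=T_CRIT):
    """Mirror the data combine step: return (sp, losim, upsim, lobon, upbon)."""
    sp = ((n1 - 1) * s21 + (n2 - 1) * s22) / (n1 + n2 - 2)  # pooled variance
    diff = xbar1 - xbar2                                    # genuine minus counterfeit
    se = math.sqrt((1 / n1 + 1 / n2) * sp)                  # standard error of diff
    m_sim = math.sqrt(p * (n1 + n2 - 2) * f / (n1 + n2 - p - 1))  # simultaneous multiplier
    return sp, diff - m_sim * se, diff + m_sim * se, diff - t * se, diff + t * se


if __name__ == "__main__":
    for name, row in STATS.items():
        sp, losim, upsim, lobon, upbon = intervals(*row)
        print(f"{name:9s} sp={sp:.5f}  simultaneous=({losim:.3f}, {upsim:.3f})"
              f"  bonferroni=({lobon:.3f}, {upbon:.3f})")
```

Since the inputs carry only the precision shown in the listing, the printed limits agree with the losim, upsim, lobon, and upbon columns to about three decimals. The sketch also makes visible why the Bonferroni intervals are narrower: their multiplier, t = 2.665, is smaller than the simultaneous multiplier, which is about 3.634 here.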

The bounds of the simultaneous confidence intervals are given in the losim and upsim columns. Those entries, rounded to three decimals, are copied into the table below:

Variable	95% Simultaneous Confidence Interval (genuine minus counterfeit)
Length	-0.044, 0.336
Left Width	-0.519, -0.195
Right Width	-0.642, -0.304
Bottom Margin	-2.698, -1.752
Top Margin	-1.295, -0.635
Diagonal	1.807, 2.327

You need to be careful to match each row of the SAS output to the correct variable.

Note! Because of the PROC SORT steps, the variables in the output are sorted in alphabetical order (bottom, diagonal, left, length, right, top), so length, for example, is the fourth line of the output data. Reading across that line, the lower and upper bounds of the simultaneous confidence interval for length are -0.044 and 0.336, matching the hand calculation earlier in this example.

When interpreting these intervals, we need to see which intervals include 0, which fall entirely below 0, and which fall entirely above 0.
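
The rule just described can be sketched as a small Python helper, assuming the rounded limits from the table above and the genuine-minus-counterfeit orientation used throughout. The names `INTERVALS` and `classify` are illustrative only.

```python
# Rounded 95% simultaneous limits (genuine minus counterfeit) from the table above.
INTERVALS = {
    "Length": (-0.044, 0.336),
    "Left Width": (-0.519, -0.195),
    "Right Width": (-0.642, -0.304),
    "Bottom Margin": (-2.698, -1.752),
    "Top Margin": (-1.295, -0.635),
    "Diagonal": (1.807, 2.327),
}


def classify(lower, upper):
    """Say where a genuine-minus-counterfeit interval lies relative to 0."""
    if lower > 0:
        return "entirely above 0: genuine mean is larger"
    if upper < 0:
        return "entirely below 0: counterfeit mean is larger"
    return "includes 0: no significant difference detected"


if __name__ == "__main__":
    for name, (lo, hi) in INTERVALS.items():
        print(f"{name:14s} ({lo:.3f}, {hi:.3f})  {classify(lo, hi)}")
```

An interval whose endpoint is exactly 0 is treated as including 0, since the confidence interval is closed.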

The first thing we notice is that the interval for length includes 0. This suggests that we cannot distinguish between the lengths of counterfeit and genuine banknotes. The intervals for both width measurements fall entirely below 0.

Because these intervals are calculated as genuine notes minus counterfeit notes, an interval below 0 means that the counterfeit notes have the larger mean on that variable. We can therefore conclude that the left and right widths of the counterfeit notes are larger than those of the genuine notes.

Similarly, the intervals for the top and bottom margins fall entirely below 0, so the top and bottom margins of the counterfeit notes are also too large. Note, however, that the interval for the diagonal measurement falls entirely above 0. This suggests that the diagonal measurement of the counterfeit notes is smaller than that of the genuine notes.

Conclusions

Counterfeit notes are too wide at both the left and right edges.
The top and bottom margins of the counterfeit notes are too large.
The diagonal measurement of the counterfeit notes is smaller than that of the genuine notes.
We cannot distinguish between the lengths of counterfeit and genuine banknotes.