Example 7-15: Swiss Banknotes
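For the Swiss banknotes data, the simultaneous confidence interval for the difference between the genuine and counterfeit means of variable k is given by the expression below. Here \(s^2_k\) is the pooled sample variance of variable k and p is the number of variables: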
\(\bar{x}_{1k}-\bar{x}_{2k} \pm \sqrt{\frac{p(n_1+n_2-2)}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1,\alpha}}\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right) s^2_k}\)
Here both sample sizes equal 100 (\(n = n_{1} = n_{2} = 100\)), which simplifies the expressions inside the radicals.
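With equal sample sizes \(n_1 = n_2 = n\), the pooled variance computed by the SAS program (`sp`) reduces to the average of the two sample variances, and the interval simplifies as sketched below. In this sketch, \(s^2_{1k}\) and \(s^2_{2k}\) denote the genuine and counterfeit sample variances of variable k. That notation is introduced here for illustration only. The last line plugs in p = 6, n = 100, and α = 0.05, as used in the program.

```latex
\begin{gathered}
s^2_k = \frac{(n-1)s^2_{1k} + (n-1)s^2_{2k}}{2n-2} = \frac{s^2_{1k} + s^2_{2k}}{2} \\
\bar{x}_{1k}-\bar{x}_{2k} \pm \sqrt{\frac{p(2n-2)}{2n-p-1}F_{p,2n-p-1,\alpha}}\sqrt{\frac{2}{n}\, s^2_k} \\
\bar{x}_{1k}-\bar{x}_{2k} \pm \sqrt{\frac{6 \times 198}{193}F_{6,193,0.05}}\sqrt{\frac{2}{100}\, s^2_k}
\end{gathered}
```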
Carrying out the calculation for the variable Length gives an interval that runs from -0.044 to 0.336.
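As a worked check, the Length interval can be reproduced by hand from the summary statistics in the SAS output below. Those statistics are the genuine and counterfeit variances 0.15024 and 0.12401, the means 214.969 and 214.823, and F = 2.14580. Intermediate values are rounded, so the last digit may differ slightly from the program's output.

```latex
\begin{aligned}
s^2_{\text{length}} &= \frac{0.15024 + 0.12401}{2} = 0.137125 \approx 0.13713\\
\sqrt{\frac{6 \times 198}{193}\times 2.14580} &= \sqrt{13.2083} \approx 3.6343\\
\sqrt{\frac{2}{100}\times 0.13713} &= \sqrt{0.0027426} \approx 0.05237\\
(214.969 - 214.823) \pm 3.6343 \times 0.05237 &= 0.146 \pm 0.190\\
&\Rightarrow (-0.044,\ 0.336)
\end{aligned}
```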
The SAS program below computes the simultaneous confidence intervals for all six variables. It also computes Bonferroni intervals (lobon and upbon).
Download the SAS program here: swiss11.sas
```sas
options ls=78;
title "Confidence Intervals - Swiss Bank Notes";
/* %let defines the macro variable p (the number of variables),
 * which is referenced as &p throughout the code below
 */
%let p=6;
data swiss;
  infile "D:\Statistics\STAT 505\data\swiss3.csv" firstobs=2 delimiter=',';
  input type $ length left right bottom top diag;
run;
/* A new data set named 'real' is created, consisting
 * of only the real notes. This is used for calculation
 * of the statistics needed for the last step.
 * Also, where each variable is originally in its own column,
 * these commands stack the data so that all variable names
 * are in one column called 'variable', and all response values
 * are in another column called 'x'.
 */
data real;
  length variable $ 8;   /* long enough to hold "diagonal" without truncation */
  set swiss;
  if type="real";
  variable="length";   x=length; output;
  variable="left";     x=left;   output;
  variable="right";    x=right;  output;
  variable="bottom";   x=bottom; output;
  variable="top";      x=top;    output;
  variable="diagonal"; x=diag;   output;
  keep type variable x;
run;
proc sort data=real;
  by variable;
run;
/* The means procedure calculates and saves the sample size,
 * mean, and variance for each variable. It then saves these results
 * in a new data set 'pop1', corresponding to the real notes.
 */
proc means data=real noprint;
  by variable;
  id type;
  var x;
  output out=pop1 n=n1 mean=xbar1 var=s21;
run;
/* A new data set named 'fake' is created, consisting
 * of only the fake notes. This is used for calculation
 * of the statistics needed for the last step.
 * Also, where each variable is originally in its own column,
 * these commands stack the data so that all variable names
 * are in one column called 'variable', and all response values
 * are in another column called 'x'.
 */
data fake;
  length variable $ 8;   /* long enough to hold "diagonal" without truncation */
  set swiss;
  if type="fake";
  variable="length";   x=length; output;
  variable="left";     x=left;   output;
  variable="right";    x=right;  output;
  variable="bottom";   x=bottom; output;
  variable="top";      x=top;    output;
  variable="diagonal"; x=diag;   output;
  keep type variable x;
run;
proc sort data=fake;
  by variable;
run;
/* The means procedure calculates and saves the sample size,
 * mean, and variance for each variable. It then saves these results
 * in a new data set 'pop2', corresponding to the fake notes.
 */
proc means data=fake noprint;
  by variable;
  id type;
  var x;
  output out=pop2 n=n2 mean=xbar2 var=s22;
run;
/* This last step combines the two separate data sets into one
 * and computes the 95% simultaneous confidence interval limits
 * (losim, upsim) and the 95% Bonferroni limits (lobon, upbon)
 * from the statistics calculated previously.
 * The variances are pooled from both the real and the fake samples.
 */
data combine;
  merge pop1 pop2;
  by variable;
  f=finv(0.95,&p,n1+n2-&p-1);            /* F critical value for the simultaneous intervals */
  t=tinv(1-0.025/&p,n1+n2-2);            /* Bonferroni-adjusted t critical value */
  sp=((n1-1)*s21+(n2-1)*s22)/(n1+n2-2);  /* pooled variance */
  losim=xbar1-xbar2-sqrt(&p*(n1+n2-2)*f*(1/n1+1/n2)*sp/(n1+n2-&p-1));
  upsim=xbar1-xbar2+sqrt(&p*(n1+n2-2)*f*(1/n1+1/n2)*sp/(n1+n2-&p-1));
  lobon=xbar1-xbar2-t*sqrt((1/n1+1/n2)*sp);
  upbon=xbar1-xbar2+t*sqrt((1/n1+1/n2)*sp);
run;
proc print data=combine;
run;
```
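Besides the simultaneous limits (losim, upsim), the program computes Bonferroni limits (lobon, upbon) from the critical value `t=tinv(1-0.025/&p,n1+n2-2)`. In the notation of the formula at the top of this example, the Bonferroni interval the code implements is shown below. Only the multiplier changes. With p = 6 and \(n_1+n_2-2 = 198\), the output reports t = 2.66503. The simultaneous multiplier is \(\sqrt{(6 \times 198/193)\times 2.14580} \approx 3.634\), so the Bonferroni intervals are narrower.

```latex
\bar{x}_{1k}-\bar{x}_{2k} \pm t_{n_1+n_2-2,\ \alpha/(2p)}\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right) s^2_k},
\qquad t_{198,\ 0.05/12} = 2.66503
```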
The results are available for download here: swiss11.lst.
At this time Minitab does not support this procedure.
Analysis
Obs | Variable | type | _TYPE_ | _FREQ_ | n1 | xbar1 | s21 | n2 | xbar2 | s22 |
---|---|---|---|---|---|---|---|---|---|---|
1 | bottom | fake | 0 | 100 | 100 | 8.305 | 0.41321 | 100 | 10.530 | 1.28131 |
2 | diagon | fake | 0 | 100 | 100 | 141.517 | 0.19981 | 100 | 139.450 | 0.31121 |
3 | left | fake | 0 | 100 | 100 | 129.943 | 0.13258 | 100 | 130.300 | 0.06505 |
4 | length | fake | 0 | 100 | 100 | 214.969 | 0.15024 | 100 | 214.823 | 0.12401 |
5 | right | fake | 0 | 100 | 100 | 129.720 | 0.12626 | 100 | 130.193 | 0.08894 |
6 | top | fake | 0 | 100 | 100 | 10.168 | 0.42119 | 100 | 11.133 | 0.40446 |

Obs | f | t | sp | losim | upsim | lobon | upbon |
---|---|---|---|---|---|---|---|
1 | 2.14580 | 2.66503 | 0.84726 | -2.69809 | -1.75191 | -2.57192 | -1.97808 |
2 | 2.14580 | 2.66503 | 0.25551 | 1.80720 | 2.32680 | 1.87649 | 2.25751 |
3 | 2.14580 | 2.66503 | 0.09881 | -0.51857 | -0.19543 | -0.47547 | -0.23853 |
4 | 2.14580 | 2.66503 | 0.13713 | -0.04433 | 0.33633 | 0.00643 | 0.28557 |
5 | 2.14580 | 2.66503 | 0.10760 | -0.64160 | -0.30440 | -0.59663 | -0.34937 |
6 | 2.14580 | 2.66503 | 0.41282 | -1.29523 | -0.63477 | -1.20716 | -0.72284 |
The bounds of the simultaneous confidence intervals are given in the columns losim and upsim. These entries, rounded to three decimal places, are copied into the table below:
Variable | 95% Simultaneous Confidence Interval |
---|---|
Length | -0.044, 0.336 |
Left Width | -0.519, -0.195 |
Right Width | -0.642, -0.304 |
Bottom Margin | -2.698, -1.752 |
Top Margin | -1.295, -0.635 |
Diagonal | 1.807, 2.327 |
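The rows of this table are in a different order from the SAS output, which lists the variables alphabetically. Take care to match each interval to the correct variable.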
When interpreting these intervals, we need to determine which intervals include 0, which fall entirely below 0, and which fall entirely above 0.
The first thing we notice is that the interval for length includes 0. This suggests that we cannot distinguish between the lengths of counterfeit and genuine banknotes. The intervals for both width measurements fall entirely below 0.
These intervals are calculated as genuine minus counterfeit, so a negative interval indicates that the counterfeit notes are larger on that variable. We can therefore conclude that the left and right widths of the counterfeit notes are larger than those of the genuine notes.
Similarly, the intervals for the top and bottom margins fall entirely below 0, so we can conclude that the top and bottom margins of the counterfeit notes are also too large. Note, however, that the interval for the diagonal measurement falls entirely above 0. This suggests that the diagonal measurements of the counterfeit notes are smaller than those of the genuine notes.
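The interpretation above follows a simple rule, sketched here in terms of population means. In this illustration only, \(\mu_{1k}\) is the genuine mean and \(\mu_{2k}\) is the counterfeit mean of variable k, and \((L_k, U_k)\) is the simultaneous interval for \(\mu_{1k}-\mu_{2k}\). The examples are taken from the table of intervals above.

```latex
\begin{cases}
L_k > 0 & \Rightarrow\ \mu_{1k} > \mu_{2k}\ \text{(genuine larger, e.g. Diagonal: } L_k = 1.807\text{)}\\
U_k < 0 & \Rightarrow\ \mu_{1k} < \mu_{2k}\ \text{(counterfeit larger, e.g. Bottom Margin: } U_k = -1.752\text{)}\\
L_k \le 0 \le U_k & \Rightarrow\ \text{no difference detected (e.g. Length: } (-0.044,\ 0.336)\text{)}
\end{cases}
```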
Conclusions
- Counterfeit notes are too wide, as measured by both the left and right widths.
- The top and bottom margins of the counterfeit notes are too large.
- The diagonal measurement of the counterfeit notes is smaller than that of the genuine notes.
- We cannot distinguish between the lengths of counterfeit and genuine banknotes.