# 7.7 - Examples Using Statistical Software Applications

The following examples use either of these data sets: Course_Survey1.MTW (Minitab) or Course_Survey1.XLS

About the data set: Last spring, all students registered for STAT200 were asked to complete a survey. A total of 1004 students responded. If we assume that this sample represents the PSU-UP undergraduate population, then we can make inferences regarding this population based on the survey results.

**Question 1:** If you had enough money for the rest of your life, would you still get a college education?

Hypotheses Statements: What would be the correct hypotheses to determine whether, among PSU-UP undergraduate students, the true percentage who would still get a college education even if they had enough money differs between genders?

*H*_{0} : *p*_{1} − *p*_{2} = 0 and *H*_{a} : *p*_{1} − *p*_{2} ≠ 0

To perform a two proportion hypothesis test in Minitab:

- Open Minitab data set
- Go to Stat > Basic Stat > 2 Proportions
- Click the radio button for Samples in One Column (this is the default)
- Click the text box for Samples (cursor should be in this box)
- Select from the variables list the variable GetEduc (be sure the variable GetEduc appears in the text box)
- Enter Gender in the text box for Subscripts
- Click Options and select the correct Alternative (e.g. not equal to); enter the correct Test Difference value (the default of 0.0 is correct for this example). If you are using Minitab Version 15 or higher, check the box for Use Pooled Estimate of p; this option is only available in Version 15 or higher and is used when the test difference is 0.
- Click OK twice

To perform a two proportion hypothesis test in SPSS:

- Import data into SPSS
- Go to Analyze > Descriptive Statistics > Crosstabs
- Enter Gender into the Row(s) window
- Enter GetEduc into the Column(s) window
- Click the button for Statistics and select the box for Chi Square
- Click Continue
- Click OK

(**Note**: SPSS treats two proportions as a special case of the general contingency table, a lesson we will discuss later. For our purposes use the *p*-value in the output for Pearson Chi-Square found under Chi-Square Tests. Also, the Z-test statistic used for testing two proportions can be found by taking the square root of this Chi-square test statistic. For instance, in the following example the Chi-square test statistic is 40.3, meaning the Z-test statistic would be 6.35. To determine whether this should be negative or positive (remember that a square root can be positive or negative, e.g. the square root of 4 is ± 2), simply consider how the difference in proportions is computed. If the difference in sample proportions is positive, take the positive square root; if this difference is negative, use the negative square root.)
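The square-root rule in the note can be sketched in a few lines of Python. This is only an illustration: the function name `z_from_chi_square` is made up for this example, and the inputs are the Chi-square statistic (40.3) and the difference in sample proportions (0.125848) reported for this example.

```python
from math import copysign, sqrt


def z_from_chi_square(chi_square, diff_in_sample_proportions):
    """Return the Z statistic implied by a 2x2 Pearson Chi-square statistic.

    The magnitude is the square root of the Chi-square statistic; the sign
    is copied from the difference in sample proportions.
    """
    return copysign(sqrt(chi_square), diff_in_sample_proportions)


# Values reported in this example: Chi-square = 40.3, Female minus Male > 0.
z = z_from_chi_square(40.3, 0.125848)
print(round(z, 2))  # 6.35
```

Had the difference been computed as Male minus Female, the second argument would be negative and the function would return −6.35 instead.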

This should result in the following output:

**Conclusion and Decision**: Since the *p*-value is less than 0.05 (we do not actually state that the *p*-value is 0) we would reject the null hypothesis and conclude that there is statistical evidence that a difference exists between the true proportion of female students who would continue to get a college education even if they had enough money for the rest of their life and the true proportion of males who would do so.

**Confidence Interval interpretation**: Among PSU-UP undergraduate students, we are 95% confident that the true difference in the percentage of females compared to males who would still get a college education even if they had enough money for the rest of their life is between 8.5% and 16.6%.

About the output:

- Difference is given as *p*(female) − *p*(male), indicating that the difference is calculated by taking Female minus Male. If you wanted the reverse, then in Minitab you would have to recode the Gender variable to 0 and 1 where the 1 would represent Female.
- The value of 0.125848 found in the Estimate for Difference would be the sample statistic used to build the confidence interval. That is, we would take this value and then add and subtract from it the margin of error.
- Since the confidence interval results in an interval that contains all positive values (i.e. the interval does not contain 0) and we took Female minus Male, we would conclude that female PSU-UP undergraduates are more likely than their male counterparts to get a college education even if they had enough money for the rest of their life.
- The Event = Yes indicates that the Yes response was the "success" of interest. If No was of concern, then we would have to recode these responses to 0 and 1 where 1 would represent No.
- The *Z*-value of 6.35 is the test statistic one would use to find the *p*-value. This test statistic is found by taking the sample statistic (i.e. the estimated difference) minus the hypothesized value of 0 (see that Test for Difference equals 0) and dividing by the standard error.
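To make the arithmetic behind the output concrete, here is a minimal Python sketch of a two-proportion Z-test and confidence interval. It assumes the pooled estimate of *p* is used for the test statistic and the unpooled standard error for the interval. The function name and the counts passed to it are hypothetical; they are not the survey counts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, conf=0.95):
    """Two-sided Z-test of H0: p1 - p2 = 0, plus a confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2  # sample statistic (estimate for difference)

    # Test statistic: pooled estimate of p, because H0 says p1 equals p2.
    pooled = (x1 + x2) / (n1 + n2)
    se_test = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (diff - 0) / se_test
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: estimate plus and minus the margin of error,
    # using the unpooled standard error.
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * se_ci
    return diff, z, p_value, (diff - margin, diff + margin)


# Hypothetical counts for illustration only (NOT the Course_Survey1 data):
# 60 of 100 "Yes" in group 1 and 40 of 100 "Yes" in group 2.
diff, z, p_value, ci = two_proportion_test(60, 100, 40, 100)
print(f"difference={diff:.3f}, z={z:.2f}, p-value={p_value:.4f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With the actual Yes counts and group sizes from the survey in place of the hypothetical ones, the same steps should reproduce the Estimate for Difference, the Z-value, and the interval discussed above, up to rounding.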

**Question 2:** What are the measurements of the span of your right and left hand?

Hypotheses Statements: What would be the correct hypotheses to determine if mean right hand spans differ from mean left hand spans among PSU-UP undergraduate students?

*H*_{0} : μ_{d} = 0 and *H*_{a} : μ_{d} ≠ 0

To perform a matched pairs hypothesis test in Minitab:

- Open Minitab data set
- Go to Stat > Basic Stat > Paired-t
- Click the radio button for Samples in Column (this is the default)
- Click the text box for First Sample (cursor should be in this box)
- Select from the variables list the variable Rspan (be sure the variable Rspan appears in the text box) and then in Second Sample enter the variable Lspan
- Click Options and select the correct Alternative (e.g. not equal to); enter the correct Test Mean value (the default of 0.0 is correct for this example)
- Click OK twice

This should result in the following output:

To perform a matched pairs hypothesis test in SPSS:

- Import data into SPSS
- Go to Analyze > Compare Means > Paired Samples T Test
- Select Rspan and click the arrow. This should move Rspan into the text box under Variable1
- Select Lspan and click the arrow. This should move Lspan into the text box under Variable2
- (If you want to change the default confidence level of 95% click Options and enter the desired confidence level)
- Click OK

**Special Note**: SPSS performs all tests as two-sided. If interested in a 1-sided alternative (e.g. "greater than") we would have to divide the p-value in half. Also, SPSS only considers 0 as the hypothesized test difference.

This should result in the following output:

**Conclusion and Decision**: Since the *p*-value of 0.068 is greater than 0.05 we would not reject the null hypothesis. We do not have enough statistical evidence to say that the true mean length of right hand spans differs from the true mean length of left hand spans for PSU-UP undergrads.

**NOTE: This was a two-sided test since we used "not equal" in the alternative hypothesis (also see that Minitab says "not = 0"). If the research interest was to show that on average right-hand spans were longer than left-hand spans, then our new Ha would use > and we would need to divide this p-value by 2. In the next example we show how to use Minitab to conduct such one-sided hypothesis tests.**
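Dividing the two-sided *p*-value by 2 works when the test statistic falls on the side named in the alternative, as it does here (the T-value is positive and the alternative would be "greater than"). If the statistic falls on the opposite side, the one-sided *p*-value is instead 1 minus half the two-sided value. A short Python sketch of that rule follows; the helper name `one_sided_p` is hypothetical and the rule assumes a symmetric test statistic such as Z or T.

```python
def one_sided_p(two_sided_p, statistic, alternative):
    """Convert a two-sided p-value to a one-sided one (symmetric statistics).

    alternative must be "greater" or "less".
    """
    if alternative not in ("greater", "less"):
        raise ValueError('alternative must be "greater" or "less"')
    # Does the observed statistic point in the direction of Ha?
    if alternative == "greater":
        agrees = statistic > 0
    else:
        agrees = statistic < 0
    return two_sided_p / 2 if agrees else 1 - two_sided_p / 2


# Hand span example: two-sided p-value 0.068, T-value 1.83, Ha uses ">".
print(one_sided_p(0.068, 1.83, "greater"))  # 0.034
```

For the same output with a "less than" alternative, the function returns 1 − 0.034 rather than 0.034, since a positive T-value gives no support to that alternative.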

**Confidence Interval interpretation**: We are 95% confident that the true mean difference between right hand spans and left hand spans for PSU-UP undergraduate students is between −0.0035 inches and 0.0998 inches.

About the output:

- Based on the text Paired T for Rspan − Lspan the difference is found by taking the right hand spans minus the left hand spans.
- The value of the Mean found in the row named Difference would be the sample statistic used to build the confidence interval. That is, we would take this value and then add and subtract from it the margin of error.
- The value in the Difference row under SE Mean is the Standard Error of the Mean and is calculated by taking the Standard Deviation found in that Difference row (0.798840) and dividing by the square root of the number of differences.
- Since the confidence interval contains zero, we cannot conclude that a difference exists between the means. This result should concur with our hypothesis test result as long as the alpha value used for the test corresponds to the level of confidence (i.e. alpha of 0.05 corresponds to a 95% level of confidence; alpha of 0.10 would correspond to a 90% level of confidence).
- The T-value of 1.83 is the test statistic one would use to find the *p*-value. This test statistic is found by taking the sample statistic (i.e. the estimated difference) minus the hypothesized value of 0 (see that Test of Difference equals 0) and dividing by the standard error (SE Mean of 0.026308).
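The relationships described in this list can be checked against the numbers quoted above. The Python sketch below works backwards from the reported confidence interval, standard deviation, and SE Mean; because those values are rounded, the results are approximate, and the variable names are chosen only for this illustration.

```python
# Values quoted from the output discussed above.
sd_diff = 0.798840                    # StDev of the differences
se_mean = 0.026308                    # SE Mean of the differences
ci_lower, ci_upper = -0.0035, 0.0998  # 95% confidence interval

# The interval is "estimate +/- margin of error", so its center is the
# sample mean difference and its half-width is the margin of error.
mean_diff = (ci_lower + ci_upper) / 2
margin = (ci_upper - ci_lower) / 2

# Test statistic: (sample statistic - hypothesized value) / standard error.
t_stat = (mean_diff - 0) / se_mean

# SE Mean = StDev / sqrt(n), so n is roughly (StDev / SE Mean) squared.
n_implied = (sd_diff / se_mean) ** 2

# Multiplier applied to the standard error to get the margin of error.
multiplier = margin / se_mean

print(f"T-value is about {t_stat:.2f}")
print(f"number of differences is about {n_implied:.0f}")
print(f"multiplier is about {multiplier:.2f}")
```

The recovered T-value matches the 1.83 in the output, and the multiplier is close to 1.96, which is what one would expect for a 95% interval with a large number of differences.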

**Question 3:** Some educational research indicates that students from households where the parents are divorced do not perform as well in school as those students whose parents are married. One measure of performance is GPA. Using GPA and comparing the means between students from divorced or married parents, what conclusions can be drawn? Assume that the mean for the Divorced population is μ_{1} and the mean for the Married population is μ_{2}.

*H*_{0} : μ_{1} − μ_{2} = 0 and *H*_{a} : μ_{1} − μ_{2} < 0

To perform a two sample mean hypothesis test in Minitab:

- Open Minitab data set
- Go to Stat > Basic Stat > 2-Sample t
- Click the radio button for Samples in One Column (this is the default)
- Click the text box for Samples (cursor should be in this box)
- Select from the variables list the variable GPA (be sure the variable GPA appears in the text box)
- Enter Parents in the text box for Subscripts
- Click check box for Assume Equal Variances (we can verify this with the output)
- Click Options and select the correct Alternative (less than for this example); enter the correct Test Difference value (the default of 0.0 is correct for this example)
- Click OK twice

This should result in the following output:

To perform a two sample mean hypothesis test in SPSS:

- Import data into SPSS
- Go to Analyze > Compare Means > Independent Samples T Test
- Select GPA and move variable into Test Variable(s) window.
- Select Parents and move variable into Grouping Variable text box
- Click Define Groups and type Divorced for Group1 and Married for Group2. NOTE: Spelling and capitalization must be identical to how the responses are presented in the data.
- Click Continue
- (If you want to change the default confidence level of 95% click Options and enter the desired confidence level)
- Click OK
- When interpreting the output, the p-value of interest for the test is the value under Sig. (2-tailed) which is the p-value for the t-test of Equality of Means.

**Special Note**: SPSS performs all tests as two-sided. If interested in a 1-sided alternative (e.g. "greater than") we would have to divide the p-value in half. Also, SPSS only considers 0 as the hypothesized test difference.

This should result in the following output:

**Conclusion and Decision**: Since the *p*-value is 0.001, which is less than 0.05, we would reject the null hypothesis and conclude that there is statistical evidence that, on average, students at PSU-UP whose parents are divorced have a lower GPA than students whose parents are married.

**Confidence Interval interpretation**: We are 95% confident that, for PSU-UP, the true mean GPA of students whose parents are divorced is at least 0.0553 less than that of students whose parents are married.

About the output:

- Difference is given as μ(No) − μ(Yes), indicating that the difference is calculated by taking Divorced minus Married for those responding to their parents' marital status.
- The value of −0.1159 found in the Estimate for Difference would be the sample statistic used to build the confidence interval. That is, we would take this value and then **only add** the margin of error. We only add since we are conducting a one-sided test of hypothesis and this side is for less than. The 95% upper bound provides the upper limit to our confidence interval and, combined with our alternative hypothesis, implies that the true mean difference is no greater than −0.0553 (i.e. the mean GPA for students whose parents are divorced is lower by 0.0553 or more). If this seems confusing, consider reversing the order and subtracting Divorced from Married. The results would be the same except the bound would be positive and would represent the lower bound. The interpretation then might seem more clear.
- The T-value of −3.15 is the test statistic one would use to find the *p*-value. This test statistic is found by taking the sample statistic (i.e. the estimated difference) minus the hypothesized value of 0 (see that Test of Difference equals 0) and dividing by the standard error.
- The line Both use Pooled StDev = 0.4606 indicates that the pooled variance assumption was used, which makes sense since the ratio between the two standard deviations, 0.502 and 0.450, is not greater than 1.414 (or, if we squared these SDs to get the variances, the larger would not be more than twice the smaller).
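The upper bound and the pooled-variance check can be reproduced approximately from the quoted values. The Python sketch below backs the standard error out of the estimate and the T-value, and it assumes the normal critical value is an acceptable stand-in for the *t* critical value, which is reasonable only because the sample is large. The variable names are chosen for this illustration, and the text does not say which standard deviation belongs to which group.

```python
from math import sqrt
from statistics import NormalDist

# Values quoted from the output discussed above.
estimate = -0.1159          # Estimate for Difference (Divorced minus Married)
t_stat = -3.15              # T-value
sd_a, sd_b = 0.502, 0.450   # the two sample standard deviations

# T = (estimate - 0) / standard error, so the standard error is estimate / T.
std_error = estimate / t_stat

# One-sided 95% upper bound: only ADD the margin of error.
# Assumption: normal critical value used in place of the t critical value.
z_star = NormalDist().inv_cdf(0.95)
upper_bound = estimate + z_star * std_error

# Rule of thumb for pooling: the larger SD is at most sqrt(2) times the
# smaller (equivalently, the larger variance is at most twice the smaller).
sd_ratio = max(sd_a, sd_b) / min(sd_a, sd_b)
pooling_reasonable = sd_ratio <= sqrt(2)

print(f"95% upper bound is about {upper_bound:.3f}")
print(f"SD ratio = {sd_ratio:.3f}, pooling reasonable: {pooling_reasonable}")
```

The recovered bound agrees with the reported −0.0553 to about three decimal places; the small gap comes from rounding in the quoted values and from the normal approximation.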