Lesson 12: Summary and Review
Overview
This lesson is the culmination of STAT 500. It reviews all the statistical techniques covered in the course and provides a table that lists, for each type of inference, the parameter, statistic, type of data, analysis, conditions, and examples.
Objectives
- Review the statistical techniques covered in STAT 500.
- Given a real-world application, choose the correct statistical technique.
12.1 - Summary of Statistical Techniques
Summary Table for Statistical Techniques
Estimate a Value
- Estimating a Mean
- Estimating a Proportion
- Estimating the difference of two means
- Estimating a mean with paired data
- Estimating the difference of two proportions
Test a hypothesis
- Test about a mean
- Test about a proportion
- Test to compare two means (independent)
- Test to compare two means (paired)
- Test to compare two proportions
- Test about a slope
- Test to compare several means
- Test of Strength & Direction of Linear Relationship of 2 Quantitative Variables
- Test to Compare Two Population Variances
Examine a Relationship
- Relationship in a 2-Way Table
Estimating a Mean
Parameter
One population mean, \(\mu\)
Statistic
Sample mean, \(\bar{x}\)
Type of Data
Numerical
Analysis
1-sample t-interval
\(\bar{x}\pm t_{\alpha /2}\cdot \frac{s}{\sqrt{n}}\)
Minitab Command
Stat > Basic statistics > 1-sample t
Conditions
data approximately normal OR
have a large sample size (n ≥ 30)
Examples
- What is the average weight of adults?
- What is the average cholesterol level of adult females?
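As a rough illustration outside Minitab, the 1-sample t-interval above can be sketched in Python using only the standard library. This is a minimal sketch, not the course's procedure: the function name `one_sample_t_interval` and the weights are hypothetical, and because the standard library has no t quantile function, the critical value \(t_{\alpha/2}\) (with n - 1 degrees of freedom) is assumed to be read from a t-table and passed in.

```python
import math
import statistics


def one_sample_t_interval(data, t_crit):
    """Return (lower, upper) = x_bar -/+ t_crit * s / sqrt(n).

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, supplied by the
    caller (e.g. from a t-table); the standard library has no t quantiles.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical adult weights (kg); t_{0.025} with 4 df is about 2.776.
weights = [60, 65, 70, 75, 80]
lower, upper = one_sample_t_interval(weights, t_crit=2.776)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```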
Test About a Mean
Parameter
One population mean, \(\mu\)
Statistic
Sample mean, \(\bar{x}\)
Type of Data
Numerical
Analysis
\(H_0\colon \mu = \mu_0\)
\(H_a\colon \mu \ne \mu_0\) OR
\(H_a\colon \mu > \mu_0\) OR
\(H_a\colon \mu < \mu_0\)
1-sample t-test:
\(t=\frac{\bar{x}-\mu_{0}}{\frac{s}{\sqrt{n}}}\)
Minitab Command
Stat > Basic statistics > 1-sample t
Conditions
data approximately normal OR
have a large sample size (n ≥ 30)
Examples
- Is the average GPA of juniors at Penn State higher than 3.0?
- Is the average winter temperature in State College less than 42°F?
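A similar standard-library sketch computes the 1-sample t statistic. The GPA values and the function name are hypothetical; the statistic would then be compared with a t distribution on n - 1 degrees of freedom, using a t-table or software such as Minitab for the critical value or p-value.

```python
import math
import statistics


def one_sample_t_stat(data, mu0):
    """Return (t, df) for H0: mu = mu0, with t = (x_bar - mu0) / (s / sqrt(n))."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical GPAs of 5 juniors, testing H0: mu = 3.0 against Ha: mu > 3.0.
gpas = [3.2, 3.4, 2.9, 3.6, 3.1]
t, df = one_sample_t_stat(gpas, mu0=3.0)
print(f"t = {t:.3f} on {df} df")
```

For \(H_a\colon \mu > \mu_0\), \(H_0\) is rejected when t exceeds the upper-tail critical value \(t_{\alpha}\) with n - 1 degrees of freedom.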
Estimating a Proportion
Parameter
One population proportion \(p\)
Statistic
Sample proportion, \(\hat{p}\)
Type of Data
Categorical (Binary)
Analysis
1-proportion Z-interval:
\( \hat{p}\pm z_{\alpha /2}\sqrt{\frac{\hat{p}\cdot \left ( 1-\hat{p} \right )}{n}}\)
Minitab Command
Stat > Basic statistics > 1-sample proportion
Conditions
have at least 5 in each category
Examples
- What is the proportion of males in the world?
- What is the proportion of students that smoke?
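The 1-proportion Z-interval can be sketched with `statistics.NormalDist` (Python 3.8 or later), which supplies \(z_{\alpha/2}\) directly. This is an illustrative sketch under the stated condition; the survey counts and the function name are hypothetical.

```python
import math
import statistics


def one_prop_z_interval(x, n, conf=0.95):
    """Return (lower, upper) = p_hat -/+ z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    if x < 5 or n - x < 5:
        raise ValueError("need at least 5 in each category")
    p_hat = x / n
    z = statistics.NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 40 of 200 sampled students smoke.
lower, upper = one_prop_z_interval(40, 200)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```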
Test about a Proportion
Parameter
One population proportion, \(p\)
Statistic
Sample proportion, \(\hat{p}\)
Type of Data
Categorical (Binary)
Analysis
\(H_0\colon p = p_0\)
\(H_a\colon p \ne p_0\) OR
\(H_a\colon p > p_0\) OR
\(H_a\colon p < p_0\)
1-proportion Z-test:
\(z=\frac{\hat{p}-p _{0}}{\sqrt{\frac{p _{0}\left ( 1- p _{0}\right )}{n}}}\)
Minitab Command
Stat > Basic statistics > 1-sample proportion
Conditions
\(np_0 \geq 5\) and
\(n (1 - p_0) \geq 5\)
Examples
- Is the proportion of females different from 0.5?
- Is the proportion of students who fail STAT 500 less than 0.1?
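A minimal sketch of the 1-proportion Z-test follows, again with the Python 3.8+ standard library. Note that the standard error uses \(p_0\), not \(\hat{p}\), exactly as in the formula above. The counts, the function name, and the `alternative` parameter are hypothetical choices for illustration.

```python
import math
import statistics


def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0 using the 1-proportion Z-test."""
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("conditions n*p0 >= 5 and n*(1 - p0) >= 5 not met")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # s.e. computed under H0
    std_normal = statistics.NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Hypothetical sample: 60 females out of 100, H0: p = 0.5 vs Ha: p != 0.5.
z, p_value = one_prop_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```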
Estimating the Difference of Two Means*
Parameter
Difference in two population means,
\(\mu_1 - \mu_2\)
Statistic
Difference in two sample means,
\(\bar{x}_{1} - \bar{x}_{2}\)
Type of Data
Numerical
Analysis
2-sample t-interval:
\(\left(\bar{x}_{1}-\bar{x}_{2}\right)\pm t_{\alpha /2}\cdot \hat{s.e.}\left (\bar{x}_{1}-\bar{x}_{2} \right )\)
Minitab Command
Stat > Basic statistics > 2-sample t
Conditions
Independent samples from the two populations
Data in each sample are approximately normal, or both samples are large
Examples
- How different are the mean GPAs of males and females?
- How many fewer colds do vitamin C takers get, on average, than non-vitamin takers?
Test to Compare Two Means*
Parameter
Difference in two population means,
\(\mu_1 - \mu_2\)
Statistic
Difference in two sample means,
\(\bar{x}_{1} - \bar{x}_{2}\)
Type of Data
Numerical
Analysis
\(H_0\colon \mu_1 = \mu_2\)
\(H_a\colon \mu_1 \ne \mu_2\) OR
\(H_a\colon \mu_1 > \mu_2\) OR
\(H_a\colon \mu_1 < \mu_2\)
2-sample t-test:
\(t=\frac{\left (\bar{x}_{1}-\bar{x}_{2} \right )-0}{\hat{s.e.}\left (\bar{x}_{1}-\bar{x}_{2} \right )} \)
Minitab Command
Stat > Basic statistics > 2-sample t
Conditions
Independent samples from the two populations
Data in each sample are approximately normal, or both samples are large
Examples
- Do the mean pulse rates of exercisers and non-exercisers differ?
- Is the mean EDS score for dropouts greater than the mean EDS score for graduates?
*The estimated standard error, \(\hat{s.e.}\left(\bar{x}_{1}-\bar{x}_{2}\right)\), depends on whether the pooled or unpooled procedure is used.
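The dependence of the standard error on pooling can be made concrete with a short Python sketch (standard library only). It assumes the usual textbook forms, which this table does not spell out: the unpooled (Welch) standard error \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\) with Welch-Satterthwaite degrees of freedom, and the pooled standard error \(s_p\sqrt{1/n_1 + 1/n_2}\) with \(n_1 + n_2 - 2\) degrees of freedom. The pulse data and function name are hypothetical.

```python
import math
import statistics


def two_sample_se(x1, x2, pooled=False):
    """Return (estimated s.e. of x1_bar - x2_bar, degrees of freedom)."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # sample variances
    if pooled:
        # Pooled: assumes the two population variances are equal.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        return math.sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2
    # Unpooled (Welch): Welch-Satterthwaite approximate degrees of freedom.
    a, b = v1 / n1, v2 / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return math.sqrt(a + b), df


# Hypothetical pulse rates (beats per minute).
exercisers = [62, 64, 66, 68, 70]
non_exercisers = [70, 71, 72, 73, 74]
diff = statistics.mean(exercisers) - statistics.mean(non_exercisers)
for pooled in (False, True):
    se, df = two_sample_se(exercisers, non_exercisers, pooled)
    t = (diff - 0) / se  # 2-sample t statistic for H0: mu1 = mu2
    print(f"pooled={pooled}: s.e. = {se:.4f}, df = {df:.2f}, t = {t:.3f}")
```

The same standard error feeds the 2-sample t-interval, \((\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2}\cdot \hat{s.e.}\), with \(t_{\alpha/2}\) taken at the returned degrees of freedom. With equal sample sizes, as in this toy example, the pooled and unpooled standard errors coincide and only the degrees of freedom differ.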
Estimating a Mean with Paired Data
Parameter
Mean of paired difference,
\(\mu_D\)
Statistic
Sample mean of difference,
\(\bar{d}\)
Type of Data
Numerical
Analysis
paired t-interval:
\(\bar{d}\pm t_{\alpha /2}\cdot \frac{s_{d}}{\sqrt{n}}\)
Minitab Command
Stat > Basic statistics > Paired t
Conditions
Differences approximately normal OR
Have a large number of pairs (n ≥ 30)
Examples
- What is the difference in pulse rates, on the average, before and after exercise?
Test about a Mean with Paired Data
Parameter
Mean of paired difference,
\(\mu_D\)
Statistic
Sample mean of difference,
\(\bar{d}\)
Type of Data
Numerical
Analysis
\(H_0\colon \mu_D = 0\)
\(H_a\colon \mu_D \ne 0\) OR
\(H_a\colon \mu_D > 0\) OR
\(H_a\colon \mu_D < 0\)
t-test statistic:
\(t=\frac{\bar{d}-0}{\frac{s_d}{\sqrt{n}}}\)
Minitab Command
Stat > Basic statistics > Paired t
Conditions
Differences approximately normal OR
Have a large number of pairs (n ≥ 30)
Examples
- Is the difference in IQ of pairs of twins zero?
- Are the pulse rates of people higher after exercise?
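Both paired procedures reduce each pair to a single difference and then apply the one-sample formulas to those differences. A minimal Python sketch of that idea follows; the pulse data, the function name, and the t-table critical value are hypothetical, and differences are taken as second measurement minus first.

```python
import math
import statistics


def paired_t(first, second, t_crit):
    """Paired analysis on d = second - first: return (d_bar, t, CI, df).

    t tests H0: mu_D = 0; t_crit is t_{alpha/2} with n - 1 df from a t-table.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [b - a for a, b in zip(first, second)]  # one difference per pair
    n = len(d)
    d_bar = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(n)  # s_d / sqrt(n)
    t = (d_bar - 0) / se
    return d_bar, t, (d_bar - t_crit * se, d_bar + t_crit * se), n - 1


# Hypothetical pulse rates of 5 people before and after exercise.
before = [70, 72, 68, 75, 71]
after = [80, 79, 76, 85, 80]
d_bar, t, ci, df = paired_t(before, after, t_crit=2.776)
print(f"d_bar = {d_bar:.1f}, t = {t:.2f} on {df} df, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```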
Estimating the Difference of Two Proportions
Parameter
Difference in two population proportions,
\(p_1 - p_2\)
Statistic
Difference in two sample proportions,
\(\hat{p}_{1} - \hat{p}_{2}\)
Type of Data
Categorical (Binary)
Analysis
2-proportion Z-interval:
\(\left(\hat{p} _{1}-\hat{p} _{2}\right)\pm z_{\alpha /2}\cdot \hat{s.e.}\left ( \hat{p} _{1}-\hat{p} _{2} \right )\)
Minitab Command
Stat > Basic statistics > 2 proportions
Conditions
Independent samples from the two populations
Have at least 5 in each category for both populations
Examples
- How different are the percentages of male and female smokers?
- How different are the percentages of upper- and lower-class binge drinkers?
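A minimal Python sketch of the 2-proportion Z-interval follows (Python 3.8+ standard library). It assumes the usual unpooled standard error \(\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1+\hat{p}_2(1-\hat{p}_2)/n_2}\), a standard textbook form that the table does not spell out. The smoker counts and function name are hypothetical.

```python
import math
import statistics


def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Return (lower, upper) for p1 - p2 using the unpooled standard error."""
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("need at least 5 in each category for both samples")
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = statistics.NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical samples: 30 of 100 men and 20 of 100 women smoke.
lower, upper = two_prop_z_interval(30, 100, 20, 100)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```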
Test to Compare Two Proportions
Parameter
Difference in two population proportions,
\(p_1 - p_2\)
Statistic
Difference in two sample proportions,
\(\hat{p}_{1} - \hat{p}_{2}\)
Type of Data
Categorical (Binary)
Analysis
\(H_0\colon p_1 = p_2\)
\(H_a\colon p_1 \ne p_2 \) OR
\(H_a\colon p_1 > p_2\) OR
\(H_a\colon p_1 < p_2\)
2-proportion Z-test:
\(z^*=\frac{\hat{p}_{1}-\hat{p}_{2}}{\sqrt{\hat{p}^*\left ( 1-\hat{p}^* \right )\left ( \frac{1}{n_{1}}+ \frac{1}{n_{2}}\right )}}\)
\(\hat{p}^*=\dfrac{x_{1}+x_{2}}{n_{1}+n_{2}}\)
Minitab Command
Stat > Basic statistics > 2 proportions
Conditions
Independent samples from the two populations
Have at least 5 in each category for both populations
Examples
- Is the percentage of males with lung cancer higher than the percentage of females with lung cancer?
- Are the percentages of upper- and lower-class binge drinkers different?
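The test differs from the interval in that its standard error uses the pooled proportion \(\hat{p}^*\), since under \(H_0\) both groups share one proportion. A minimal Python 3.8+ sketch of the formulas above follows; the binge-drinking counts, function name, and `alternative` parameter are hypothetical.

```python
import math
import statistics


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z*, p_hat_star, p_value) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_star = (x1 + x2) / (n1 + n2)  # pooled proportion, valid under H0
    se0 = math.sqrt(p_star * (1 - p_star) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    std_normal = statistics.NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_star, p_value


# Hypothetical counts: 30 of 100 upper-class and 20 of 100 lower-class
# students are binge drinkers; H0: p1 = p2 vs Ha: p1 != p2.
z, p_star, p_value = two_prop_z_test(30, 100, 20, 100)
print(f"p* = {p_star:.3f}, z* = {z:.3f}, p-value = {p_value:.4f}")
```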
Relationship in a 2-Way Table
Parameter
Relationship between two categorical variables, OR
difference in two or more population proportions
Statistic
The observed counts in a two-way table
Type of Data
Categorical
Analysis
\(H_0\colon\text{The two variables are not related}\)
\(H_a\colon\text{The two variables are related}\)
Chi-square test statistic:
\(X^2=\sum_{\text{all cells}}\frac{(\text{Observed-Expected})^2}{\text{Expected}}\)
Minitab Command
Stat > Tables > Chi-Square Test for Association
Conditions
All expected counts should be greater than 1
At least 80% of the cells should have an expected count greater than 5
Examples
- Is there a relationship between smoking and lung cancer?
- Do the proportions of students in each class who smoke differ?
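The chi-square statistic can be sketched in plain Python as below. It assumes the standard forms for a test of association: each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1). The function also checks the two conditions listed above. The 2×2 counts and the function name are hypothetical.

```python
def chi_square_association(observed):
    """Return (X^2, df, conditions_ok) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    x2 = 0.0
    expected_counts = []
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            expected_counts.append(expected)
            x2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    # Conditions from the table: all expected counts > 1, and at least
    # 80% of cells with an expected count > 5.
    share_above_5 = sum(e > 5 for e in expected_counts) / len(expected_counts)
    conditions_ok = min(expected_counts) > 1 and share_above_5 >= 0.8
    return x2, df, conditions_ok


# Hypothetical counts: rows = smoker / non-smoker, columns = lung cancer yes / no.
table = [[20, 80],
         [10, 90]]
x2, df, ok = chi_square_association(table)
print(f"X^2 = {x2:.3f} on {df} df, conditions met: {ok}")
```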
Test About a Slope
Parameter
Slope of the population regression line,
\(\beta_1\)
Statistic
Sample estimate of the slope,
\(b_1\)
Type of Data
Numerical
Analysis
\(H_0\colon \beta_1 = 0\)
\(H_a\colon \beta_1 \ne 0\) OR
\(H_a\colon \beta_1 > 0\) OR
\(H_a\colon \beta_1 < 0\)
t-test with n - 2 degrees of freedom:
\(t=\dfrac{b_{1}-0}{\hat{s.e.}\left ( b_{1} \right )}\)
Minitab Command
Stat > Regression > Regression
Conditions
The form of the equation that links the two variables must be correct
The error terms are normally distributed
The error terms have equal variances
The error terms are independent of each other
Examples
- Is there a linear relationship between height and weight of a person?
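A minimal Python sketch of the slope test follows. It fits the least-squares line and estimates \(\hat{s.e.}(b_1)\) as \(\sqrt{MSE/S_{xx}}\), where \(MSE = SSE/(n-2)\) and \(S_{xx}=\sum(x_i-\bar{x})^2\); these are the standard simple-regression formulas rather than ones given in this table. The toy data (standing in for, say, heights and weights) and the function name are hypothetical.

```python
import math
import statistics


def slope_t_test(x, y):
    """Return (b1, se_b1, t, df) for H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx  # least-squares slope
    b0 = y_bar - b1 * x_bar  # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # estimate of the error variance
    se_b1 = math.sqrt(mse / sxx)
    t = (b1 - 0) / se_b1
    return b1, se_b1, t, n - 2


# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t, df = slope_t_test(x, y)
print(f"b1 = {b1:.3f}, s.e.(b1) = {se_b1:.4f}, t = {t:.3f} on {df} df")
```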
Test to Compare Several Means
Parameter
Population means of the t populations,
\(\mu_1, \mu_2, \cdots , \mu_t\)
Statistic
Sample means of the t samples,
\(\bar{x}_1, \bar{x}_2, \cdots , \bar{x}_t\)
Type of Data
Numerical
Analysis
\(H_0\colon \mu_1 = \mu_2 = \cdots = \mu_t\)
\(H_a\colon \text{not all the means are equal}\)
F-test for one-way ANOVA:
\(F=\dfrac{MST}{MSE}\)
Minitab Command
Stat > ANOVA > One-Way
Conditions
Each population is normally distributed
Independent samples from the t populations
Equal population standard deviations
Examples
- Is there a difference between the mean GPA of freshman, sophomore, junior, and senior classes?
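The F ratio can be sketched in Python as follows. MST and MSE are taken as the treatment and error sums of squares divided by t - 1 and N - t, where N is the total sample size; these are the standard one-way ANOVA definitions, not formulas stated in this table. The GPA values and function name are hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, numerator df, denominator df) with F = MST / MSE."""
    t = len(groups)  # number of populations (t in the notes)
    n_total = sum(len(g) for g in groups)
    means = [statistics.mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Treatment (between-group) sum of squares.
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error (within-group) sum of squares.
    ss_error = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    mst = ss_treatment / (t - 1)
    mse = ss_error / (n_total - t)
    return mst / mse, t - 1, n_total - t


# Hypothetical GPAs for freshmen, sophomores, juniors, and seniors.
gpas = [
    [2.8, 3.0, 3.2],
    [3.0, 3.2, 3.4],
    [3.2, 3.4, 3.6],
    [3.4, 3.6, 3.8],
]
f_stat, df1, df2 = one_way_anova_f(gpas)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) df")
```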
Test of Strength & Direction of Linear Relationship of 2 Quantitative Variables
Parameter
Population correlation,
\(\rho\)
"rho"
Statistic
Sample correlation,
\(r\)
Type of Data
Numerical
Analysis
\(H_0\colon \rho = 0\)
\(H_a\colon \rho \ne 0\)
t-test statistic:
\(t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\)
Minitab Command
Stat > Basic Statistics > Correlation
Conditions
Both variables are continuous
Observations are related pairs (one x and one y per subject)
No significant outliers
Normality of both variables
Linear relationship between the variables
Examples
- Is there a linear relationship between height and weight?
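A minimal Python sketch of the correlation t-test follows, computing \(r\) from sums of squares and cross-products and then applying the formula above. The toy data and function name are hypothetical.

```python
import math
import statistics


def correlation_t_test(x, y):
    """Return (r, t, df) for H0: rho = 0, with t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)  # sample correlation
    if abs(r) >= 1:
        raise ValueError("perfect correlation: t is undefined")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t, n - 2


# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, t, df = correlation_t_test(x, y)
print(f"r = {r:.3f}, t = {t:.3f} on {df} df")
```

For simple linear regression, this t statistic equals the t statistic for testing the slope \(\beta_1 = 0\) on the same data, so the two tests reach the same conclusion.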
Test to Compare Two Population Variances
Parameter
Population variances of two populations,
\(\sigma_{1}^{2}, \sigma_{2}^{2}\)
Statistic
Sample variances of the two samples,
\(s_{1}^{2}, s_{2}^{2}\)
Type of Data
Numerical
Analysis
\(H_0\colon \sigma_{1}^{2} = \sigma_{2}^{2}\)
\(H_a\colon \sigma_{1}^{2} \ne \sigma_{2}^{2}\)
F-test statistic:
\(F=\frac{s_{1}^{2}}{s_{2}^{2}}\)
Minitab Command
Stat > Basic statistics > 2 variances
Conditions
Each population is normally distributed
Independent samples from the 2 populations
Examples
- Are the variances of length of lumber produced by Company A different from those produced by Company B?
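The F statistic is simply a ratio of sample variances. A short Python sketch follows; the lumber lengths and function name are hypothetical, and the resulting F would be compared with an F distribution on \(n_1 - 1\) and \(n_2 - 1\) degrees of freedom (from an F-table or software).

```python
import statistics


def variance_ratio_f(sample1, sample2):
    """Return (F, df1, df2) with F = s1^2 / s2^2 for H0: sigma1^2 = sigma2^2."""
    s1_sq = statistics.variance(sample1)  # sample variance, divides by n - 1
    s2_sq = statistics.variance(sample2)
    return s1_sq / s2_sq, len(sample1) - 1, len(sample2) - 1


# Hypothetical lumber lengths (inches) from Company A and Company B.
company_a = [95, 96, 97, 98, 99]
company_b = [96, 97, 98]
f_stat, df1, df2 = variance_ratio_f(company_a, company_b)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) df")
```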
12.2 - Choose the Correct Statistical Technique
List of Statistical Techniques
Estimate a Value
- Estimating a Mean
- Estimating a Proportion
- Estimating the difference of two means
- Estimating a mean with paired data
- Estimating the difference of two proportions
Test a hypothesis
- Test about a mean
- Test about a proportion
- Test to compare two means (independent)
- Test to compare two means (paired)
- Test to compare two proportions
- Test about a slope
- Test to compare several means
- Test of Strength & Direction of Linear Relationship of 2 Quantitative Variables
- Test to Compare Two Population Variances
Examine a Relationship
- Relationship in a 2-Way Table
Choose the Correct Statistical Technique
Directions
For the scenarios below, choose a method that is suitable for the given situation.
- A survey by the National Federation of Independent Business (NFIB) indicates that small businesses intended to increase their hiring, as well as their capital expenditures, during 1986 compared with 1985. Suppose that, as part of a follow-up survey by NFIB, 20 small businesses, randomly chosen from the NFIB's list of 2,100 companies, show an average of 3.2 new hires per firm in 1985, with a standard deviation of 1.5 hires. A random sample of 30 small businesses taken at the end of 1986 shows an average of 5.1 new hires and a standard deviation of 2.3 hires. At the \(\alpha = 0.01\) level of significance, can you conclude that average hiring by all small businesses increased in 1986 compared with 1985?
- It is known that the average stay of tourists in Hong Kong hotels has been 3.4 nights. A tourism industry analyst wanted to test whether recent changes in the nature of tourism to Hong Kong have changed the average stay from this past value. The analyst obtained the following random sample of the number of nights spent by tourists in Hong Kong hotels: 5, 4, 3, 2, 1, 1, 5, 7, 8, 4, 3, 3, 2, 5, 7, 1, 3, 1, 1, 5, 3, 4, 2, 2, 2, 6, 1, 7. Conduct the test using the 0.05 level of significance.
- There are 155 banks involved in certain international transactions. A federal agency claims that at least 35% of these banks have total assets of over \$10 billion (in U.S. dollars). An independent agency wants to test this claim. It takes a random sample of 50 of the 155 banks and finds that 15 of them have total assets of over \$10 billion. Can the claim be rejected?
- General Motors Corporation hopes to reduce anticipated production costs of its Saturn model by instituting an assembly schedule that will reduce average production time to about 40 hours per car. In a test run of the new assembly line, 40 cars are built with a sample average time per car of 46.5 hours and a sample standard deviation of 8.0 hours. A test run of 38 cars using the old assembly schedule results in a sample mean of 51.2 hours and a sample standard deviation of 9.5 hours. Is there evidence that the new assembly schedule reduces the average production time per car?
- A telephone company wants to estimate the average length of long-distance calls during weekends. A random sample of 50 calls gives a mean \(\bar{x} = 14.5\) min and a standard deviation \(s = 5.6\) min. Provide an interval estimate for the average length of a long-distance phone call during weekends.
- Several companies have been developing electronic guidance systems for cars. Motorola and Germany's Blaupunkt are two firms at the forefront of such research. Out of 120 trials of the Motorola model, 101 were successful; out of 200 tests of the Blaupunkt model, 110 were successful. Is there evidence to conclude that the Motorola electronic guidance system is superior to its German competitor?
- An important measure of the risk associated with a stock is the standard deviation, or variance, of the stock's price movements. A financial analyst wants to test the one-tailed hypothesis that stock A has a greater risk (larger variance of price) than stock B. A random sample of 25 daily prices of stock A gives \(s_{A}^2= 6.52\), and a random sample of 22 daily prices of stock B gives a sample variance of \(s_{B}^2= 3.47\). Carry out the test at \(\alpha = 0.01\).
- A company is interested in offering its employees one of two employee benefit packages. A random sample of the company's employees is collected, and each person in the sample is asked to rate each of the two packages on an overall preference scale of 0 to 100. The order of presentation of each of the two plans is randomly selected for each person in the sample. The paired data are:
- Program A: 45 67 63 59 77 69 45 39 52 58 70 46 60 65 59 80
- Program B: 56 70 60 45 85 79 50 46 50 60 82 40 65 55 81 68
Determine whether the employees rate one package higher.
- Analysis of variance has long been used in providing evidence of the effectiveness of pharmaceutical drugs. Such evidence is required before the FDA will allow a drug to be marketed. In a recent test of the effectiveness of a new sleeping pill, three groups of 25 patients each were given the following treatments. One group was given the drug, the second group was given a placebo, and the third group was given no treatment at all. The results are as follows.
- Drug group: 12, 17, 34, 11, 5, 42, 18, 27, 2, 37, 50, 32, 12, 27, 21, 10, 4, 33, 63, 22, 41, 19, 28, 29, 8
- Placebo group: 44, 32, 28, 30, 22, 12, 3, 12, 42, 13, 27, 54, 56, 32, 37, 28, 22, 22, 24, 9, 20, 4, 13, 42, 67
- No-treatment group: 32, 33, 21, 12, 15, 14, 55, 67, 72, 1, 44, 60, 36, 38, 49, 66, 89, 63, 23, 6, 9, 56, 58, 39, 59
Determine whether or not the drug is effective.
- The maker of portable exercise equipment, designed for health-conscious people who travel too frequently to use a regular athletic club, wants to estimate the proportion of traveling business people who may be interested in the product. A random sample of 120 traveling business people indicates that 28 of them may be interested in purchasing the portable fitness equipment. Provide an interval estimate for the proportion of all traveling business people who may be interested in the product.
- A study was undertaken by Montgomery Securities to assess the average labor and materials costs incurred by Chrysler and General Motors in building a typical four-door, intermediate-sized car. The reported average cost for Chrysler was \$9500, and for GM it was \$9780. Suppose that these data are based on random samples of 25 cars for each company, and suppose that both standard deviations are equal to \$1500. Test the hypothesis that the average GM car of this type is more expensive to build than the average Chrysler car of the same type.
- Recent studies indicate that, to be globally competitive, firms must form global strategic partnerships. An investment banker wants to test whether the return on investment for international ventures differs from the return on investment for similar domestic ventures. A sample of 12 firms that recently entered into ventures with foreign companies is available. For each firm, the return on investment for both the international venture (I) and a similar domestic venture (D) is given:
- D(%): 10 12 14 12 12 17 9 15 8.5 11 7 15
- I(%) : 11 14 15 11 12.5 16 10 13 10.5 17 9 19
Assuming that these firms represent a random sample from the population of all firms involved in global strategic partnerships, can the investment banker conclude that there are differences between average returns on domestic ventures and average returns on international ventures? Explain.
- When new paperback novels are promoted at bookstores, a display is often arranged with copies of the same book with differently colored covers. A publishing house wanted to find out whether there is a dependence between the place where the book is sold and the color of its cover. For one of its latest novels, the publisher sent displays and a supply of copies of the novels to large bookstores in five major cities. The resulting sales of the novel for each city-color combination are as follows. Numbers are in thousands of copies sold over a three-month period.
City          Red   Blue   Green   Yellow   Total
New York       21     27      40       15     103
Washington     14     18      28        8      68
Boston         11     13      21        7      52
Chicago         3     33      30        9      75
Los Angeles    30     11      34       10      85
Total          79    102     153       49     383
Assume that the data are random samples for each particular color-city combination and that the inference may apply to all novels. Are color and location related?
- Certain eggs are stated to have reduced cholesterol content, with an average of only 2.5% cholesterol. A concerned health group wants to test whether the claim is true. The group believes that more cholesterol may be found, on the average, in the eggs. A random sample of 100 eggs reveals a sample average content of 5.2% cholesterol, and a sample standard deviation of 2.8%. Does the health group have cause for action?
- Two 12-meter boats, the K boat and the L boat, are tested as possible contenders in the America's Cup races. The following data represent the time, in minutes, to complete a particular tack in independent random trials of the two boats.
- K boat: 12.0, 13.1, 11.8, 12.6, 14.0, 11.8, 12.7, 13.5, 12.4, 12.2, 11.6, 12.9
- L boat: 11.8, 12.1, 12.0, 11.6, 11.8, 12.0, 11.9, 12.6, 11.4, 12.0, 12.2, 11.7
Test the null hypothesis that the two boats perform equally well. Is one boat faster, on the average, than the other?