Lesson 12: Summary and Review

Overview

This lesson is a culmination of STAT 500. It reviews all of the statistical techniques covered in the course and provides a table that lists, for each technique, the inference, parameter, statistic, type of data, analysis, conditions, and examples.

Objectives

Upon successful completion of this lesson, you should be able to:

Review the statistical techniques covered in STAT 500.
Given a real-world application, choose the correct statistical technique.

12.1 - Summary of Statistical Techniques


Summary Table for Statistical Techniques

Download a PDF version of the following statistical techniques: Table of Statistical Techniques

Estimating a Mean

Parameter

One population mean, $\mu$

Statistic

Sample mean, $\bar{x}$

Type of Data

Numerical

Analysis

1-sample t-interval with n - 1 degrees of freedom:

$\bar{x}\pm t_{\alpha /2}\cdot \frac{s}{\sqrt{n}}$

Minitab Command

Stat > Basic statistics > 1-sample t
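As a cross-check on the Minitab output, here is a minimal Python sketch of the 1-sample t-interval using only the standard library. Because the standard library has no t-distribution quantile function, the critical value $t_{\alpha/2}$ (with n - 1 degrees of freedom) is passed in, assumed to come from a t-table. The helper name `one_sample_t_interval` is hypothetical; the numbers are the weekend phone-call scenario from 12.2.

```python
import math


# Hypothetical helper for illustration; not a Minitab or library function.
def one_sample_t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n).

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, looked up in a
    t-table (the Python standard library has no t quantile function).
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Weekend phone calls from 12.2: n = 50, xbar = 14.5 min, s = 5.6 min.
# 2.0096 is the t-table value t_{0.025} for 49 df (95% confidence).
low, high = one_sample_t_interval(14.5, 5.6, 50, 2.0096)
print(f"95% CI for mu: ({low:.2f}, {high:.2f}) minutes")
```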

Conditions

data approximately normal OR

have a large sample size (n ≥ 30)

Examples

What is the average weight of adults?
What is the average cholesterol level of adult females?

Test About a Mean

Parameter

One population mean, $\mu$

Statistic

Sample mean, $\bar{x}$

Type of Data

Numerical

Analysis

$H_0\colon \mu = \mu_0$

$H_a\colon \mu \ne \mu_0$ OR

$H_a\colon \mu > \mu_0$ OR

$H_a\colon \mu < \mu_0$

1-sample t-test with n - 1 degrees of freedom:

$t=\frac{\bar{x}-\mu_{0}}{\frac{s}{\sqrt{n}}}$

Minitab Command

Stat > Basic statistics > 1-sample t
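A minimal Python sketch (standard library only) of the 1-sample t statistic, applied to the Hong Kong hotel-stay data from 12.2 with $\mu_0 = 3.4$. The helper name `one_sample_t` is hypothetical. The sketch stops at the statistic and its degrees of freedom; the p-value or critical value would still come from a t-table or Minitab.

```python
import math
import statistics


# Hypothetical helper for illustration; not a Minitab or library function.
def one_sample_t(data, mu0):
    """Return t = (xbar - mu0) / (s / sqrt(n)) and its n - 1 degrees of freedom."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Nights spent by tourists in Hong Kong hotels (12.2), H0: mu = 3.4.
nights = [5, 4, 3, 2, 1, 1, 5, 7, 8, 4, 3, 3, 2, 5, 7, 1, 3, 1, 1, 5,
          3, 4, 2, 2, 2, 6, 1, 7]
t, df = one_sample_t(nights, 3.4)
print(f"t = {t:.3f} on {df} df")
```

For $H_a\colon \mu \ne \mu_0$, compare $|t|$ with $t_{\alpha/2}$ on 27 degrees of freedom.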

Conditions

data approximately normal OR

have a large sample size (n ≥ 30)

Examples

Is the average GPA of juniors at Penn State higher than 3.0?
Is the average winter temperature in State College less than 42°F?

Estimating a Proportion

Parameter

One population proportion, $p$

Statistic

Sample proportion, $\hat{p}$

Type of Data

Categorical (Binary)

Analysis

1-proportion Z-interval:

$ \hat{p}\pm z_{\alpha /2}\sqrt{\frac{\hat{p}\cdot \left ( 1-\hat{p} \right )}{n}}$

Minitab Command

Stat > Basic statistics > 1-sample proportion
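A minimal Python sketch of the 1-proportion Z-interval. `statistics.NormalDist().inv_cdf` from the standard library supplies $z_{\alpha/2}$. The helper name `one_prop_z_interval` is hypothetical, and the "at least 5 in each category" condition is checked before computing. The data are the portable-fitness-equipment scenario from 12.2 (28 of 120 interested).

```python
import math
from statistics import NormalDist


# Hypothetical helper for illustration; not a Minitab or library function.
def one_prop_z_interval(x, n, conf=0.95):
    """Return p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    if x < 5 or n - x < 5:
        raise ValueError("need at least 5 in each category")
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = one_prop_z_interval(28, 120)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```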

Conditions

have at least 5 in each category

Examples

What is the proportion of males in the world?
What is the proportion of students that smoke?

Test about a Proportion

Parameter

One population proportion, $p$

Statistic

Sample proportion, $\hat{p}$

Type of Data

Categorical (Binary)

Analysis

$H_0\colon p = p_0$

$H_a\colon p \ne p_0$ OR

$H_a\colon p > p_0$ OR

$H_a\colon p < p_0$

1-proportion Z-test:

$z=\frac{\hat{p}-p _{0}}{\sqrt{\frac{p _{0}\left ( 1- p _{0}\right )}{n}}}$

Minitab Command

Stat > Basic statistics > 1-sample proportion
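A minimal Python sketch of the 1-proportion Z-test, returning the p-value for each form of $H_a$. Note that the standard error uses $p_0$, not $\hat{p}$, matching the formula above. The helper name `one_prop_z_test` is hypothetical. The data are the bank scenario from 12.2 (claim $p \geq 0.35$; 15 of 50 sampled banks qualify), so the relevant alternative is $H_a\colon p < 0.35$. The sketch ignores any finite-population correction, even though 50 of only 155 banks were sampled.

```python
import math
from statistics import NormalDist


# Hypothetical helper for illustration; not a Minitab or library function.
def one_prop_z_test(x, n, p0):
    """Return z = (p_hat - p0) / sqrt(p0 (1 - p0) / n) and p-values for each Ha."""
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("need n*p0 >= 5 and n*(1 - p0) >= 5")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    phi = NormalDist().cdf  # standard normal CDF
    p_values = {
        "p != p0": 2 * (1 - phi(abs(z))),
        "p > p0": 1 - phi(z),
        "p < p0": phi(z),
    }
    return z, p_values


z, p_values = one_prop_z_test(15, 50, 0.35)
print(f"z = {z:.3f}, lower-tail p-value = {p_values['p < p0']:.3f}")
```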

Conditions

$np_0 \geq 5$ and

$n (1 - p_0) \geq 5$

Examples

Is the proportion of females different from 0.5?
Is the proportion of students who fail STAT 500 less than 0.1?

Estimating the Difference of Two Means*

Parameter

Difference in two population means,

$\mu_1 - \mu_2$

Statistic

Difference in two sample means,

$\bar{x}_{1} - \bar{x}_{2}$

Type of Data

Numerical

Analysis

2-sample t-interval:

$\bar{x}_{1}-\bar{x}_{2}\pm t_{\alpha /2}\cdot \hat{s.e.}\left (\bar{x}_{1}-\bar{x}_{2} \right )$

Minitab Command

Stat > Basic statistics > 2-sample t

Conditions

Independent samples from the two populations

Data in each sample are about normal or large samples

Examples

How different are the mean GPAs of males and females?
How many fewer colds do vitamin C takers get, on average, than non-vitamin takers?

Test to Compare Two Means*

Parameter

Difference in two population means,

$\mu_1 - \mu_2$

Statistic

Difference in two sample means,

$\bar{x}_{1} - \bar{x}_{2}$

Type of Data

Numerical

Analysis

$H_0\colon \mu_1 = \mu_2$

$H_a\colon \mu_1 \ne \mu_2$ OR

$H_a\colon \mu_1 > \mu_2$ OR

$H_a\colon \mu_1 < \mu_2$

2-sample t-test:

$t=\frac{\left (\bar{x}_{1}-\bar{x}_{2} \right )-0}{\hat{s.e.}\left (\bar{x}_{1}-\bar{x}_{2} \right )}$

Minitab Command

Stat > Basic statistics > 2-sample t

Conditions

Independent samples from the two populations

Data in each sample are about normal or large samples

Examples

Do the mean pulse rates of exercisers and non-exercisers differ?
Is the mean EDS score for dropouts greater than the mean EDS score for graduates?

*(The estimated standard error, s.e., depends on whether the pooled or unpooled procedure is used.)
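To make the pooled vs. unpooled distinction concrete, here is a minimal Python sketch of both standard errors, plus the Welch-Satterthwaite approximate degrees of freedom commonly used with the unpooled version (the pooled version uses $n_1 + n_2 - 2$). The helper names are hypothetical. The numbers are the NFIB hiring scenario from 12.2 (group 1 = 1985, group 2 = 1986). When the sample sizes and standard deviations are equal, the two standard errors coincide.

```python
import math


# Hypothetical helpers for illustration; not Minitab or library functions.
def se_unpooled(s1, n1, s2, n2):
    """Unpooled (Welch) estimated s.e. of xbar1 - xbar2."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom for the unpooled test."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def se_pooled(s1, n1, s2, n2):
    """Pooled estimated s.e.; assumes equal population variances (df = n1 + n2 - 2)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


xbar1, s1, n1 = 3.2, 1.5, 20   # 1985 sample
xbar2, s2, n2 = 5.1, 2.3, 30   # 1986 sample
for label, se in [("unpooled", se_unpooled(s1, n1, s2, n2)),
                  ("pooled", se_pooled(s1, n1, s2, n2))]:
    t = ((xbar1 - xbar2) - 0) / se  # 2-sample t statistic
    print(f"{label}: s.e. = {se:.4f}, t = {t:.3f}")
print(f"Welch df = {welch_df(s1, n1, s2, n2):.1f}")
```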

Estimating a Mean with Paired Data

Parameter

Mean of paired difference,

$\mu_D$

Statistic

Sample mean of difference,

$\bar{d}$

Type of Data

Numerical

Analysis

paired t-interval:

$\bar{d}\pm t_{\alpha /2}\cdot \frac{s_{d}}{\sqrt{n}}$

Minitab Command

Stat > Basic statistics > Paired t

Conditions

Differences approximately normal OR

Have a large number of pairs (n ≥ 30)

Examples

What is the difference in pulse rates, on the average, before and after exercise?

Test about a Mean with Paired Data

Parameter

Mean of paired difference,

$\mu_D$

Statistic

Sample mean of difference,

$\bar{d}$

Type of Data

Numerical

Analysis

$H_0\colon \mu_D = 0$

$H_a\colon \mu_D \ne 0$ OR

$H_a\colon \mu_D > 0$ OR

$H_a\colon \mu_D < 0$

Paired t-test with n - 1 degrees of freedom (n = number of pairs):

$t=\frac{\bar{d}-0}{\frac{s_d}{\sqrt{n}}}$

Minitab Command

Stat > Basic statistics > Paired t
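The paired procedures are simply 1-sample t procedures applied to the differences $d$. A minimal Python sketch (standard library only) with the hypothetical helper `paired_t`, using the benefit-package ratings from 12.2, where each of 16 employees rated both programs:

```python
import math
import statistics


# Hypothetical helper for illustration; not a Minitab or library function.
def paired_t(x, y):
    """1-sample t on d = x - y for H0: mu_D = 0; returns d_bar, t, df = n - 1."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)  # number of pairs
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # standard deviation of the differences
    t = (d_bar - 0) / (s_d / math.sqrt(n))
    return d_bar, t, n - 1


program_a = [45, 67, 63, 59, 77, 69, 45, 39, 52, 58, 70, 46, 60, 65, 59, 80]
program_b = [56, 70, 60, 45, 85, 79, 50, 46, 50, 60, 82, 40, 65, 55, 81, 68]
d_bar, t, df = paired_t(program_a, program_b)
print(f"d_bar = {d_bar:.3f}, t = {t:.3f} on {df} df")
```

The paired t-interval uses the same pieces: $\bar{d} \pm t_{\alpha/2} \cdot s_d/\sqrt{n}$.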

Conditions

Differences approximately normal OR

Have a large number of pairs (n ≥ 30)

Examples

Is the difference in IQ of pairs of twins zero?
Are the pulse rates of people higher after exercise?

Estimating the Difference of Two Proportions

Parameter

Difference in two population proportions,

$p_1 - p_2$

Statistic

Difference in two sample proportions,

$\hat{p}_{1} - \hat{p}_{2}$

Type of Data

Categorical (Binary)

Analysis

2-proportion Z-interval:

$\hat{p} _{1}-\hat{p} _{2}\pm z_{\alpha /2}\cdot \hat{s.e.}\left ( \hat{p} _{1}-\hat{p} _{2} \right )$

Minitab Command

Stat > Basic statistics > 2 proportions

Conditions

Independent samples from the two populations

Have at least 5 in each category for both populations

Examples

How different are the percentages of male and female smokers?
How different are the percentages of upper- and lower-class binge drinkers?

Test to Compare Two Proportions

Parameter

Difference in two population proportions,

$p_1 - p_2$

Statistic

Difference in two sample proportions,

$\hat{p}_{1} - \hat{p}_{2}$

Type of Data

Categorical (Binary)

Analysis

$H_0\colon p_1 = p_2$

$H_a\colon p_1 \ne p_2 $ OR

$H_a\colon p_1 > p_2$ OR

$H_a\colon p_1 < p_2$

2-proportion Z-test:

$z^*=\frac{\hat{p}_{1}-\hat{p}_{2}}{\sqrt{\hat{p}^*\left ( 1-\hat{p}^* \right )\left ( \frac{1}{n_{1}}+ \frac{1}{n_{2}}\right )}}$

$\hat{p}^*=\dfrac{x_{1}+x_{2}}{n_{1}+n_{2}}$

Minitab Command

Stat > Basic statistics > 2 proportions
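A minimal Python sketch contrasting the two standard errors: the interval below uses the usual unpooled s.e., while the test uses the pooled $\hat{p}^*$ from the formula above, since $H_0$ assumes $p_1 = p_2$. Helper names are hypothetical. The data are the guidance-system scenario from 12.2 (Motorola 101 of 120, Blaupunkt 110 of 200), with $H_a\colon p_1 > p_2$.

```python
import math
from statistics import NormalDist


# Hypothetical helpers for illustration; not Minitab or library functions.
def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """(p1_hat - p2_hat) +/- z_{alpha/2} * unpooled s.e."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def two_prop_z_test(x1, n1, x2, n2):
    """z* for H0: p1 = p2, using pooled p_hat* = (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p_star = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(p_star * (1 - p_star) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se0


z = two_prop_z_test(101, 120, 110, 200)
p_value = 1 - NormalDist().cdf(z)  # upper tail for Ha: p1 > p2
low, high = two_prop_z_interval(101, 120, 110, 200)
print(f"z* = {z:.2f}, p-value = {p_value:.2g}")
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```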

Conditions

Independent samples from the two populations

Have at least 5 in each category for both populations

Examples

Is the percentage of males with lung cancer higher than the percentage of females with lung cancer?
Are the percentages of upper- and lower-class binge drinkers different?

Relationship in a 2-Way Table

Parameter

Relationship between two categorical variables, OR

difference in two or more population proportions

Statistic

The observed counts in a two-way table

Type of Data

Categorical

Analysis

$H_0\colon\text{The two variables are not related}$

$H_a\colon\text{The two variables are related}$

Chi-square test statistic:

$X^2=\sum_{\text{all cells}}\frac{(\text{Observed-Expected})^2}{\text{Expected}}$

Minitab Command

Stat > Tables > Chi-Square Test for Association
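A minimal Python sketch of the chi-square statistic: expected counts are (row total × column total) / grand total, the degrees of freedom are $(r-1)(c-1)$, and the two expected-count conditions listed below are checked. The helper names are hypothetical. The data are the book-cover sales table from 12.2.

```python
# Hypothetical helpers for illustration; not Minitab or library functions.
def chi_square_2way(table):
    """Return X^2, degrees of freedom (r - 1)(c - 1), and the expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, expected


def conditions_ok(expected):
    """All expected counts > 1 and at least 80% of cells with expected count > 5."""
    cells = [e for row in expected for e in row]
    return min(cells) > 1 and sum(e > 5 for e in cells) >= 0.8 * len(cells)


# Rows: New York, Washington, Boston, Chicago, Los Angeles.
# Columns: Red, Blue, Green, Yellow (thousands of copies).
sales = [
    [21, 27, 40, 15],
    [14, 18, 28, 8],
    [11, 13, 21, 7],
    [3, 33, 30, 9],
    [30, 11, 34, 10],
]
x2, df, expected = chi_square_2way(sales)
print(f"X^2 = {x2:.2f} on {df} df; conditions met: {conditions_ok(expected)}")
```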

Conditions

All expected counts should be greater than 1

At least 80% of the cells should have an expected count greater than 5

Examples

Is there a relationship between smoking and lung cancer?
Do the proportions of students in each class who smoke differ?

Test About a Slope

Parameter

Slope of the population regression line,

$\beta_1$

Statistic

Sample estimate of the slope,

$b_1$

Type of Data

Numerical

Analysis

$H_0\colon \beta_1 = 0$

$H_a\colon \beta_1 \ne 0$ OR

$H_a\colon \beta_1 > 0$ OR

$H_a\colon \beta_1 < 0$

t-test with n - 2 degrees of freedom:

$t=\dfrac{b_{1}-0}{\hat{s.e.}\left ( b_{1} \right )}$

Minitab Command

Stat > Regression > Regression
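A minimal Python sketch of the slope t-test: $b_1 = S_{xy}/S_{xx}$, $\hat{s.e.}(b_1) = s/\sqrt{S_{xx}}$ where $s^2 = SSE/(n-2)$, and $t = b_1/\hat{s.e.}(b_1)$ on n - 2 degrees of freedom. The helper name `slope_t_test` and the five (x, y) points are hypothetical toy values chosen only so the arithmetic is easy to follow.

```python
import math


# Hypothetical helper for illustration; not a Minitab or library function.
def slope_t_test(x, y):
    """Return b1, estimated s.e.(b1), t = (b1 - 0) / s.e.(b1), and df = n - 2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    se_b1 = s / math.sqrt(sxx)
    return b1, se_b1, (b1 - 0) / se_b1, n - 2


# Hypothetical toy data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t, df = slope_t_test(x, y)
print(f"b1 = {b1:.3f}, s.e. = {se_b1:.4f}, t = {t:.3f} on {df} df")
```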

Conditions

The form of the equation that links the two variables must be correct

The error terms are normally distributed

The error terms have equal variances

The error terms are independent of each other

Examples

Is there a linear relationship between height and weight of a person?

Test to Compare Several Means

Parameter

Population means of the t populations,

$\mu_1, \mu_2, \cdots , \mu_t$

Statistic

Sample means of the t samples,

$\bar{x}_1, \bar{x}_2, \cdots , \bar{x}_t$

Type of Data

Numerical

Analysis

$H_0\colon \mu_1 = \mu_2 = \cdots = \mu_t$

$H_a\colon \text{not all the means are equal}$

F-test for one-way ANOVA:

$F=\dfrac{MST}{MSE}$

Minitab Command

Stat > ANOVA > One-Way
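A minimal Python sketch of the one-way ANOVA F statistic, $F = MST/MSE$ with $(t-1, N-t)$ degrees of freedom, where MST comes from the between-group (treatment) sum of squares and MSE from the within-group (error) sum of squares. The helper name `one_way_anova` is hypothetical. The data are the sleeping-pill scenario from 12.2 (three groups of 25).

```python
# Hypothetical helper for illustration; not a Minitab or library function.
def one_way_anova(groups):
    """Return F = MST / MSE and its (t - 1, N - t) degrees of freedom."""
    k = len(groups)  # number of groups (t in the table above)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_t, df_e = k - 1, n_total - k
    return (ss_treat / df_t) / (ss_error / df_e), df_t, df_e


drug = [12, 17, 34, 11, 5, 42, 18, 27, 2, 37, 50, 32, 12, 27, 21, 10, 4,
        33, 63, 22, 41, 19, 28, 29, 8]
placebo = [44, 32, 28, 30, 22, 12, 3, 12, 42, 13, 27, 54, 56, 32, 37, 28,
           22, 22, 24, 9, 20, 4, 13, 42, 67]
none = [32, 33, 21, 12, 15, 14, 55, 67, 72, 1, 44, 60, 36, 38, 49, 66, 89,
        63, 23, 6, 9, 56, 58, 39, 59]
F, df_t, df_e = one_way_anova([drug, placebo, none])
print(f"F = {F:.3f} with ({df_t}, {df_e}) df")
```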

Conditions

Each population is normally distributed

Independent samples from the t populations

Equal population standard deviations

Examples

Is there a difference between the mean GPA of freshman, sophomore, junior, and senior classes?

Test of Strength & Direction of Linear Relationship of 2 Quantitative Variables

Parameter

Population correlation,

$\rho$ ("rho")

Statistic

Sample correlation,

$r$

Type of Data

Numerical

Analysis

$H_0\colon \rho = 0$

$H_a\colon \rho \ne 0$

t-test statistic:

$t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$

Minitab Command

Stat > Basic Statistics > Correlation
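A minimal Python sketch of the correlation t-test, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on n - 2 degrees of freedom. The helper name `correlation_t` and the (x, y) points are hypothetical toy values. In simple linear regression this t equals the slope t-statistic from the "Test About a Slope" entry for the same data, which the toy values make easy to check.

```python
import math


# Hypothetical helper for illustration; not a Minitab or library function.
def correlation_t(x, y):
    """Return r, t = r * sqrt(n - 2) / sqrt(1 - r^2), and df = n - 2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    return r, t, n - 2


# Hypothetical toy data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, t, df = correlation_t(x, y)
print(f"r = {r:.4f}, t = {t:.3f} on {df} df")
```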

Conditions

2 variables are continuous

Related pairs

No significant outliers

Normality of both variables

Linear relationship between the variables

Examples

Is there a linear relationship between height and weight?

Test to Compare Two Population Variances

Parameter

Population variances of two populations,

$\sigma_{1}^{2}, \sigma_{2}^{2}$

Statistic

Sample variances of the two samples,

$s_{1}^{2}, s_{2}^{2}$

Type of Data

Numerical

Analysis

$H_0\colon \sigma_{1}^{2} = \sigma_{2}^{2}$

$H_a\colon \sigma_{1}^{2} \ne \sigma_{2}^{2}$

F-test statistic:

$F=\frac{s_{1}^{2}}{s_{2}^{2}}$

Minitab Command

Stat > Basic statistics > 2 variances
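A minimal Python sketch of the variance-ratio F statistic with $(n_1 - 1, n_2 - 1)$ degrees of freedom. The helper name `variance_ratio_f` is hypothetical. The data are the stock-risk scenario from 12.2; because $H_a$ there is $\sigma_A^2 > \sigma_B^2$, stock A goes in the numerator and F is compared with the upper-tail critical value from an F-table (not computed here, since the standard library has no F distribution).

```python
# Hypothetical helper for illustration; not a Minitab or library function.
def variance_ratio_f(s1_sq, n1, s2_sq, n2):
    """Return F = s1^2 / s2^2 and its (n1 - 1, n2 - 1) degrees of freedom."""
    if s1_sq <= 0 or s2_sq <= 0:
        raise ValueError("sample variances must be positive")
    return s1_sq / s2_sq, n1 - 1, n2 - 1


# Stock A: n = 25, s^2 = 6.52; stock B: n = 22, s^2 = 3.47.
F, df1, df2 = variance_ratio_f(6.52, 25, 3.47, 22)
print(f"F = {F:.3f} with ({df1}, {df2}) df")
```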

Conditions

Each population is normally distributed

Independent samples from the 2 populations

Examples

Are the variances of length of lumber produced by Company A different from those produced by Company B?

12.2 - Choose the Correct Statistical Technique

List of Statistical Techniques

Estimate a Value

Estimating a Mean
Estimating a Proportion
Estimating the difference of two means
Estimating a mean with paired data
Estimating the difference of two proportions

Test a hypothesis

Test about a mean
Test about a proportion
Test to compare two means (independent)
Test to compare two means (paired)
Test to compare two proportions
Test about a slope
Test to compare several means
Test of Strength & Direction of Linear Relationship of 2 Quantitative Variables
Test to Compare Two Population Variances

Examine a Relationship

Relationship in a 2-Way Table

Choose the Correct Statistical Technique

Directions

For the scenarios below, choose a method that is suitable for the given situation.

Note: There is no need to work out the following problems. This is simply an exercise to help you select the appropriate statistical method given the description of a research context. Determine the statistical technique(s) that you think are most appropriate, and then compare your choice with the answer given after each scenario.

A survey by the National Federation of Independent Business (NFIB) indicates that small businesses intended to increase their hiring as well as their capital expenditures during 1986 as compared with 1985. Suppose that, as part of a follow-up survey by NFIB, 20 small businesses, randomly chosen from the NFIB's list of 2,100 companies, show an average of 3.2 new employees hired per firm in 1985, with a standard deviation of 1.5 hires. A random sample of 30 small businesses taken at the end of 1986 shows an average of 5.1 new hires and a standard deviation of 2.3 hires. At the $\alpha = 0.01$ level of significance, can you conclude that average hiring by all small businesses in 1986 increased as compared with 1985?

Answer

Test to compare two independent population means

It is known that the average stay of tourists in Hong Kong hotels has been 3.4 nights. A tourism industry analyst wanted to test whether recent changes in the nature of tourism to Hong Kong have changed this average. The analyst obtained the following random sample of the number of nights spent by tourists in Hong Kong hotels: 5, 4, 3, 2, 1, 1, 5, 7, 8, 4, 3, 3, 2, 5, 7, 1, 3, 1, 1, 5, 3, 4, 2, 2, 2, 6, 1, 7. Conduct the test using the 0.05 level of significance.

Answer

Test about a mean

There are 155 banks involved in certain international transactions. A federal agency claims that at least 35% of these banks have total assets of over \$10 billion (in U.S. dollars). An independent agency wants to test this claim. It takes a random sample of 50 of the 155 banks and finds that 15 of them have total assets of over \$10 billion. Can the claim be rejected?

Answer

Test about a proportion

General Motors Corporation hopes to reduce anticipated production costs of its Saturn model by instituting an assembly schedule that will reduce average production time to about 40 hours per car. In a test run of the new assembly line, 40 cars are built with a sample average time per car of 46.5 hours and a sample standard deviation of 8.0 hours. A test run of 38 cars using the old assembly schedule results in a sample mean of 51.2 hours and a sample standard deviation of 9.5 hours. Is there evidence that the new assembly schedule reduces the average production time per car?

Answer

Test to compare two independent means

A telephone company wants to estimate the average length of long-distance calls during weekends. A random sample of 50 calls gives a mean $\bar{x} = 14.5$ min and standard deviation $s = 5.6$ min. Provide an interval estimate for the average length of a long-distance phone call during weekends.

Answer

Estimate a mean

Several companies have been developing electronic guidance systems for cars. Motorola and Germany's Blaupunkt are two firms at the forefront of such research. Out of 120 trials of the Motorola model, 101 were successful; out of 200 tests of the Blaupunkt model, 110 were successful. Is there evidence to conclude that the Motorola electronic guidance system is superior to its German competitor?

Answer

Test to compare two proportions

An important measure of the risk associated with a stock is the standard deviation, or variance, of the stock's price movements. A financial analyst wants to test the one-tailed hypothesis that stock A has a greater risk (larger variance of price) than stock B. A random sample of 25 daily prices of stock A gives $s_{A}^2 = 6.52$, and a random sample of 22 daily prices of stock B gives a sample variance of $s_{B}^2 = 3.47$. Carry out the test at $\alpha = 0.01$.

Answer

Test to compare two population variances

A company is interested in offering its employees one of two employee benefit packages. A random sample of the company's employees is collected, and each person in the sample is asked to rate each of the two packages on an overall preference scale of 0 to 100. The order of presentation of each of the two plans is randomly selected for each person in the sample. The paired data are:
- Program A: 45 67 63 59 77 69 45 39 52 58 70 46 60 65 59 80
- Program B: 56 70 60 45 85 79 50 46 50 60 82 40 65 55 81 68
Determine whether the employees rate one package higher.

Answer

Test to compare two paired means

Analysis of variance has long been used to provide evidence of the effectiveness of pharmaceutical drugs. Such evidence is required before the FDA will allow a drug to be marketed. In a recent test of the effectiveness of a new sleeping pill, three groups of 25 patients each were given different treatments: the first group was given the drug, the second group was given a placebo, and the third group was given no treatment at all. The results are as follows.

- Drug group: 12, 17, 34, 11, 5, 42, 18, 27, 2, 37, 50, 32, 12, 27, 21, 10, 4, 33, 63, 22, 41, 19, 28, 29, 8
- Placebo group: 44, 32, 28, 30, 22, 12, 3, 12, 42, 13, 27, 54, 56, 32, 37, 28, 22, 22, 24, 9, 20, 4, 13, 42, 67
- No-treatment group: 32, 33, 21, 12, 15, 14, 55, 67, 72, 1, 44, 60, 36, 38, 49, 66, 89, 63, 23, 6, 9, 56, 58, 39, 59

Determine whether or not the drug is effective.

Answer

Test to compare several means

The maker of portable exercise equipment, designed for health-conscious people who travel too frequently to use a regular athletic club, wants to estimate the proportion of traveling business people who may be interested in the product. A random sample of 120 traveling business people indicates that 28 of them may be interested in purchasing the portable fitness equipment. Provide an interval estimate for the proportion of all traveling business people who may be interested in the product.

Answer

Estimate a proportion

A study was undertaken by Montgomery Securities to assess the average labor and materials costs incurred by Chrysler and General Motors in building a typical four-door, intermediate-sized car. The reported average cost was \$9500 for Chrysler and \$9780 for GM. Suppose that these data are based on random samples of 25 cars for each company, and suppose that both standard deviations are equal to \$1500. Test the hypothesis that the average GM car of this type is more expensive to build than the average Chrysler car of the same type.

Answer

Test to compare two independent means

Recent studies indicate that, in order to be globally competitive, firms must form global strategic partnerships. An investment banker wants to test whether the return on investment for international ventures is different from the return on investment for similar domestic ventures. A sample of 12 firms that recently entered into ventures with foreign companies is available. For each firm, the return on investment for both the international venture (I) and a similar domestic venture (D) is given:
- D(%): 10 12 14 12 12 17 9 15 8.5 11 7 15
- I(%) : 11 14 15 11 12.5 16 10 13 10.5 17 9 19
Assuming that these firms represent a random sample from the population of all firms involved in global strategic partnerships, can the investment banker conclude that there are differences between average returns on domestic ventures and average returns on international ventures? Explain.

Answer

Test to compare two paired means

When new paperback novels are promoted at bookstores, a display is often arranged with copies of the same book with differently colored covers. A publishing house wanted to find out whether the place where the book is sold and the color of its cover are related. For one of its latest novels, the publisher sent displays and a supply of copies of the novel to large bookstores in five major cities. The resulting sales of the novel for each city-color combination are as follows. Numbers are in thousands of copies sold over a three-month period.

City	Red	Blue	Green	Yellow	Total
New York	21	27	40	15	103
Washington	14	18	28	8	68
Boston	11	13	21	7	52
Chicago	3	33	30	9	75
Los Angeles	30	11	34	10	85
Total	79	102	153	49	383

Assume that the data are random samples for each particular color-city combination and that the inference may apply to all novels. Are color and location related?

Answer

Relationship in a 2-way table

Certain eggs are stated to have reduced cholesterol content, with an average of only 2.5% cholesterol. A concerned health group wants to test whether the claim is true. The group believes that more cholesterol may be found, on the average, in the eggs. A random sample of 100 eggs reveals a sample average content of 5.2% cholesterol, and a sample standard deviation of 2.8%. Does the health group have cause for action?

Answer

Test about a mean

Two 12-meter boats, the K boat and the L boat, are tested as possible contenders in the America's Cup races. The following data represent the time, in minutes, to complete a particular tack in independent random trials of the two boats.
- K boat: 12.0, 13.1, 11.8, 12.6, 14.0, 11.8, 12.7, 13.5, 12.4, 12.2, 11.6, 12.9
- L boat: 11.8, 12.1, 12.0, 11.6, 11.8, 12.0, 11.9, 12.6, 11.4, 12.0, 12.2, 11.7
Test the null hypothesis that the two boats perform equally well. Is one boat faster, on the average, than the other?

Answer

Test to compare two independent means
