Lesson 12: Summary and Review

Overview

This lesson is the culmination of STAT 500. It reviews all of the statistical techniques covered in the course and provides a summary table listing, for each inference, the parameter, statistic, type of data, analysis, conditions, and examples.

Objectives

Upon successful completion of this lesson, you should be able to:

  • Review the statistical techniques covered in STAT 500.
  • Given a real-world application, choose the correct statistical technique.

12.1 - Summary of Statistical Techniques

Tabbed Flow Charts

[Flow charts: the interactive diagrams are not reproduced here. The decision paths they show are summarized below.]

Confidence intervals: one value or the difference between two values?
  • One value, quantitative variable: One Mean Confidence Interval
  • One value, categorical variable: One Proportion Confidence Interval
  • Difference between two values, quantitative variable, paired samples: Two Paired Means Confidence Interval
  • Difference between two values, quantitative variable, independent samples: Two Independent Means Confidence Interval
  • Difference between two values, categorical variable: Two Proportions Confidence Interval

Hypothesis tests: quantitative or categorical response variable?
  • Categorical, one sample: One Sample Proportion Test
  • Categorical, two samples: Two Sample Proportion Test
  • Quantitative, one sample: One Sample Mean Test
  • Quantitative, two independent samples: Two Independent Means Test
  • Quantitative, two paired samples: Paired Sample Mean Test
  • Quantitative, more than two samples: One-Way Analysis of Variance

Relationships: quantitative or categorical variables?
  • Quantitative, make a prediction: Simple Linear Regression
  • Quantitative, examine the strength and direction of the relationship: Correlation
  • Categorical: Chi-Square Test of Independence
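
The hypothesis-test flow chart can also be read as a small decision function. The sketch below is one possible encoding; the function name and argument names are hypothetical, and the branches follow the chart as read from this page rather than any official tool.

```python
def choose_hypothesis_test(response, samples, paired=False):
    """Pick a test name by walking the hypothesis-test flow chart.

    response: "categorical" or "quantitative" (type of response variable)
    samples:  number of samples being compared (1, 2, 3, ...)
    paired:   only consulted for two quantitative samples
    """
    if response == "categorical":
        if samples == 1:
            return "One Sample Proportion Test"
        if samples == 2:
            return "Two Sample Proportion Test"
        raise ValueError("the chart covers only one or two categorical samples")
    if response == "quantitative":
        if samples == 1:
            return "One Sample Mean Test"
        if samples == 2:
            return "Paired Sample Mean Test" if paired else "Two Independent Means Test"
        if samples > 2:
            return "One-Way Analysis of Variance"
    raise ValueError("unrecognized combination of inputs")


print(choose_hypothesis_test("quantitative", 2, paired=True))
print(choose_hypothesis_test("quantitative", 3))
```

A function like this only selects a technique; the conditions listed in the summary table still have to be checked before the analysis is run.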

Summary Table for Statistical Techniques

 

Estimating a Mean

Parameter

One population mean, \(\mu\)

Statistic

Sample mean, \(\bar{x}\)

Type of Data

Numerical

Analysis

1-sample t-interval

\(\bar{x}\pm t_{\alpha /2}\cdot \frac{s}{\sqrt{n}}\)

Minitab Command

Stat > Basic statistics > 1-sample t

Conditions

data approximately normal OR

have a large sample size (n ≥ 30)

Examples
  • What is the average weight of adults?
  • What is the average cholesterol level of adult females?
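
As a rough illustration of the 1-sample t-interval formula, the sketch below uses the summary statistics from the long-distance call scenario in Section 12.2 (mean 14.5 min, s = 5.6 min, n = 50). The 95% confidence level and the critical value of about 2.01 for 49 degrees of freedom are assumptions taken from a t-table, not values given in the lesson.

```python
import math


def one_sample_t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# t_crit is supplied by the user (assumed here: about 2.01 for 95% confidence, df = 49)
low, high = one_sample_t_interval(xbar=14.5, s=5.6, n=50, t_crit=2.01)
print(round(low, 2), round(high, 2))
```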

 

Test About a Mean

Parameter

One population mean, \(\mu\)

Statistic

Sample mean, \(\bar{x}\)

Type of Data

Numerical

Analysis

\(H_0\colon \mu = \mu_0\)

\(H_a\colon \mu \ne \mu_0\) OR

\(H_a\colon \mu > \mu_0\) OR

\(H_a\colon \mu < \mu_0\)

1-sample t-test:

\(t=\frac{\bar{x}-\mu_{0}}{\frac{s}{\sqrt{n}}}\)

Minitab Command

Stat > Basic statistics > 1-sample t

Conditions

data approximately normal

OR

have a large sample size (n ≥ 30)

Examples
  • Is the average GPA of juniors at Penn State higher than 3.0?
  • Is the average winter temperature in State College less than 42°F?
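
The 1-sample t statistic can be computed directly from summary statistics. This minimal sketch uses the egg cholesterol scenario from Section 12.2 (sample mean 5.2, claimed mean 2.5, s = 2.8, n = 100); the function name is hypothetical, and the p-value would still come from a t-table or software.

```python
import math


def one_sample_t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))


t_stat = one_sample_t_statistic(xbar=5.2, mu0=2.5, s=2.8, n=100)
print(round(t_stat, 2))
```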

 

Estimating a Proportion

Parameter

One population proportion, \(p\)

Statistic

Sample proportion, \(\hat{p}\)

Type of Data

Categorical (Binary)

Analysis

1-proportion Z-interval:

\( \hat{p}\pm z_{\alpha /2}\sqrt{\frac{\hat{p}\cdot \left ( 1-\hat{p} \right )}{n}}\)

Minitab Command

Stat > Basic statistics > 1-sample proportion

Conditions

have at least 5 in each category, that is, \(n\hat{p} \geq 5\) and \(n(1 - \hat{p}) \geq 5\)

Examples
  • What is the proportion of males in the world?
  • What is the proportion of students that smoke?
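
The 1-proportion Z-interval can be sketched with the standard library alone. The example uses the exercise equipment scenario from Section 12.2 (28 of 120 interested); the 95% confidence level and the function name are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # z_{alpha/2}: upper alpha/2 quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


p_low, p_high = one_proportion_z_interval(successes=28, n=120)
print(round(p_low, 3), round(p_high, 3))
```

Here both categories have at least 5 observations (28 interested, 92 not), so the condition for the interval is met.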

 

Test about a Proportion

Parameter

One population proportion, \(p\)

Statistic

Sample proportion, \(\hat{p}\)

Type of Data

Categorical (Binary)

Analysis

\(H_0\colon p = p_0\)

\(H_a\colon p \ne p_0\) OR

\(H_a\colon p > p_0\) OR

\(H_a\colon p < p_0\)

1-proportion Z-test:

\(z=\frac{\hat{p}-p _{0}}{\sqrt{\frac{p _{0}\left ( 1- p _{0}\right )}{n}}}\)

Minitab Command

Stat > Basic statistics > 1-sample proportion

Conditions

\(np_0 \geq 5\) and

\(n (1 - p_0) \geq 5\)

Examples
  • Is the proportion of females different from 0.5?
  • Is the proportion of students who fail STAT 500 less than 0.1?
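
A minimal sketch of the 1-proportion Z-test follows, using the bank scenario from Section 12.2 (15 of 50 banks, claimed proportion 0.35). Treating the test as lower-tailed is an assumption based on the "at least 35%" claim, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, lower-tail p-value) for H0: p = p0 against Ha: p < p0."""
    p_hat = successes / n
    # The standard error uses p0, not p_hat, because it is computed under H0
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = NormalDist().cdf(z)
    return z, p_value


z_stat, p_value = one_proportion_z_test(successes=15, n=50, p0=0.35)
print(round(z_stat, 3), round(p_value, 3))
```

The conditions hold here because \(np_0 = 17.5\) and \(n(1 - p_0) = 32.5\) are both at least 5.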

 

 

Estimating the Difference of Two Means*

Parameter

Difference in two population means,

\(\mu_1 - \mu_2\)

Statistic

Difference in two sample means,

\(\bar{x}_{1} - \bar{x}_{2}\)

Type of Data

Numerical

Analysis

2-sample t-interval:

\(\bar{x}_{1}-\bar{x}_{2}\pm t_{\alpha /2}\cdot \hat{s.e.}\left (\bar{x}_{1}-\bar{x}_{2} \right )\)

Minitab Command

Stat > Basic statistics > 2-sample t

Conditions

Independent samples from the two populations

Data in each sample are about normal or large samples

Examples
  • How different are the mean GPAs of males and females?
  • How many fewer colds do vitamin C takers get, on average, than non-vitamin takers?

 

Test to Compare Two Means*

Parameter

Difference in two population means,

\(\mu_1 - \mu_2\)

Statistic

Difference in two sample means,

\(\bar{x}_{1} - \bar{x}_{2}\)

Type of Data

Numerical

Analysis

\(H_0\colon \mu_1 = \mu_2\)

\(H_a\colon \mu_1 \ne \mu_2\) OR

\(H_a\colon \mu_1 > \mu_2\) OR

\(H_a\colon \mu_1 < \mu_2\)

2-sample t-test:

\(t=\frac{\left (\bar{x}_{1}-\bar{x}_{2} \right )-0}{\hat{s.e.}\left (\bar{x}_{1}-\bar{x}_{2} \right )} \)

Minitab Command

Stat > Basic statistics > 2-sample t

Conditions

Independent samples from the two populations

Data in each sample are about normal or large samples

Examples
  • Do the mean pulse rates of exercisers and non-exercisers differ?
  • Is the mean EDS score for dropouts greater than the mean EDS score for graduates?

 

*The standard error (s.e.) depends on whether the pooled or the unpooled variance estimate is used.
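
To make the pooled versus unpooled distinction concrete, the sketch below computes both standard errors and the resulting t statistics from the small-business hiring scenario in Section 12.2 (1985: n = 20, mean 3.2, s = 1.5; 1986: n = 30, mean 5.1, s = 2.3). The function names are hypothetical, and which version is appropriate depends on whether equal population variances are a reasonable assumption.

```python
import math


def unpooled_se(s1, n1, s2, n2):
    """Standard error without assuming equal population variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def pooled_se(s1, n1, s2, n2):
    """Standard error assuming equal population variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


diff = 3.2 - 5.1  # sample mean for 1985 minus sample mean for 1986
t_unpooled = diff / unpooled_se(1.5, 20, 2.3, 30)
t_pooled = diff / pooled_se(1.5, 20, 2.3, 30)
print(round(t_unpooled, 2), round(t_pooled, 2))
```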

Estimating a Mean with Paired Data

Parameter

Mean of the paired differences,

\(\mu_D\)

Statistic

Sample mean of the differences,

\(\bar{d}\)

Type of Data

Numerical

Analysis

paired t-interval:

\(\bar{d}\pm t_{\alpha /2}\cdot \frac{s_{d}}{\sqrt{n}}\)

Minitab Command

Stat > Basic statistics > Paired t

Conditions

Differences approximately normal OR

Have a large number of pairs (n ≥ 30)

Examples
  • What is the difference in pulse rates, on the average, before and after exercise?

 

Test about a Mean with Paired Data

Parameter

Mean of the paired differences,

\(\mu_D\)

Statistic

Sample mean of the differences,

\(\bar{d}\)

Type of Data

Numerical

Analysis

\(H_0\colon \mu_D = 0\)

\(H_a\colon \mu_D \ne 0\) OR

\(H_a\colon \mu_D > 0\) OR

\(H_a\colon \mu_D < 0\)

t-test statistic:

\(t=\frac{\bar{d}-0}{\frac{s_d}{\sqrt{n}}}\)

Minitab Command

Stat > Basic statistics > Paired t

Conditions

Differences approximately normal OR

Have a large number of pairs (n ≥ 30)

Examples
  • Is the difference in IQ of pairs of twins zero?
  • Are the pulse rates of people higher after exercise?
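
The paired t statistic works on the within-pair differences only. This sketch uses the domestic (D) and international (I) return-on-investment data from Section 12.2; taking the difference as I minus D is an arbitrary choice made for illustration.

```python
import math
import statistics

domestic = [10, 12, 14, 12, 12, 17, 9, 15, 8.5, 11, 7, 15]
international = [11, 14, 15, 11, 12.5, 16, 10, 13, 10.5, 17, 9, 19]

# One difference per pair; the order of subtraction sets the sign of t
differences = [i - d for i, d in zip(international, domestic)]

d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)  # sample standard deviation (n - 1 divisor)
n_pairs = len(differences)

t_paired = (d_bar - 0) / (s_d / math.sqrt(n_pairs))
print(round(d_bar, 3), round(t_paired, 3))
```

With only 12 pairs, the differences should be checked for approximate normality before the statistic is compared with a t distribution on 11 degrees of freedom.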

 

Estimating the Difference of Two Proportions

Parameter

Difference in two population proportions,

\(p_1 - p_2\)

Statistic

Difference in two sample proportions,

\(\hat{p}_{1} - \hat{p}_{2}\)

Type of Data

Categorical (Binary)

Analysis

2-proportions Z-interval:

\(\hat{p} _{1}-\hat{p} _{2}\pm z_{\alpha /2}\cdot \hat{s.e.}\left ( \hat{p} _{1}-\hat{p} _{2} \right )\)

Minitab Command

Stat > Basic statistics > 2 proportions

Conditions

Independent samples from the two populations

Have at least 5 in each category for both populations

Examples
  • How different are the percentages of male and female smokers?
  • How different are the percentages of upper- and lower-class binge drinkers?
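
For the 2-proportions Z-interval, the standard error is usually computed from each sample proportion separately (unpooled). The sketch below assumes that form and a 95% confidence level, and borrows the counts from the guidance system scenario in Section 12.2 (101 of 120 and 110 of 200).

```python
import math
from statistics import NormalDist


def two_proportion_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for p1_hat - p2_hat using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


d_low, d_high = two_proportion_z_interval(101, 120, 110, 200)
print(round(d_low, 3), round(d_high, 3))
```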

 

Test to Compare Two Proportions

Parameter

Difference in two population proportions,

\(p_1 - p_2\)

Statistic

Difference in two sample proportions,

\(\hat{p}_{1} - \hat{p}_{2}\)

Type of Data

Categorical (Binary)

Analysis

\(H_0\colon p_1 = p_2\)

\(H_a\colon p_1 \ne p_2 \) OR

\(H_a\colon p_1 > p_2\) OR

\(H_a\colon p_1 < p_2\)

2-proportion Z-test:

\(z^*=\frac{\hat{p}_{1}-\hat{p}_{2}}{\sqrt{\hat{p}^*\left ( 1-\hat{p}^* \right )\left ( \frac{1}{n_{1}}+ \frac{1}{n_{2}}\right )}}\)

\(\hat{p}^*=\dfrac{x_{1}+x_{2}}{n_{1}+n_{2}}\)

Minitab Command

Stat > Basic statistics > 2 proportions

Conditions

Independent samples from the two populations

Have at least 5 in each category for both populations

Examples
  • Is the percentage of males with lung cancer higher than the percentage of females with lung cancer?

  • Are the percentages of upper- and lower-class binge drinkers different?
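
The 2-proportion Z-test pools the two samples to estimate the common proportion under \(H_0\). A minimal sketch, again using the guidance system counts from Section 12.2 (101 of 120 versus 110 of 200), with a hypothetical function name:

```python
import math


def two_proportion_z_statistic(x1, n1, x2, n2):
    """z* with the pooled proportion p* = (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z_two = two_proportion_z_statistic(101, 120, 110, 200)
print(round(z_two, 2))
```

The pooled estimate is used only for the test; the corresponding confidence interval keeps the two sample proportions separate.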

 

Relationship in a 2-Way Table

Parameter

Relationship between two categorical variables, OR

difference in two or more population proportions

Statistic

The observed counts in a two-way table

Type of Data

Categorical  

Analysis

\(H_0\colon\text{The two variables are not related}\)

\(H_a\colon\text{The two variables are related}\)

Chi-square test statistic:

\(X^2=\sum_{\text{all cells}}\frac{(\text{Observed-Expected})^2}{\text{Expected}}\)

Minitab Command

Stat > Tables > Chi-Square Test for Association

Conditions

All expected counts should be greater than 1

At least 80% of the cells should have an expected count greater than 5

Examples
  • Is there a relationship between smoking and lung cancer?
  • Do the proportions of students in each class who smoke differ?
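
The chi-square statistic compares each observed count with the count expected under independence, (row total × column total) / grand total. The sketch below uses a made-up 2×2 table purely for illustration; the counts are not from the lesson.

```python
def chi_square_statistic(table):
    """Return (X^2, expected counts) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence for each cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    x2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return x2, expected


# Hypothetical counts, chosen only to keep the arithmetic easy to follow
observed = [[10, 20], [30, 40]]
x2, expected = chi_square_statistic(observed)
print(round(x2, 4), expected)
```

Returning the expected counts alongside the statistic makes it easy to check the conditions on expected cell counts before interpreting the result.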

 

Test About a Slope

Parameter

Slope of the population regression line,

\(\beta_1\)

Statistic

Sample estimate of the slope,

\(b_1\)

Type of Data

Numerical

Analysis

\(H_0\colon \beta_1 = 0\)

\(H_a\colon \beta_1 \ne 0\) OR

\(H_a\colon \beta_1 > 0\) OR

\(H_a\colon \beta_1 < 0\)

t-test with n - 2 degrees of freedom:

\(t=\dfrac{b_{1}-0}{\hat{s.e.}\left ( b_{1} \right )}\)

Minitab Command

Stat > Regression > Regression

Conditions

The form of the equation that links the two variables must be correct

The error terms are normally distributed

The error terms have equal variances

The error terms are independent of each other

Examples
  • Is there a linear relationship between height and weight of a person?
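
The slope test can be traced by hand on a tiny data set. The five (x, y) pairs below are made up for illustration; the standard error uses the usual simple linear regression form \(\sqrt{MSE / S_{xx}}\), where \(S_{xx}=\sum (x_i-\bar{x})^2\), which this lesson does not spell out.

```python
import math

# Hypothetical data, not from the lesson
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx            # sample slope
b0 = y_bar - b1 * x_bar     # sample intercept

# Mean squared error with n - 2 degrees of freedom
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

se_b1 = math.sqrt(mse / s_xx)
t_slope = (b1 - 0) / se_b1
print(round(b1, 3), round(t_slope, 3))
```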

 

Test to Compare Several Means

Parameter

Population means of the t populations,

\(\mu_1, \mu_2, \cdots , \mu_t\)

Statistic

Sample means from the t populations,

\(\bar{x}_1, \bar{x}_2, \cdots , \bar{x}_t\)

Type of Data

Numerical

Analysis

\(H_0\colon \mu_1 = \mu_2 = \cdots = \mu_t\)

\(H_a\colon \text{not all the means are equal}\)

F-test for one-way ANOVA:

\(F=\dfrac{MST}{MSE}\)

Minitab Command

Stat > ANOVA > One-Way

Conditions

Each population is normally distributed

Independent samples from the t populations

Equal population standard deviations

Examples
  • Is there a difference between the mean GPA of freshman, sophomore, junior, and senior classes?
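
The F statistic is the ratio of the between-group mean square (MST) to the within-group mean square (MSE). The sketch below computes both from three small made-up groups; the data and the function name are hypothetical.

```python
def one_way_anova_f(groups):
    """Return F = MST / MSE for a list of samples, one list per group."""
    t = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group (treatment) sum of squares, t - 1 degrees of freedom
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (error) sum of squares, n_total - t degrees of freedom
    sse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    mst = sst / (t - 1)
    mse = sse / (n_total - t)
    return mst / mse


# Hypothetical data, chosen so the arithmetic is easy to verify by hand
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2))
```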

 

Test of Strength & Direction of Linear Relationship of 2 Quantitative Variables

Parameter

Population correlation,

\(\rho\)

"rho"

Statistic

Sample correlation,

\(r\)

Type of Data

Numerical

Analysis

\(H_0\colon \rho = 0\)

\(H_a\colon \rho \ne 0\)

t-test statistic:

\(t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\)

Minitab Command

Stat > Basic Statistics > Correlation

Conditions

2 variables are continuous

Related pairs

No significant outliers

Normality of both variables

Linear relationship between the variables

Examples
  • Is there a linear relationship between height and weight?
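
The correlation test statistic depends only on \(r\) and \(n\). The sketch below computes \(r\) from five made-up pairs and then applies the formula above; the data are hypothetical.

```python
import math

# Hypothetical data, not from the lesson
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)                     # sample correlation
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)   # n - 2 degrees of freedom
print(round(r, 4), round(t_corr, 4))
```

For two quantitative variables, this t statistic equals the t statistic for the slope in simple linear regression on the same data, so the two tests lead to the same conclusion.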

 

Test to Compare Two Population Variances

Parameter

Population variances of two populations,

\(\sigma_{1}^{2}, \sigma_{2}^{2}\)

Statistic

Sample variances of two populations,

\(s_{1}^{2}, s_{2}^{2}\)

Type of Data

Numerical

Analysis

\(H_0\colon \sigma_{1}^{2} = \sigma_{2}^{2}\)

\(H_a\colon \sigma_{1}^{2} \ne \sigma_{2}^{2}\)

F-test statistic:

\(F=\frac{s_{1}^{2}}{s_{2}^{2}}\)

Minitab Command

Stat > Basic statistics > 2 variances

Conditions

Each population is normally distributed

Independent samples from the 2 populations

Examples
  • Are the variances of length of lumber produced by Company A different from those produced by Company B?
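
The F statistic for two variances is a simple ratio. This sketch uses the stock price scenario from Section 12.2 (\(s_A^2 = 6.52\) with 25 prices, \(s_B^2 = 3.47\) with 22 prices); the variable names are hypothetical, and the critical value would still come from an F table or software.

```python
s2_a, n_a = 6.52, 25   # sample variance and sample size for stock A
s2_b, n_b = 3.47, 22   # sample variance and sample size for stock B

f_ratio = s2_a / s2_b
df_numerator = n_a - 1
df_denominator = n_b - 1
print(round(f_ratio, 3), df_numerator, df_denominator)
```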


12.2 - Choose the Correct Statistical Technique

List of Statistical Techniques

Estimate a Value

  • Estimating a Mean
  • Estimating a Proportion
  • Estimating the difference of two means
  • Estimating a mean with paired data
  • Estimating the difference of two proportions

Test a hypothesis

  • Test about a mean
  • Test about a proportion
  • Test to compare two means (independent)
  • Test to compare two means (paired)
  • Test to compare two proportions
  • Test about a slope
  • Test to compare several means
  • Test of Strength & Direction of Linear Relationship of 2 Quantitative Variables
  • Test to Compare Two Population Variances

Examine a Relationship

  • Relationship in a 2-Way Table
 

Choose the Correct Statistical Technique

Directions

For the scenarios below, choose a method that is suitable for the given situation.

Note: There is no need to work out the following problems. This is simply an exercise to help you select the appropriate statistical method given the description of a research context. For each scenario, determine the statistical technique(s) that you think are most appropriate.
  1. A survey by the National Federation of Independent Business (NFIB) indicates that small businesses intended to increase their hiring as well as their capital expenditures during 1986 as compared with 1985. Suppose that, as part of a follow-up survey by NFIB, 20 small businesses, randomly chosen from the NFIB's list of 2,100 companies, show an average hiring for 1985 equal to 3.2 new employees per firm and a standard deviation of 1.5 hires. A random sample of 30 small businesses taken at the end of 1986 shows an average of 5.1 new hires and a standard deviation of 2.3 hires. At the \(\alpha = 0.01\) level of significance, can you conclude that average hiring by all small businesses in 1986 increased as compared with 1985?
  2. It is known that the average stay of tourists in Hong Kong hotels has been 3.4 nights. A tourism industry analyst wanted to test whether recent changes in the nature of tourism to Hong Kong have changed this average. The analyst obtained the following random sample of the number of nights spent by tourists in Hong Kong hotels: 5, 4, 3, 2, 1, 1, 5, 7, 8, 4, 3, 3, 2, 5, 7, 1, 3, 1, 1, 5, 3, 4, 2, 2, 2, 6, 1, 7. Conduct the test using the 0.05 level of significance.
  3. There are 155 banks involved in certain international transactions. A federal agency claims that at least 35% of these banks have total assets of over \$10 billion (in U.S. dollars). An independent agency wants to test this claim. It gets a random sample of 50 out of the 155 banks and finds that 15 of them have total assets of over \$10 billion. Can the claim be rejected?
  4. General Motors Corporation hopes to reduce anticipated production costs of its Saturn Model by instituting an assembly schedule that will reduce average production time to about 40 hours per car. In a test run of the new assembly line, 40 cars are built at a sample average time per car of 46.5 hours and a sample standard deviation of 8.0 hours. A test run of 38 cars using the old assembly schedule results in a sample mean of 51.2 hours and a sample standard deviation of 9.5 hours. Is there evidence that the new assembly schedule reduces the average production time per car?
  5. A telephone company wants to estimate the average length of long-distance calls during weekends. A random sample of 50 calls gives a mean \(\bar{X} =14.5\) min and standard deviation s = 5.6 min. Provide an interval estimate for the average length of a long-distance phone call during weekends.
  6. Several companies have been developing electronic guidance systems for cars. Motorola and Germany's Blaupunkt are two firms in the forefront of such research. Out of 120 trials of the Motorola model, 101 were successful; and out of 200 tests of the Blaupunkt model, 110 were successful. Is there evidence to conclude that the Motorola electronic guidance system is superior to the German competitor?
  7. An important measure of the risk associated with a stock is the standard deviation, or variance, of the stock's price movements. A financial analyst wants to test the one-tailed hypothesis that stock A has a greater risk (larger variance of price) than stock B. A random sample of 25 daily prices of stock A gives \(s_{A}^2= 6.52\), and a random sample of 22 daily prices of stock B gives a sample variance of \(s_{B}^2= 3.47\). Carry out the test at \(\alpha = 0.01\).
  8. A company is interested in offering its employees one of two employee benefit packages. A random sample of the company's employees is collected, and each person in the sample is asked to rate each of the two packages on an overall preference scale of 0 to 100. The order of presentation of each of the two plans is randomly selected for each person in the sample. The paired data are:
    • Program A: 45 67 63 59 77 69 45 39 52 58 70 46 60 65 59 80
    • Program B: 56 70 60 45 85 79 50 46 50 60 82 40 65 55 81 68

    Determine whether the employees rate one package higher.

  9. Analysis of variance has long been used in providing evidence of the effectiveness of pharmaceutical drugs. Such evidence is required before the FDA will allow a drug to be marketed. In a recent test of the effectiveness of a new sleeping pill, three groups of 25 patients each were given the following treatments. One group was given the drug, the second group was given a placebo, and the third group was given no treatment at all. The results are as follows.
    • Drug group: 12, 17, 34, 11, 5, 42, 18, 27, 2, 37, 50, 32, 12, 27, 21, 10, 4, 33, 63, 22, 41, 19, 28, 29, 8
    • Placebo group: 44, 32, 28, 30, 22, 12, 3, 12, 42, 13, 27, 54, 56, 32, 37, 28, 22, 22, 24, 9, 20, 4, 13, 42, 67
    • No-treatment group: 32, 33, 21, 12, 15, 14, 55, 67, 72, 1, 44, 60, 36, 38, 49, 66, 89, 63, 23, 6, 9, 56, 58, 39, 59

    Determine whether or not the drug is effective.

  10. The maker of portable exercise equipment, designed for health-conscious people who travel too frequently to use a regular athletic club, wants to estimate the proportion of traveling business people who may be interested in the product. A random sample of 120 traveling business people indicates that 28 of them may be interested in purchasing the portable fitness equipment. Provide an interval estimate for the proportion of all traveling business people who may be interested in the product.
  11. A study was undertaken by Montgomery Securities to assess the average labor and materials costs incurred by Chrysler and General Motors in building a typical four-door, intermediate-sized car. The reported average cost for Chrysler was \$9500, and for GM it was \$9780. Suppose that these data are based on random samples of 25 cars for each company, and suppose that both standard deviations are equal to \$1500. Test the hypothesis that the average GM car of this type is more expensive to build than the average Chrysler car of the same type.
  12. Recent studies indicate that in order to be globally competitive, firms must form global strategic partnerships. An investment banker wants to test whether the return on investment for international ventures is different from the return on investment for similar domestic ventures. A sample of 12 firms that recently entered into ventures with foreign companies is available. For each firm, the return on investment for both the international venture (I) and a similar domestic venture (D) is given:
    • D(%): 10 12 14 12 12 17 9 15 8.5 11 7 15
    • I(%): 11 14 15 11 12.5 16 10 13 10.5 17 9 19

    Assuming that these firms represent a random sample from the population of all firms involved in global strategic partnerships, can the investment banker conclude that there are differences between average returns on domestic ventures and average returns on international ventures? Explain.

  13. When new paperback novels are promoted at bookstores, a display is often arranged with copies of the same book with differently colored covers. A publishing house wanted to find out whether there is a dependence between the place where the book is sold and the color of its cover. For one of its latest novels, the publisher sent displays and a supply of copies of the novels to large bookstores in five major cities. The resulting sales of the novel for each city-color combination are as follows. Numbers are in thousands of copies sold over a three-month period.
    City           Red   Blue   Green   Yellow   Total
    New York        21     27      40       15     103
    Washington      14     18      28        8      68
    Boston          11     13      21        7      52
    Chicago          3     33      30        9      75
    Los Angeles     30     11      34       10      85
    Total           79    102     153       49     383

    Assume that the data are random samples for each particular color-city combination and that the inference may apply to all novels. Are color and location related?

  14. Certain eggs are stated to have reduced cholesterol content, with an average of only 2.5% cholesterol. A concerned health group wants to test whether the claim is true. The group believes that more cholesterol may be found, on the average, in the eggs. A random sample of 100 eggs reveals a sample average content of 5.2% cholesterol, and a sample standard deviation of 2.8%. Does the health group have cause for action?
  15. Two 12-meter boats, the K boat and the L boat, are tested as possible contenders in the America's Cup races. The following data represent the time, in minutes, to complete a particular tack in independent random trials of the two boats.
    • K boat: 12.0, 13.1, 11.8, 12.6, 14.0, 11.8, 12.7, 13.5, 12.4, 12.2, 11.6, 12.9
    • L boat: 11.8, 12.1, 12.0, 11.6, 11.8, 12.0, 11.9, 12.6, 11.4, 12.0, 12.2, 11.7

    Test the null hypothesis that the two boats perform equally well. Is one boat faster, on the average, than the other?

