12  Summary and Review

Overview

This lesson is the culmination of STAT 500. It reviews the statistical techniques covered in the course and provides a summary table that lists, for each type of inference, the parameter, statistic, type of data, analysis, conditions, and examples.

Objectives

Upon completion of this lesson, you should be able to:

  1. Review the statistical techniques covered in STAT 500.
  2. Given a real-world application, choose the correct statistical technique.

12.1 Summary of Statistical Techniques

Flowcharts

This section provides flowcharts to help you select the appropriate statistical technique for your analysis.

Estimate a Parameter Value

graph TD
    %% Flowchart structure
    A{"One value or<br> the difference<br> between two<br> values?"} -->|One value| B{"A quantitative<br> or categorical<br> variable?"}
    B -->|Quantitative| C["One Mean<br>Confidence Interval"]
    B -->|Categorical| D["One Proportion<br>Confidence Interval"]
    A -->|Difference between two values| E{"Two quantitative<br> or categorical<br> variables?"}
    E -->|Quantitative| F{"Independent<br> or paired<br> samples?"}
    F -->|Paired| G["Two Paired Means<br>Confidence Interval"]
    F -->|Independent| H["Two Independent Means<br>Confidence Interval"]
    E -->|Categorical| I["Two Proportions<br>Confidence Interval"]

    %% Style: Cream color, bold text, rounded corners
    classDef cream fill:#FFF3CD, stroke:#333, stroke-width:2px, font-weight:bold, rx:10, ry:10;

    %% Apply styles
    class A,B,E,F cream;
    class C,D,G,H,I cream;

Test a Hypothesis

graph TD
    %% Decision nodes using curly braces ({}), answers with rounded rectangles
    A{"Quantitative or<br> categorical response<br> variables?"} --> |Categorical| B{"One<br> or two<br> samples?"}
    B -->|One| C(["One Sample<br>Proportion Test"])
    B -->|Two| D(["Two Sample<br>Proportion Test"])
    A -->|Quantitative| E{"One, two<br> or more<br> samples?"}
    E -->|One| F(["One Sample<br>Mean Test"])
    E -->|Two| G{"Independent or<br>paired?"}
    G -->|Independent| H(["Two Independent<br>Means Test"])
    G -->|Paired| I(["Paired Sample<br>Mean Test"])
    E -->|More than 2| J(["One-Way Analysis<br>of Variance"])

    %% Style for light blue theme
    classDef question fill:#D1E8F0, stroke:#333, stroke-width:2px, font-weight:bold;
    classDef answer fill:#D1E8F0, stroke:#333, stroke-width:2px, font-weight:bold, rx:10, ry:10;

    %% Apply classes
    class A,B,E,G question;
    class C,D,F,H,I,J answer;
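
The "Test a Hypothesis" flowchart above can also be read as a small decision function. The sketch below is one possible encoding in Python; the function name `choose_test` and its argument names are illustrative assumptions, not part of the course materials.

```python
def choose_test(response, samples, paired=False):
    """Return the technique suggested by the 'Test a Hypothesis' flowchart.

    response: "categorical" or "quantitative" (type of response variable)
    samples:  number of samples being compared
    paired:   only consulted when comparing two quantitative samples
    """
    if response == "categorical":
        if samples == 1:
            return "One Sample Proportion Test"
        if samples == 2:
            return "Two Sample Proportion Test"
        raise ValueError("this branch of the flowchart covers one or two samples")
    if response == "quantitative":
        if samples == 1:
            return "One Sample Mean Test"
        if samples == 2:
            return "Paired Sample Mean Test" if paired else "Two Independent Means Test"
        if samples > 2:
            return "One-Way Analysis of Variance"
    raise ValueError("unrecognized combination of inputs")


print(choose_test("quantitative", 2, paired=True))
```

Encoding the chart this way makes the order of the questions explicit: the type of response variable is decided first, then the number of samples, and pairing matters only for two quantitative samples.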

Examine a Relationship

graph TD
    %% Decision and terminal nodes
    A{"Quantitative or<br>Categorical<br> Variables?"} -->|Quantitative| B{"Make a<br> prediction or examine<br> the strength and direction<br> of the<br> relationship?"}
    A -->|Categorical| C(["Chi-Square Test of Independence"])

    B -->|Prediction| D(["Simple Linear Regression"])
    B -->|Strength and Direction| E(["Correlation"])

    %% Style: light green background, bold font, rounded corners for answers
    classDef question fill:#E6F4EA, stroke:#333, stroke-width:2px, font-weight:bold;
    classDef answer fill:#E6F4EA, stroke:#333, stroke-width:2px, font-weight:bold, rx:10, ry:10;

    %% Apply styles
    class A,B question;
    class C,D,E answer;

Summary Table for Statistical Techniques

Download a PDF version of the following statistical techniques: Table of Statistical Techniques



Estimating a Mean

Parameter

One population mean, \(\mu\)

Statistic

Sample mean, \(\bar{x}\)

Type of Data

Quantitative

Analysis

1-sample t-interval

\(\bar{x}\pm t_{\alpha /2}\cdot \frac{s}{\sqrt{n}}\)

Minitab Command

Stat > Basic statistics > 1-sample t

Conditions

data approximately normal OR

have a large sample size (n ≥ 30)

Examples

  • What is the average weight of adults?
  • What is the average cholesterol level of adult females?
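
As a rough illustration of the 1-sample t-interval formula, the sketch below uses the summary statistics from scenario 5 in Section 12.2 (n = 50, sample mean 14.5 min, s = 5.6 min). The 95% confidence level and the critical value of about 2.01 (read from a t table with 49 degrees of freedom) are assumptions made for this example; the scenario does not state a confidence level.

```python
import math

n, xbar, s = 50, 14.5, 5.6      # summary statistics from scenario 5
t_crit = 2.0096                 # assumed: t_{0.025} with 49 df, read from a t table

margin = t_crit * s / math.sqrt(n)          # t_{alpha/2} * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% t-interval: ({lower:.2f}, {upper:.2f})")
```

With n = 50 the large-sample condition (n ≥ 30) is met, so the interval is reasonable even without checking normality.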

Test About a Mean

Parameter

One population mean, \(\mu\)

Statistic

Sample mean, \(\bar{x}\)

Type of Data

Quantitative

Analysis

\(H_0\colon \mu = \mu_0\)

\(H_a\colon \mu \ne \mu_0\) OR

\(H_a\colon \mu > \mu_0\) OR

\(H_a\colon \mu < \mu_0\)

1-sample t-test:

\(t=\dfrac{\bar{x}-\mu_{0}}{\frac{s}{\sqrt{n}}}\)

Minitab Command

Stat > Basic statistics > 1-sample t

Conditions

data approximately normal

OR

have a large sample size (n ≥ 30)

Examples

  • Is the average GPA of juniors at Penn State higher than 3.0?
  • Is the average winter temperature in State College less than 42°F?
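
The 1-sample t statistic is simple arithmetic once the summary statistics are known. The sketch below plugs in the values from scenario 14 in Section 12.2 (n = 100, sample mean 5.2%, s = 2.8%, hypothesized mean 2.5%); the variable names are illustrative.

```python
import math

n, xbar, s = 100, 5.2, 2.8   # sample size, mean, and standard deviation (scenario 14)
mu0 = 2.5                    # hypothesized mean under H0

t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The p-value would then come from a t distribution with 99 degrees of freedom, using a table or software, in the right tail because the health group suspects a higher mean.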

Estimating a Proportion

Parameter

One population proportion \(p\)

Statistic

Sample proportion, \(\hat{p}\)

Type of Data

Categorical (Binary)

Analysis

1-proportion Z-interval:

\(\hat{p}\pm z_{\alpha /2}\sqrt{\frac{\hat{p}\cdot \left ( 1-\hat{p} \right )}{n}}\)

Minitab Command

Stat > Basic statistics > 1-sample proportion

Conditions

have at least 5 in each category

Examples

  • What is the proportion of males in the world?
  • What is the proportion of students that smoke?
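
A minimal sketch of the 1-proportion Z-interval, using the counts from scenario 10 in Section 12.2 (28 of 120 respondents). The 95% confidence level is an assumption, since the scenario does not specify one; the critical value is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

x, n = 28, 120                 # interested respondents and sample size (scenario 10)
conf_level = 0.95              # assumed; the scenario does not state a level

p_hat = x / n
z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # z_{alpha/2}
margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

# Condition check: at least 5 observations in each category
assert x >= 5 and n - x >= 5
print(f"{conf_level:.0%} z-interval: ({lower:.3f}, {upper:.3f})")
```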

Test About a Proportion

Parameter

One population proportion, \(p\)

Statistic

Sample proportion, \(\hat{p}\)

Type of Data

Categorical (Binary)

Analysis

\(H_0\colon p = p_0\)

\(H_a\colon p \ne p_0\) OR

\(H_a\colon p > p_0\) OR

\(H_a\colon p < p_0\)

1-proportion Z-test:

\(z=\dfrac{\hat{p}-p _{0}}{\sqrt{\frac{p _{0}\left ( 1- p _{0}\right )}{n}}}\)

Minitab Command

Stat > Basic statistics > 1-sample proportion

Conditions

\(np_0 \geq 5\) and

\(n (1 - p_0) \geq 5\)

Examples

  • Is the proportion of females different from 0.5?
  • Is the proportion of students who fail STAT 500 less than 0.1?
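
The 1-proportion Z-test can be sketched the same way. The example below uses scenario 3 from Section 12.2 (15 of 50 sampled banks, claimed proportion of at least 0.35) and treats the alternative as left-tailed. It applies the formula in the table as written and ignores the fact that the sample is a large share of the 155 banks.

```python
import math
from statistics import NormalDist

x, n, p0 = 15, 50, 0.35        # scenario 3: 15 of 50 sampled banks; claimed p >= 0.35

# Conditions from the table: n*p0 >= 5 and n*(1 - p0) >= 5
assert n * p0 >= 5 and n * (1 - p0) >= 5

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)  # left-tailed, for H_a: p < p0
print(f"z = {z:.3f}, left-tailed p-value = {p_value:.3f}")
```

Note that the standard error uses the hypothesized value p0, not the sample proportion, which is what distinguishes the test from the interval.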

Estimating the Difference of Two Means*

Parameter

Difference in two population means,

\(\mu_1 - \mu_2\)

Statistic

Difference in two sample means,

\(\bar{x}_{1} - \bar{x}_{2}\)

Type of Data

Quantitative

Analysis

2-sample t-interval:

\(\bar{x}_{1}-\bar{x}_{2}\pm t_{\alpha /2}\cdot \hat{s.e.}\left (\bar{x}_{1}-\bar{x}_{2} \right )\)

Minitab Command

Stat > Basic statistics > 2-sample t

Conditions

Independent samples from the two populations

Data in each sample are approximately normal, or both samples are large

Examples

  • How different are the mean GPAs of males and females?
  • How many fewer colds do vitamin C takers get, on average, than non-vitamin takers?

Test to Compare Two Means*

Parameter

Difference in two population means,

\(\mu_1 - \mu_2\)

Statistic

Difference in two sample means,

\(\bar{x}_{1} - \bar{x}_{2}\)

Type of Data

Quantitative

Analysis

\(H_0\colon \mu_1 = \mu_2\)

\(H_a\colon \mu_1 \ne \mu_2\) OR

\(H_a\colon \mu_1 > \mu_2\) OR

\(H_a\colon \mu_1 < \mu_2\)

2-sample t-test:

\(t=\dfrac{\left (\bar{x}_{1}-\bar{x}_{2} \right )-0}{\hat{s.e.}\left (\bar{x}_{1}-\bar{x}_{2} \right )}\)

Minitab Command

Stat > Basic statistics > 2-sample t

Conditions

Independent samples from the two populations

Data in each sample are approximately normal, or both samples are large

Examples

  • Do the mean pulse rates of exercisers and non-exercisers differ?
  • Is the mean EDS score for dropouts greater than the mean EDS score for graduates?

*The standard error (s.e.) depends on whether the pooled or the unpooled variance estimate is used.
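
To make the pooled versus unpooled distinction concrete, the sketch below computes both standard errors and the resulting t statistics from the summary statistics in scenario 4 of Section 12.2. The two formulas are the usual textbook ones and are not written out in the table, so treat this as an illustration under that assumption.

```python
import math

# Scenario 4 summary statistics: new schedule (1) vs. old schedule (2)
n1, xbar1, s1 = 40, 46.5, 8.0
n2, xbar2, s2 = 38, 51.2, 9.5
diff = xbar1 - xbar2

# Unpooled: each sample keeps its own variance
se_unpooled = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_unpooled = diff / se_unpooled

# Pooled: one combined variance estimate, with n1 + n2 - 2 degrees of freedom
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

print(f"unpooled: s.e. = {se_unpooled:.3f}, t = {t_unpooled:.3f}")
print(f"pooled:   s.e. = {se_pooled:.3f}, t = {t_pooled:.3f}")
```

The pooled version is appropriate only when the two population variances can reasonably be assumed equal; when the sample sizes and standard deviations are similar, as here, the two statistics are close.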


Estimating a Mean with Paired Data

Parameter

Mean of paired difference,

\(\mu_D\)

Statistic

Sample mean of difference,

\(\bar{d}\)

Type of Data

Quantitative

Analysis

paired t-interval:

\(\bar{d}\pm t_{\alpha /2}\cdot \frac{s_{d}}{\sqrt{n}}\)

Minitab Command

Stat > Basic statistics > Paired t

Conditions

Differences approximately normal OR

Have a large number of pairs (n ≥ 30)

Examples

  • What is the difference in pulse rates, on the average, before and after exercise?

Test to Compare Two Means (paired)

Parameter

Mean of paired difference,

\(\mu_D\)

Statistic

Sample mean of difference,

\(\bar{d}\)

Type of Data

Quantitative

Analysis

\(H_0\colon \mu_D = 0\)

\(H_a\colon \mu_D \ne 0\) OR

\(H_a\colon \mu_D > 0\) OR

\(H_a\colon \mu_D < 0\)

t-test statistic:

\(t=\dfrac{\bar{d}-0}{\frac{s_d}{\sqrt{n}}}\)

Minitab Command

Stat > Basic statistics > Paired t

Conditions

Differences approximately normal OR

Have a large number of pairs (n ≥ 30)

Examples

  • Is the difference in IQ of pairs of twins zero?
  • Are the pulse rates of people higher after exercise?
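
A paired analysis reduces to a one-sample analysis of the differences. The sketch below works through the paired t statistic with the data from scenario 12 in Section 12.2; taking the differences as international minus domestic is a choice made for this example.

```python
import math
from statistics import mean, stdev

# Scenario 12: return on investment (%) for the same 12 firms
domestic      = [10, 12, 14, 12, 12, 17, 9, 15, 8.5, 11, 7, 15]
international = [11, 14, 15, 11, 12.5, 16, 10, 13, 10.5, 17, 9, 19]

# Work with the within-pair differences (international minus domestic)
d = [i - dom for i, dom in zip(international, domestic)]
n = len(d)
d_bar, s_d = mean(d), stdev(d)        # stdev uses the n - 1 denominator

t_stat = (d_bar - 0) / (s_d / math.sqrt(n))
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, df = {n - 1}")
```

With only 12 pairs, the large-sample condition is not met, so the differences themselves should look approximately normal before the result is trusted.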

Estimating the Difference of Two Proportions

Parameter

Difference in two population proportions,

\(p_1 - p_2\)

Statistic

Difference in two sample proportions,

\(\hat{p}_{1} - \hat{p}_{2}\)

Type of Data

Categorical (Binary)

Analysis

2-proportions Z-interval:

\(\hat{p}_{1}-\hat{p}_{2}\pm z_{\alpha /2}\cdot \hat{s.e.}\left ( \hat{p}_{1}-\hat{p}_{2} \right )\)

Minitab Command

Stat > Basic statistics > 2 proportions

Conditions

Independent samples from the two populations

Have at least 5 in each category for both populations

Examples

  • How different are the percentages of male and female smokers?
  • How different are the percentages of upper- and lower-class binge drinkers?

Test to Compare Two Proportions

Parameter

Difference in two population proportions,

\(p_1 - p_2\)

Statistic

Difference in two sample proportions,

\(\hat{p}_{1} - \hat{p}_{2}\)

Type of Data

Categorical (Binary)

Analysis

\(H_0\colon p_1 = p_2\)

\(H_a\colon p_1 \ne p_2\) OR

\(H_a\colon p_1 > p_2\) OR

\(H_a\colon p_1 < p_2\)

2-proportion Z-test:

\(z^*=\frac{\hat{p}_{1}-\hat{p}_{2}}{\sqrt{\hat{p}^*\left ( 1-\hat{p}^* \right )\left ( \frac{1}{n_{1}}+ \frac{1}{n_{2}}\right )}}\)

\(\hat{p}^*=\dfrac{x_{1}+x_{2}}{n_{1}+n_{2}}\)

Minitab Command

Stat > Basic statistics > 2 proportions

Conditions

Independent samples from the two populations

Have at least 5 in each category for both populations

Examples

  • Is the percentage of males with lung cancer higher than the percentage of females with lung cancer?
  • Are the percentages of upper- and lower-class binge drinkers different?
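
The pooled proportion and the 2-proportion Z statistic can be sketched with the counts from scenario 6 in Section 12.2 (101 of 120 Motorola trials and 110 of 200 Blaupunkt trials successful). The right-tailed p-value reflects the question of whether the Motorola system is superior.

```python
import math
from statistics import NormalDist

x1, n1 = 101, 120    # Motorola successes and trials (scenario 6)
x2, n2 = 110, 200    # Blaupunkt successes and trials

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)                       # \hat{p}^* in the table
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se
p_value = 1 - NormalDist().cdf(z)                      # right-tailed, H_a: p1 > p2
print(f"pooled p = {p_pooled:.4f}, z = {z:.3f}, p-value = {p_value:.2g}")
```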

Relationship in a 2-Way Table

Parameter

Relationship between two categorical variables, OR

difference in two or more population proportions

Statistic

The observed counts in a two-way table

Type of Data

Categorical

Analysis

\(H_0\colon\text{The two variables are not related}\)

\(H_a\colon\text{The two variables are related}\)

Chi-square test statistic:

\(X^2=\sum_{\text{all cells}}\frac{(\text{Observed-Expected})^2}{\text{Expected}}\)

Minitab Command

Stat > Tables > Chi-square Test for Association

Conditions

All expected counts should be greater than 1

At least 80% of the cells should have an expected count greater than 5

Examples

  • Is there a relationship between smoking and lung cancer?
  • Do the proportions of students in each class who smoke differ?
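
The chi-square statistic is built from expected counts, which the table does not spell out. The sketch below assumes the usual rule (row total times column total, divided by the grand total) and applies it to the city-by-color sales table from scenario 13 in Section 12.2, using the numbers as given.

```python
# Scenario 13: sales (thousands) by city (rows) and cover color (columns)
observed = [
    [21, 27, 40, 15],   # New York
    [14, 18, 28,  8],   # Washington
    [11, 13, 21,  7],   # Boston
    [ 3, 33, 30,  9],   # Chicago
    [30, 11, 34, 10],   # Los Angeles
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Conditions from the table: every expected count > 1, at least 80% of them > 5
flat = [e for row in expected for e in row]
conditions_ok = min(flat) > 1 and sum(e > 5 for e in flat) / len(flat) >= 0.8
print(f"X^2 = {chi_sq:.2f}, df = {df}, conditions met: {conditions_ok}")
```

The statistic is compared with a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom, here 12.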

Test About a Slope

Parameter

Slope of the population regression line,

\(\beta_1\)

Statistic

Sample estimate of the slope,

\(b_1\)

Type of Data

Quantitative

Analysis

\(H_0\colon \beta_1 = 0\)

\(H_a\colon \beta_1 \ne 0\) OR

\(H_a\colon \beta_1 > 0\) OR

\(H_a\colon \beta_1 < 0\)

t-test with n - 2 degrees of freedom:

\(t=\dfrac{b_{1}-0}{\hat{s.e.}\left ( b_{1} \right )}\)

Minitab Command

Stat > Regression > Regression

Conditions

The form of the equation that links the two variables must be correct

The error terms are normally distributed

The error terms have equal variances

The error terms are independent of each other

Examples

  • Is there a linear relationship between the height and weight of a person?

Test to Compare Several Means

Parameter

Population means of the t populations,

\(\mu_1, \mu_2, \cdots , \mu_t\)

Statistic

Sample means from the t populations,

\(\bar{x}_1, \bar{x}_2, \cdots , \bar{x}_t\)

Type of Data

Quantitative

Analysis

\(H_0\colon \mu_1 = \mu_2 = \cdots = \mu_t\)

\(H_a\colon \text{not all the means are equal}\)

F-test for one-way ANOVA:

\(F=\dfrac{MST}{MSE}\)

Minitab Command

Stat > ANOVA > One-way

Conditions

Each population is normally distributed

Independent samples from the t populations

Equal population standard deviations

Examples

  • Is there a difference between the mean GPA of freshman, sophomore, junior, and senior classes?
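
The F statistic for one-way ANOVA is the ratio of the treatment mean square to the error mean square. The sketch below computes it from scratch for the three groups in scenario 9 of Section 12.2; the sums-of-squares formulas are the standard ones and are an assumption here, since the table gives only F = MST/MSE.

```python
from statistics import mean

# Scenario 9: three treatment groups of 25 patients each
groups = {
    "drug": [12, 17, 34, 11, 5, 42, 18, 27, 2, 37, 50, 32, 12,
             27, 21, 10, 4, 33, 63, 22, 41, 19, 28, 29, 8],
    "placebo": [44, 32, 28, 30, 22, 12, 3, 12, 42, 13, 27, 54, 56,
                32, 37, 28, 22, 22, 24, 9, 20, 4, 13, 42, 67],
    "no treatment": [32, 33, 21, 12, 15, 14, 55, 67, 72, 1, 44, 60, 36,
                     38, 49, 66, 89, 63, 23, 6, 9, 56, 58, 39, 59],
}

k = len(groups)
n_total = sum(len(g) for g in groups.values())
grand_mean = mean(x for g in groups.values() for x in g)

ss_treat = 0.0   # between-group sum of squares
ss_error = 0.0   # within-group sum of squares
for g in groups.values():
    g_mean = mean(g)
    ss_treat += len(g) * (g_mean - grand_mean) ** 2
    ss_error += sum((x - g_mean) ** 2 for x in g)

df_treat, df_error = k - 1, n_total - k
mst, mse = ss_treat / df_treat, ss_error / df_error
f_stat = mst / mse
print(f"F = {f_stat:.3f} with ({df_treat}, {df_error}) degrees of freedom")
```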

Test of Strength & Direction of Linear Relationship of 2 Quantitative Variables

Parameter

Population correlation,

\(\rho\)

“rho”

Statistic

Sample correlation,

\(r\)

Type of Data

Quantitative

Analysis

\(H_0\colon \rho = 0\)

\(H_a\colon \rho \ne 0\)

t-test statistic:

\(t=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\)

Minitab Command

Stat > Basic Statistics > Correlation

Conditions

2 variables are continuous

Related pairs

No significant outliers

Normality of both variables

Linear relationship between the variables

Examples

  • Is there a linear relationship between height and weight?
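
The t statistic for a correlation depends only on r and n. The sketch below wraps the formula in a small helper; the function name `correlation_t` and the input values r = 0.5 and n = 27 are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)


r, n = 0.5, 27          # hypothetical values, for illustration only
t_stat = correlation_t(r, n)
print(f"t = {t_stat:.3f} with {n - 2} degrees of freedom")
```

The same r gives a larger t as n grows, which is why a modest correlation can be statistically significant in a large sample.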

Test to Compare Two Population Variances

Parameter

Population variances of two populations,

\(\sigma_{1}^{2}, \sigma_{2}^{2}\)

Statistic

Sample variances of two populations,

\(s_{1}^{2}, s_{2}^{2}\)

Type of Data

Quantitative

Analysis

\(H_0\colon \sigma_{1}^{2} = \sigma_{2}^{2}\)

\(H_a\colon \sigma_{1}^{2} \ne \sigma_{2}^{2}\)

F-test statistic:

\(F=\dfrac{s_{1}^{2}}{s_{2}^{2}}\)

Minitab Command

Stat > Basic statistics > 2 variances

Conditions

Each population is normally distributed

Independent samples from the 2 populations

Examples

  • Is the variance of the length of lumber produced by Company A different from that of lumber produced by Company B?
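
For the two-variance F-test, the statistic is just the ratio of the sample variances. The sketch below uses the values from scenario 7 in Section 12.2 (stock A: variance 6.52 from 25 prices; stock B: variance 3.47 from 22 prices). Putting stock A in the numerator matches the one-tailed question of whether A is riskier.

```python
s2_a, n_a = 6.52, 25    # stock A: sample variance and sample size (scenario 7)
s2_b, n_b = 3.47, 22    # stock B

f_stat = s2_a / s2_b                 # hypothesized larger variance in the numerator
df_num, df_den = n_a - 1, n_b - 1
print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The result is compared with an F distribution with (24, 21) degrees of freedom; this test is sensitive to departures from normality, so the normality condition matters more here than for the t procedures.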

12.2 Choose the Correct Statistical Technique

For the scenarios below, choose a statistical technique that is suitable for the given situation.

Note!
There is no need to work out the following problems. This exercise is meant to help you select the appropriate statistical method from the description of a research context. Decide which statistical technique(s) you think are most appropriate, then compare your choice with the answer given after each scenario.

  1. A survey by the National Federation of Independent Business (NFIB) indicates that small businesses intended to increase their hiring as well as their capital expenditures during 1986 as compared with 1985. Suppose that, as part of a follow-up survey by NFIB, 20 small businesses, randomly chosen from the NFIB’s list of 2,100 companies, show an average hiring for 1985 equal to 3.2 new employees per firm and a standard deviation of 1.5 hires. A random sample of 30 small businesses taken at the end of 1986 shows an average of 5.1 new hires and a standard deviation of 2.3 hires. At the \(\alpha = 0.01\) level of significance, can you conclude that average hiring by all small businesses in 1986 increased as compared with 1985?

    Test to compare two independent population means

  2. It is known that the average stay of tourists in Hong Kong hotels has been 3.4 nights. A tourism industry analyst wanted to test whether recent changes in the nature of tourism in Hong Kong have changed this past average. The analyst obtained the following random sample of the number of nights spent by tourists in Hong Kong hotels: 5, 4, 3, 2, 1, 1, 5, 7, 8, 4, 3, 3, 2, 5, 7, 1, 3, 1, 1, 5, 3, 4, 2, 2, 2, 6, 1, 7. Conduct the test using the 0.05 level of significance.

    Test about a mean

  3. There are 155 banks involved in certain international transactions. A federal agency claims that at least 35% of these banks have total assets of over $10 billion (in U.S. dollars). An independent agency wants to test this claim. It gets a random sample of 50 out of the 155 banks and finds that 15 of them have total assets of over $10 billion. Can the claim be rejected?

    Test about a proportion

  4. General Motors Corporation hopes to reduce the anticipated production costs of its Saturn model by instituting an assembly schedule that will reduce average production time to about 40 hours per car. In a test run of the new assembly line, 40 cars are built at a sample average time per car of 46.5 hours and a sample standard deviation of 8.0 hours. A test run of 38 cars using the old assembly schedule resulted in a sample mean of 51.2 hours and a sample standard deviation of 9.5 hours. Is there evidence that the new assembly schedule reduces the average production time per car?

    Test to compare two independent means

  5. A telephone company wants to estimate the average length of long-distance calls during weekends. A random sample of 50 calls gives a mean \(\bar{X} =14.5\) min and standard deviation s = 5.6 min. Provide an interval estimate for the average length of a long-distance phone call during weekends.

    Estimate a mean

  6. Several companies have been developing electronic guidance systems for cars. Motorola and Germany’s Blaupunkt are two firms at the forefront of such research. Out of 120 trials of the Motorola model, 101 were successful; and out of 200 tests of the Blaupunkt model, 110 were successful. Is there evidence to conclude that the Motorola electronic guidance system is superior to the German competitor?

    Test to compare two proportions

  7. An important measure of the risk associated with a stock is the standard deviation, or variance, of the stock’s price movements. A financial analyst wants to test the one-tailed hypothesis that stock A has a greater risk (larger variance of price) than stock B. A random sample of 25 daily prices of stock A gives \(s_{A}^2= 6.52\), and a random sample of 22 daily prices of stock B gives a sample variance of \(s_{B}^2= 3.47\). Carry out the test at \(\alpha = 0.01\).

    Test to compare two population variances

  8. A company is interested in offering its employees one of two employee benefit packages. A random sample of the company’s employees is collected, and each person in the sample is asked to rate each of the two packages on an overall preference scale of 0 to 100. The order of presentation of each of the two plans is randomly selected for each person in the sample. The paired data are:

    • Program A: 45 67 63 59 77 69 45 39 52 58 70 46 60 65 59 80
    • Program B: 56 70 60 45 85 79 50 46 50 60 82 40 65 55 81 68

    Determine whether the employees rate one package higher.

    Test to compare two paired means

  9. Analysis of variance has long been used in providing evidence of the effectiveness of pharmaceutical drugs. Such evidence is required before the FDA will allow a drug to be marketed. In a recent test of the effectiveness of a new sleeping pill, three groups of 25 patients each were given the following treatments. One group was given the drug, the second group was given a placebo, and the third group was given no treatment at all. The results are as follows.

    • Drug group: 12, 17, 34, 11, 5, 42, 18, 27, 2, 37, 50, 32, 12, 27, 21, 10, 4, 33, 63, 22, 41, 19, 28, 29, 8
    • Placebo group: 44, 32, 28, 30, 22, 12, 3, 12, 42, 13, 27, 54, 56, 32, 37, 28, 22, 22, 24, 9, 20, 4, 13, 42, 67
    • No-treatment group: 32, 33, 21, 12, 15, 14, 55, 67, 72, 1, 44, 60, 36, 38, 49, 66, 89, 63, 23, 6, 9, 56, 58, 39, 59

    Test to compare several means

  10. The maker of portable exercise equipment, designed for health-conscious people who travel too frequently to use a regular athletic club, wants to estimate the proportion of traveling business people who may be interested in the product. A random sample of 120 traveling business people indicates that 28 of them may be interested in purchasing the portable fitness equipment. Provide an interval estimate for the proportion of all traveling business people who may be interested in the product.

    Estimate a proportion

  11. A study was undertaken by Montgomery Securities to assess the average labor and materials costs incurred by Chrysler and General Motors in building a typical four-door, intermediate-sized car. The reported average cost for Chrysler was $9500, and for GM it was $9780. Suppose that these data are based on random samples of 25 cars for each company, and suppose that both standard deviations are equal to $1500. Test the hypothesis that the average GM car of this type is more expensive to build than the average Chrysler car of the same type.

    Test to compare two independent means

  12. Recent studies indicate that to be globally competitive, firms must form global strategic partnerships. An investment banker wants to test whether the return on investment for international ventures is different from the return on investment for similar domestic ventures. A sample of 12 firms that recently entered into ventures with foreign companies is available. For each firm, the return on investment for both the international venture (I) and similar domestic venture (D) is given:

    • D(%): 10 12 14 12 12 17 9 15 8.5 11 7 15
    • I(%): 11 14 15 11 12.5 16 10 13 10.5 17 9 19

    Assuming that these firms represent a random sample from the population of all firms involved in global strategic partnerships, can the investment banker conclude that there are differences between average returns on domestic ventures and average returns on international ventures? Explain.

    Test to compare two paired means

  13. When new paperback novels are promoted at bookstores, a display is often arranged with copies of the same book with differently colored covers. A publishing house wanted to find out whether there is a dependence between the place where the book is sold and the color of its cover. For one of its latest novels, the publisher sent displays and a supply of copies of the novels to large bookstores in five major cities. The resulting sales of the novel for each city-color combination are as follows. Numbers are in thousands of copies sold over a three-month period.

    City         Red  Blue  Green  Yellow  Total
    New York      21    27     40      15    103
    Washington    14    18     28       8     68
    Boston        11    13     21       7     52
    Chicago        3    33     30       9     75
    Los Angeles   30    11     34      10     85
    Total         79   102    153      49    383

    Assume that the data are random samples for each particular color-city combination and that the inference may apply to all novels. Are color and location related?

    Test for a relationship in a 2-way table

  14. Certain eggs are stated to have reduced cholesterol content, with an average of only 2.5% cholesterol. A concerned health group wants to test whether the claim is true. The group believes that more cholesterol may be found, on the average, in the eggs. A random sample of 100 eggs reveals a sample average content of 5.2% cholesterol and a sample standard deviation of 2.8%. Does the health group have cause for action?

    Test about a mean

  15. Two 12-meter boats, the K boat and the L boat, are tested as possible contenders in the America’s Cup races. The following data represent the time, in minutes, to complete a particular task in independent random trials of the two boats.

    • K boat: 12.0, 13.1, 11.8, 12.6, 14.0, 11.8, 12.7, 13.5, 12.4, 12.2, 11.6, 12.9
    • L boat: 11.8, 12.1, 12.0, 11.6, 11.8, 12.0, 11.9, 12.6, 11.4, 12.0, 12.2, 11.7

    Test the null hypothesis that the two boats perform equally well. Is one boat faster, on the average, than the other?

    Test to compare two independent means