Glossary

95% Rule: On a normal distribution, approximately 95% of data will fall within two standard deviations of the mean; this is an abbreviated form of the Empirical Rule.

Alternative Hypothesis: The statement that there is some difference in the population(s), denoted as \(H_a\) or \(H_1\)

Association: A relationship between variables

Bar chart: Graphical representation for categorical data in which vertical (or sometimes horizontal) bars are used to depict the number of experimental units in each category; bars are separated by space.

Penn State Fall 2017 Undergraduate Enrollments

Bias: The systematic favoring of certain outcomes.

Binomial random variable: A specific type of discrete random variable that counts how often a particular event occurs in a fixed number of tries or trials.
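
As a rough illustration, the probability that a binomial random variable takes a particular value can be computed from the number of trials and the probability of the event on each trial. The sketch below assumes independent trials with a constant probability; the function name `binomial_pmf` and the coin-flip numbers are hypothetical and chosen only for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials, each with event probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: number of heads in 10 flips of a fair coin.
n, p = 10, 0.5
prob_five_heads = binomial_pmf(5, n, p)  # 252 / 1024

# The probabilities over all possible counts (0 through n) sum to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
```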

Blinding: Procedure employed in research to prevent bias in which the participants and/or the researchers interacting with the participants do not know which treatment each case is receiving.

Bootstrapping: A resampling procedure for constructing a sampling distribution using data from a sample.
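
One common way to carry out this resampling is to draw many samples with replacement from the observed sample, each the same size as the original, and record a statistic from each. The sketch below does this for the sample mean; the data values, the seed, the number of resamples, and the function name `bootstrap_means` are all assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

sample = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical observed sample

def bootstrap_means(data, n_resamples=1000):
    """Resample the data with replacement and record each resample's mean."""
    n = len(data)
    return [mean(random.choices(data, k=n)) for _ in range(n_resamples)]

# The collection of resampled means approximates a sampling distribution.
boot_means = bootstrap_means(sample)
```

The resampled means tend to center near the original sample mean, and their spread gives an estimate of the standard error.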

Case: An experimental unit from which data are collected

Categorical variable: Names or labels (i.e., categories) with no logical order or with a logical order but inconsistent differences between groups, also known as qualitative.

Causation: Changes in one variable can be attributed to changes in a second variable.

Clustered Bar Chart: Each bar represents one combination of the two categorical variables (i.e., one cell in a contingency table). This is also known as a side-by-side bar chart.

Complement

The event that a given event does not occur. The complement of event \(A\) is written \(A^C\), and its probability is \(P(A^C)\). This may also be written as \(P(A')\).

In a Venn diagram, \(A^{C}\) is everything in the sample space that is not in A.
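
Because an event either occurs or does not occur, the probabilities of an event and its complement sum to one. As a short worked example with a hypothetical value of \(P(A) = 0.30\):

\[P(A^C) = 1 - P(A) = 1 - 0.30 = 0.70\]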

Conditional Probability

The probability of one event occurring given that it is known that a second event has occurred. This is communicated using the symbol \(\mid\) which is read as "given."

For example, \(P(A \mid B)\) is read as "Probability of A given B."
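
A small sketch may help connect the notation to a calculation. It assumes one roll of a hypothetical fair six-sided die and uses the standard relationship \(P(A \mid B) = P(A \cap B) / P(B)\), which the definition above does not state explicitly; the events and the helper `prob` are illustrative only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # event A: the roll is even
B = {4, 5, 6}                      # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_A_and_B = prob(A & B)            # outcomes in both A and B: {4, 6}
p_A_given_B = p_A_and_B / prob(B)  # P(A | B) = P(A and B) / P(B)
```

Knowing that B occurred narrows the possible outcomes to 4, 5, and 6, and two of those three are even.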

Confidence Interval: A range computed using sample statistics to estimate an unknown population parameter with a stated level of confidence.

Confounding Variable: Characteristic that varies between cases and is related to both the explanatory and response variables; also known as a lurking variable or a third variable.

Continuous variable: Characteristic that varies and can take on any value within a range, including any value between two other values.

Control Group: A level of the explanatory variable that does not receive an active treatment; cases in this group may receive no treatment or a placebo.

Convenience Sampling: A method of obtaining a sample from a population by ease of accessibility; such a sample is not random and may not be representative of the intended population.

Correlation: A measure of the direction and strength of the relationship between two variables.
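
The definition above does not name a specific formula. The sketch below assumes Pearson's r, a common choice for two quantitative variables, and computes it from sums of deviations; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: direction (sign) and strength (magnitude) of a linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases with x: r close to +1
falling = [10, 8, 6, 4, 2]   # decreases with x: r close to -1

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
```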

Deviation: An individual score minus the mean.

Discrete variable: Characteristic that varies and can only take on a set number of values.

Disjoint Events

Two events that do not occur at the same time. These are also known as mutually exclusive events.

In a Venn diagram, disjoint events A and B are drawn as regions that do not overlap.

Dotplot

Double-Blind Study: Research study in which neither the participants nor the researchers interacting with them know which cases have been assigned to which treatment groups.

Empirical Rule: On a normal distribution about 68% of data will be within one standard deviation of the mean, about 95% will be within two standard deviations of the mean, and about 99.7% will be within three standard deviations of the mean.
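
The three percentages can be checked against the normal curve itself. The sketch below uses `statistics.NormalDist` from the Python standard library with a standard normal distribution (mean 0, standard deviation 1); the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = proportion_within(1)    # about 0.68
within_two = proportion_within(2)    # about 0.95
within_three = proportion_within(3)  # about 0.997
```

The same proportions hold for any normal distribution, because distances are measured in standard deviations from the mean.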

Experimental Research Design: A study in which the researcher manipulates the treatments received by subjects and collects data; also known as a scientific study

Explanatory Variable: Variable that is used to explain variability in the response variable; also known as an independent or predictor variable. In an experimental study, it is manipulated by the researcher.

Frequency Table

A table containing the counts of how often each category occurs.

Summary Statistics
Campus	Count	Percent
University Park	40835	48.5%
Commonwealth Campuses	29388	34.9%
PA College of Technology	5465	6.5%
World Campus	8513	10.1%
Total	84201	100.0%

Penn State Fall 2017 Undergraduate Enrollments
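
The Percent column in the table above can be reproduced from the Count column by dividing each count by the total. A minimal sketch, using the counts from the table and hypothetical variable names:

```python
# Counts taken from the enrollment table above.
counts = {
    "University Park": 40835,
    "Commonwealth Campuses": 29388,
    "PA College of Technology": 5465,
    "World Campus": 8513,
}

total = sum(counts.values())

# Percent for each category, rounded to one decimal place as in the table.
percents = {campus: round(100 * count / total, 1)
            for campus, count in counts.items()}
```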

Histogram

Independent Events: Unrelated events. The outcome of one event does not impact the outcome of the other event.

Independent Groups: Cases in each group are unrelated to one another.

Inferential Statistics: Statistical procedures that use data from an observed sample to make a conclusion about a population.

Interquartile range (IQR): The difference between the third and first quartiles (\(Q_3 - Q_1\)).
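
A short sketch of the calculation follows. Quartile conventions differ between textbooks and software, so the values from `statistics.quantiles` (which defaults to an "exclusive" method) may not match other tools exactly; the data set is hypothetical.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical data

# n=4 splits the distribution into four parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1  # spread of the middle half of the data
```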

Intersection

The overlap of two or more events, symbolized by the character \(\cap\).

\(P(A \cap B)\) is read as "the probability of A and B."

Least squares method: Method of constructing a regression line which makes the sum of squared residuals as small as possible for the given data.
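
For a single explanatory variable, the line that minimizes the sum of squared residuals has a closed-form slope and intercept. The sketch below assumes simple linear regression and uses hypothetical data and function names; it also compares the fitted line with a slightly different line to show that the sum of squared residuals grows.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data

b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
nudged = sum_squared_residuals(xs, ys, b0, b1 + 0.1)  # any other line does worse
```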

Left Skewed: A distribution in which the lower values (towards the left on a number line) are more spread out than the higher values. This is also known as negatively skewed.

Margin of Error: Half of the width of a confidence interval; equal to the multiplier times the standard error.
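
The relationship "multiplier times the standard error" can be sketched for a confidence interval for a mean. This example assumes a multiplier from the standard normal distribution and a standard error of \(s / \sqrt{n}\); a course or software package may instead use a t multiplier or a bootstrap standard error. The sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [12, 15, 11, 14, 13, 16, 12, 15]  # hypothetical sample
confidence = 0.95

# Assumed normal-based multiplier: about 1.96 for 95% confidence.
multiplier = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Assumed standard error of the sample mean: s / sqrt(n).
standard_error = stdev(sample) / sqrt(len(sample))

margin_of_error = multiplier * standard_error

# The confidence interval is the sample statistic plus or minus the margin of error.
lower = mean(sample) - margin_of_error
upper = mean(sample) + margin_of_error
```

The interval's full width is `upper - lower`, so the margin of error is half of that width.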

Mean

The numerical average; calculated as the sum of all of the data values divided by the number of values.

The sample mean is represented as \(\overline{x}\) ("x-bar") and the population mean is denoted as the Greek letter \(\mu\) ("mu"). The formula is the same for the sample mean and the population mean.
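
Written out, and assuming the conventional notation of \(n\) for the sample size and \(N\) for the population size (symbols not defined in this glossary), the two versions are:

\[\overline{x} = \frac{\sum x}{n} \qquad \mu = \frac{\sum x}{N}\]

For example, for the hypothetical sample values 2, 4, and 9, \(\overline{x} = \frac{2 + 4 + 9}{3} = 5\).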

Median: The middle of the distribution that has been ordered from smallest to largest; for distributions with an even number of values, this is the mean of the two middle values.

Mode: The most frequently occurring value(s) in the distribution; may be used with quantitative or categorical variables.
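
The median and mode can both be computed with the Python standard library, as a quick sketch with hypothetical data shows. `statistics.multimode` returns every most frequent value, which matches the "value(s)" wording above.

```python
from statistics import median, multimode

# Even number of values: the median is the mean of the two middle values (3 and 5).
middle = median([3, 1, 7, 5])

# Quantitative data with two values tied for most frequent.
modes = multimode([1, 2, 2, 3, 3])

# The mode also works for categorical data.
favorite = multimode(["red", "blue", "red"])
```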

Non-Response Bias: Systematic favoring of certain outcomes that occurs when the individuals who choose to participate in a study differ from the individuals who choose to not participate.

Normal Distribution: One specific type of symmetrical distribution. This is also known as a bell-shaped distribution.

Null Hypothesis: The statement that there is not a difference in the population(s), denoted as \(H_0\)

Observational Research Design: A study in which the researcher collects data without performing any manipulations; also known as a non-experimental study

Odds: A way of expressing risk that compares the likelihood of an event happening to the likelihood that it does not happen. Note that the interpretation of odds is different from the interpretation of risk/probability/proportion.
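
To make the difference in interpretation concrete, the sketch below converts a probability to odds using the usual ratio of the probability that the event happens to the probability that it does not. The function name `odds` and the probability of 3/4 are hypothetical.

```python
from fractions import Fraction

def odds(probability):
    """Odds of an event: P(event happens) divided by P(event does not happen)."""
    return probability / (1 - probability)

p = Fraction(3, 4)      # hypothetical probability of the event
odds_for = odds(p)      # 3, often read as "3 to 1"
even_odds = odds(Fraction(1, 2))  # a 50% probability corresponds to odds of 1
```

A probability of 3/4 and odds of 3 describe the same event on different scales, which is why the two should not be read interchangeably.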

p-value: Given that the null hypothesis is true, the probability of obtaining a sample statistic as extreme or more extreme than the one in the observed sample, in the direction of the alternative hypothesis.
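
As one possible illustration, the sketch below computes an exact p-value for a hypothetical coin-flipping study: the null hypothesis is that the probability of heads is 0.5, the alternative is that it is greater than 0.5, and 8 heads are observed in 10 flips. The exact binomial calculation is an assumption made for this example; other approaches, such as randomization or a normal approximation, may be used instead.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials, each with event probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_flips = 10        # hypothetical number of trials
observed_heads = 8  # hypothetical observed sample statistic
null_p = 0.5        # value of the parameter under the null hypothesis

# "As extreme or more extreme, in the direction of the alternative" means
# 8, 9, or 10 heads, with probabilities computed as if the null were true.
p_value = sum(binomial_pmf(k, n_flips, null_p)
              for k in range(observed_heads, n_flips + 1))
```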

Paired Groups: Cases in each group are meaningfully matched with one another; also known as dependent samples or matched pairs.

Parameter: A measure concerning a population (e.g., population mean).

Percentile: Proportion of a distribution less than a given value.
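
Following the definition above directly, a percentile can be sketched as the fraction of values that fall below a chosen value. The function name `proportion_below` and the scores are hypothetical.

```python
def proportion_below(data, value):
    """Proportion of the data values that are strictly less than the given value."""
    return sum(1 for x in data if x < value) / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical scores

# Seven of the ten scores are below 90, so 90 is at the 70th percentile.
below_90 = proportion_below(scores, 90)
```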

Pie chart

Graphical representation for categorical data in which a circle is partitioned into “slices” on the basis of the proportions of each category.

Pie Chart of Campus