1.1 - Types of Discrete Data

Objective 1.2

Discrete data is often referred to as categorical data because of the way observations can be collected into categories. Variables producing such data can be of any of the following types:

Nominal (e.g., gender, ethnic background, religious or political affiliation)
Ordinal (e.g., extent of agreement, school letter grades)
Quantitative variables with relatively few values (e.g., number of times married)

Technically, a quantitative variable may take on any number of values and still be considered discrete, but it needs to be "countable". So, for example, the number of traffic accidents in a given time period may be considered discrete, but the amount of time between two consecutive accidents would be considered continuous. However, even a continuous variable may be used to produce discrete data if its range is divided or "coarsened" into intervals.

Note that many variables can be considered as either nominal or ordinal, depending on the purpose of the analysis. Consider majors in English, psychology, and computer science. This classification may be considered nominal or ordinal, depending on whether there is an intrinsic belief that it is "better" to have a major in computer science than in psychology or in English. Generally speaking, for a binary variable like pass/fail, ordinal or nominal consideration does not matter.

It should also be noted that numerically meaningful variables can be associated with any of the data types above, even the nominal type. For example, the gender categories of "man" and "woman" would themselves not be numerically meaningful, but if we let $X$ be the number of men in a random sample, that would be considered a quantitative (random) variable.
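As a minimal sketch of this idea in Python, the snippet below counts one category of a nominal variable to obtain a quantitative value. The sample is made up for illustration and is not taken from any study.

```python
# Hypothetical sample of a nominal variable; the values are illustrative only.
sample = ["man", "woman", "woman", "man", "man", "woman", "man"]

# The labels themselves carry no numeric meaning, but the number of times
# one label occurs in the sample is a countable, quantitative value.
x = sum(1 for person in sample if person == "man")
print(x)
```

Here `x` plays the role of $X$: a different random sample would generally give a different count.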

Context is important! The context of the study and the relevant questions of interest are important in specifying what kind of variable we will analyze.

Examples

Did you get the flu? (Yes or No) -- is a binary nominal categorical variable
What was the severity of your flu? (Low, Medium, or High) -- is an ordinal categorical variable

Measurement Hierarchy

The main distinction between nominal and ordinal data is that the latter has a natural ordering (least to greatest, best to worst, etc.), whereas the former does not. If the ordered characteristic is ignored, however, ordinal data could be considered a special case of nominal data. Similarly, discrete quantitative data could be considered a special case of ordinal data, with the additional characteristic that values have numerical meaning. So, computations like differences and averages make sense. Thus, the hierarchy is

nominal < ordinal < quantitative

In terms of analyses, methods applicable for one type of variable can be used for the variables at higher levels too (but not at lower levels). For example, methods designed for nominal data can be used for ordinal data but not vice versa. However, keep in mind that an analysis method may not be optimal if it ignores information available in the data.

One final note on the organization of these types is that quantitative variables may be further divided into "interval" and "ratio" types, depending on whether operations of subtraction and division make sense, but we will rarely need to make such a distinction in this course.

Frequency Counts

While often not numerically meaningful originally, discrete data can be summarized with the frequency counts of individuals falling in the categories. If more than one variable is involved, counts can be reported either jointly, for each combination of categories, or marginally for one variable, by summing over the categories of the other variable. Here are some examples.

Example: Eye Color

This is a typical frequency table for a single categorical variable. A sample of n = 96 persons is obtained, and the eye color of each person is recorded. The table then summarizes the responses by their frequencies.

Eye color	Count
Brown	46
Blue	22
Green	26
Other	2
Total	96

Analysis

Notice that brown, blue, green, and other have no intrinsic ordering. The response variable, eye color, is therefore an unordered categorical or nominal variable.
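A frequency table like this one can be built by tallying raw observations. The sketch below uses Python's `collections.Counter`. The raw observations are not given in the text, so the list is reconstructed from the table's counts purely for illustration.

```python
from collections import Counter

# Assumed raw data: one label per person, rebuilt from the table's counts.
observations = ["Brown"] * 46 + ["Blue"] * 22 + ["Green"] * 26 + ["Other"] * 2

counts = Counter(observations)   # category -> frequency
n = sum(counts.values())         # sample size

# Print each category with its count and its sample proportion.
for color, count in counts.items():
    print(f"{color}\t{count}\t{count / n:.3f}")
print(f"Total\t{n}")
```

Because the variable is nominal, the order in which the categories are printed is arbitrary.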

Example: Admissions Data

A university offers only two degree programs: English and computer science. Admission is competitive, and there is suspicion of discrimination against women in the admission process. Here is a two-way table of counts of all applicants by sex and admission status. These data can be used to measure the association between the sex of the applicants and their success in obtaining admission.

	Admit	Deny	Total
Male	35	45	80
Female	20	40	60
Total	55	85	140

Analysis

In this case, the four interior counts (35, 45, 20, and 40) represent numbers of joint events because combinations of both variables are considered. The counts in the Total column (80 and 60) are marginal counts for sex, and the counts in the Total row (55 and 85) are marginal counts for admission status.
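The marginal counts can be recovered from the joint counts by summing over the categories of the other variable. The following Python sketch does this for the admissions table. The dictionary layout and the variable names are choices made here for illustration.

```python
# Joint counts from the admissions table: (sex, admission status) -> count.
joint = {
    ("Male", "Admit"): 35,
    ("Male", "Deny"): 45,
    ("Female", "Admit"): 20,
    ("Female", "Deny"): 40,
}

sex_margin = {}     # marginal counts for sex (sum over admission status)
status_margin = {}  # marginal counts for admission status (sum over sex)
for (sex, status), count in joint.items():
    sex_margin[sex] = sex_margin.get(sex, 0) + count
    status_margin[status] = status_margin.get(status, 0) + count

total = sum(joint.values())
print(sex_margin)
print(status_margin)
print(total)
```

The two marginal dictionaries reproduce the Total column and the Total row of the table, and `total` is the overall sample size of 140.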

Example: Attitudes Towards War

These are hypothetical attitudes of n = 116 people towards war. They were asked to state their opinion on a four-point scale regarding the statement: "This is a necessary war".

Attitude	Count
Strongly disagree	35
Disagree	27
Agree	23
Strongly agree	31
Total	116

Analysis

The response categories in this example are clearly ordered, but no objectively defined numerical scores can be attached to the categories. The response variable, attitude, is therefore said to be an ordered categorical or an ordinal variable.

Example: Attitudes Towards War (cont.)

Working from the example above, suppose now that in addition to the four ordered categories, outcomes where the person wasn't sure or refused to answer were also recorded, giving n = 130 total counts divided up as follows.

Attitude	Count
Strongly disagree	35
Disagree	27
Agree	23
Strongly agree	31
Not sure	6
Refusal	8
Total	130

Analysis

The placement of "not sure" and "refusal" in the ordering is questionable. We would say that this response is partially ordered.

Example: Dice Rolls

Suppose a six-sided die is rolled 30 times, and the die face that comes up is recorded. One possible set of outcomes is tabulated below.

Face	Count
1	3
2	7
3	5
4	10
5	2
6	3
Total	30

Analysis

Note that the die faces are essentially labels and could reasonably be considered nominal or ordinal, depending on the context.

Example: Number of Children in Families

Here's an example where the response categories are numerically meaningful: the number of children in n = 100 randomly selected families.

Number of children	Count
0	19
1	26
2	29
3	13
4-5	11
6+	2
Total	100

Analysis

The original data has been coarsened into six categories (0, 1, 2, 3, 4–5, 6+). These categories are still ordered, but, unlike the attitude categories in the earlier example, they have objectively defined numeric values attached to them. We can say that this table represents coarsened numeric data.
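Coarsening of this kind can be sketched in Python as a function that maps each exact count to its category label. The function name `coarsen` and the raw values are hypothetical (the study's raw data are not given), and the input is assumed to be a non-negative integer.

```python
from collections import Counter

def coarsen(children: int) -> str:
    """Map an exact number of children to one of the six table categories."""
    if children <= 3:
        return str(children)   # 0, 1, 2, 3 keep their exact value
    if children <= 5:
        return "4-5"           # 4 and 5 are merged
    return "6+"                # 6 or more are merged

# Hypothetical raw counts for ten families (not the study data).
raw = [0, 2, 1, 4, 7, 3, 5, 2, 0, 6]
table = Counter(coarsen(c) for c in raw)
print(table)
```

After coarsening, a family with 4 children and a family with 5 children are indistinguishable, which is the information that grouping gives up.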

Example: Household Incomes

The variable in this example is total gross income, recorded for a sample of n = 100 households.

Income	Count
below \$10,000	11
\$10,000–\$24,999	23
\$25,000–\$39,999	30
\$40,000–\$59,999	24
\$60,000 and above	12
Total	100

Analysis

The original data (raw incomes) were essentially continuous, but any type of data, continuous or discrete, can be grouped or coarsened into categories.

Grouping data will typically result in some loss of information. How much information is lost depends on

the number of categories and
the question being addressed.

In this example, grouping has somewhat diminished our ability to estimate the mean or median household income. Our ability to estimate the proportion of households with incomes below \$10,000 has not been affected, but estimating the proportion of households with incomes above \$75,000 is now virtually impossible.
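One way to see this is to compute the smallest and largest proportions that are consistent with the grouped counts. The Python sketch below is an illustration only. It assumes each group is a half-open interval (for example, \$10,000–\$24,999 is treated as at least 10,000 and below 25,000), and the function name is hypothetical.

```python
# Grouped income counts from the table: (lower bound, upper bound, count).
# None marks an open-ended interval; upper bounds are treated as exclusive.
groups = [
    (None, 10_000, 11),
    (10_000, 25_000, 23),
    (25_000, 40_000, 30),
    (40_000, 60_000, 24),
    (60_000, None, 12),
]
n = sum(count for _, _, count in groups)

def proportion_at_or_above_bounds(threshold):
    """Return the (lowest, highest) possible proportion of incomes >= threshold."""
    sure = 0      # households certainly at or above the threshold
    possible = 0  # households that could be at or above the threshold
    for lower, upper, count in groups:
        if lower is not None and lower >= threshold:
            sure += count
            possible += count
        elif upper is None or upper > threshold:
            possible += count
    return sure / n, possible / n

print(proportion_at_or_above_bounds(10_000))
print(proportion_at_or_above_bounds(75_000))
```

For a threshold of 10,000 the two bounds coincide at 0.89, so the proportion below \$10,000 is known exactly (11 of 100). For a threshold of 75,000 the bounds are 0.0 and 0.12, so the grouped data say only that the proportion lies somewhere in that range.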
