1.1 - Types of Discrete Data

Objective 1.2

Discrete data is often referred to as categorical data because of the way observations can be collected into categories. Variables producing such data can be of any of the following types:

  • Nominal (e.g., gender, ethnic background, religious or political affiliation)
  • Ordinal (e.g., extent of agreement, school letter grades)
  • Quantitative variables with relatively few values (e.g., number of times married)

Technically, a quantitative variable may take on any number of values and still be considered discrete, but it needs to be "countable". So, for example, the number of traffic accidents in a given time period may be considered discrete, but the amount of time between two consecutive accidents would be considered continuous. However, even a continuous variable may be used to produce discrete data if its range is divided or "coarsened" into intervals.
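
As a minimal sketch of coarsening, the Python snippet below bins continuous waiting times between accidents into intervals and counts how many fall in each. The waiting times, the cut points, the interval labels, and the helper name `coarsen` are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter


def coarsen(value, cut_points, labels):
    """Return the label of the interval that contains value.

    cut_points must be sorted, and labels needs len(cut_points) + 1 entries.
    A value equal to a cut point is placed in the interval above it.
    """
    return labels[bisect_right(cut_points, value)]


# Hypothetical waiting times (in days) between consecutive accidents.
waiting_times = [0.4, 2.7, 1.0, 5.3, 0.9, 3.1, 7.8, 1.6]
cut_points = [1.0, 3.0]  # hypothetical interval boundaries
labels = ["under 1 day", "1 to under 3 days", "3 days or more"]

# Each continuous value is replaced by its interval, then intervals are counted.
counts = Counter(coarsen(t, cut_points, labels) for t in waiting_times)
for label in labels:
    print(label, counts[label])
```

Once the values are replaced by interval labels, the result is ordinary discrete data that can be summarized by frequency counts.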

Note that many variables can be considered as either nominal or ordinal, depending on the purpose of the analysis. Consider majors in English, psychology, and computer science. This classification may be considered nominal or ordinal, depending on whether there is an intrinsic belief that it is "better" to have a major in computer science than in psychology or in English. Generally speaking, for a binary variable like pass/fail, ordinal or nominal consideration does not matter.

It should also be noted that numerically meaningful variables can be associated with any of the data types above, even the nominal type. For example, the gender categories of "man" and "woman" would themselves not be numerically meaningful, but if we let \(X\) be the number of men in a random sample, that would be considered a quantitative (random) variable.

Context is important! The context of the study and the relevant questions of interest are important in specifying what kind of variable we will analyze.

Examples

  1. Did you get the flu? (Yes or No) -- is a binary nominal categorical variable
  2. What was the severity of your flu? (Low, Medium, or High) -- is an ordinal categorical variable

Measurement Hierarchy

The main distinction between nominal and ordinal data is that the latter has a natural ordering (least to greatest, best to worst, etc.), whereas the former does not. If the ordered characteristic is ignored, however, ordinal data could be considered a special case of nominal data. Similarly, discrete quantitative data could be considered a special case of ordinal data, with the additional characteristic that values have numerical meaning. So, computations like differences and averages make sense. Thus, the hierarchy is

nominal < ordinal < quantitative

In terms of analyses, methods applicable for one type of variable can be used for the variables at higher levels too (but not at lower levels). For example, methods designed for nominal data can be used for ordinal data but not vice versa. However, keep in mind that an analysis method may not be optimal if it ignores information available in the data.

One final note on the organization of these types is that quantitative variables may be further divided into "interval" and "ratio" types, depending on whether the operations of subtraction and division make sense, but we will rarely need to make such a distinction in this course.

Frequency Counts

While often not numerically meaningful originally, discrete data can be summarized with the frequency counts of individuals falling in the categories. If more than one variable is involved, counts can be tabulated either jointly, for each combination of categories, or marginally, for one variable by summing over the categories of the other variable. Here are some examples.

Example: Eye Color

This is a typical frequency table for a single categorical variable. A sample of n = 96 persons is obtained, and the eye color of each person is recorded. The table then summarizes the responses by their frequencies.

Eye color    Count
Brown           46
Blue            22
Green           26
Other            2
Total           96

Analysis
Notice that brown, blue, green, and other have no intrinsic ordering. The response variable, eye color, is therefore an unordered categorical or nominal variable.
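
As an illustration, the short Python sketch below turns the eye color counts into sample proportions and finds the most frequent category (the mode), a summary that does not rely on any ordering of the categories. The variable names are illustrative assumptions; the counts are copied from the table.

```python
# Counts copied from the eye color table.
eye_color_counts = {"Brown": 46, "Blue": 22, "Green": 26, "Other": 2}

n = sum(eye_color_counts.values())  # sample size
proportions = {color: count / n for color, count in eye_color_counts.items()}

# The mode needs only the counts, not an ordering of the categories.
mode = max(eye_color_counts, key=eye_color_counts.get)

for color, p in proportions.items():
    print(f"{color}: {p:.3f}")
print("Most frequent category:", mode)
```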

Example: Admissions Data

A university offers only two degree programs: English and computer science. Admission is competitive, and there is suspicion of discrimination against women in the admission process. Here is a two-way table of counts of all applicants by sex and admission status. These data can be used to measure the association between the sex of the applicants and their success in obtaining admission.

          Admit   Deny   Total
Male         35     45      80
Female       20     40      60
Total        55     85     140

Analysis
In this case, the four interior counts (35, 45, 20, and 40) represent numbers of joint events because combinations of both variables are considered, whereas the counts in the "Total" column and "Total" row are marginal counts for sex and admission status, respectively.
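
The relationship between joint and marginal counts can be sketched in a few lines of Python. The nested-dictionary layout and the variable names are assumptions made for illustration; the counts are those of the admissions table.

```python
# Joint counts from the admissions table: rows are sex, columns are decision.
joint = {
    "Male": {"Admit": 35, "Deny": 45},
    "Female": {"Admit": 20, "Deny": 40},
}

# Marginal counts for sex: sum each row over the admission decisions.
sex_totals = {sex: sum(row.values()) for sex, row in joint.items()}

# Marginal counts for admission status: sum each column over sex.
decisions = ["Admit", "Deny"]
decision_totals = {d: sum(row[d] for row in joint.values()) for d in decisions}

# Either set of marginal counts adds up to the overall sample size.
grand_total = sum(sex_totals.values())
print(sex_totals, decision_totals, grand_total)
```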

Example: Attitudes Towards War

The table below shows hypothetical attitudes of n = 116 people towards war. They were asked to state their opinion on a four-point scale regarding the statement: "This is a necessary war".

Attitude             Count
Strongly disagree       35
Disagree                27
Agree                   23
Strongly agree          31
Total                  116

Analysis
The response categories in this example are clearly ordered, but no objectively defined numerical scores can be attached to the categories. The response variable, attitude, is therefore said to be an ordered categorical or an ordinal variable.
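
Because the categories are ordered, summaries based on cumulative counts become available even without numerical scores. The Python sketch below finds a median category under one common convention (the first category whose cumulative count reaches half the sample); the function name `median_category` and that convention are assumptions for illustration, not something prescribed by the text.

```python
# Ordered categories and counts from the attitude table (lowest to highest).
attitude_counts = [
    ("Strongly disagree", 35),
    ("Disagree", 27),
    ("Agree", 23),
    ("Strongly agree", 31),
]


def median_category(ordered_counts):
    """Return the first category whose cumulative count reaches half of n."""
    n = sum(count for _, count in ordered_counts)
    running = 0
    for category, count in ordered_counts:
        running += count
        if running >= n / 2:
            return category


print(median_category(attitude_counts))
```

This calculation uses only the ordering of the categories. It would not be meaningful for a nominal variable such as eye color, where the categories can be listed in any order.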

Example: Attitudes Towards War (cont.)

Working from the example above, suppose now that in addition to the four ordered categories, outcomes where the person wasn't sure or refused to answer were also recorded, giving n = 130 total counts divided up as follows.

Attitude             Count
Strongly disagree       35
Disagree                27
Agree                   23
Strongly agree          31
Not sure                 6
Refusal                  8
Total                  130

Analysis
The placement of "not sure" and "refusal" in the ordering is questionable. We would say that this response is partially ordered.

Example: Dice Rolls

Suppose a six-sided die is rolled 30 times, and the die face that comes up is recorded. One possible set of outcomes is tabulated below.

Face     Count
1            3
2            7
3            5
4           10
5            2
6            3
Total       30

Analysis
Note that the die faces are essentially labels and could reasonably be considered nominal or ordinal, depending on the context.

Example: Number of Children in Families

Here's an example where the response categories are numerically meaningful: the number of children in n = 100 randomly selected families.

Number of children   Count
0                       19
1                       26
2                       29
3                       13
4-5                     11
6+                       2
Total                  100

Analysis
The original data has been coarsened into six categories (0, 1, 2, 3, 4–5, 6+). These categories are still ordered, but—unlike the previous example—the categories have objectively defined numeric values attached to them. We can say that this table represents coarsened numeric data.

Example: Household Incomes

The variable in this example is total gross income, recorded for a sample of n = 100 households.

Income                   Count
below \$10,000              11
\$10,000–\$24,999           23
\$25,000–\$39,999           30
\$40,000–\$59,999           24
\$60,000 and above          12
Total                      100

Analysis

The original data (raw incomes) were essentially continuous, but any type of data, continuous or discrete, can be grouped or coarsened into categories.

Grouping data will typically result in some loss of information. How much information is lost depends on

  • the number of categories and
  • the question being addressed.

In this example, grouping has somewhat diminished our ability to estimate the mean or median household income. Our ability to estimate the proportion of households with incomes below \$10,000 has not been affected, but estimating the proportion of households with incomes above \$75,000 is now virtually impossible.
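
The last point can be made concrete with a small Python sketch based on the grouped counts. The variable names are illustrative assumptions. The sketch shows that the proportion below \$10,000 is recovered exactly, while the proportion above \$75,000 can only be bounded, because that threshold falls inside the top group.

```python
# Ordered income groups and counts from the household income table.
income_counts = [
    ("below $10,000", 11),
    ("$10,000-$24,999", 23),
    ("$25,000-$39,999", 30),
    ("$40,000-$59,999", 24),
    ("$60,000 and above", 12),
]
n = sum(count for _, count in income_counts)

# "Below $10,000" is exactly one group, so this proportion is fully recoverable.
prop_below_10k = income_counts[0][1] / n

# Households above $75,000 all sit in the top group, but so do households
# between $60,000 and $75,000, so the counts only bound the proportion.
lower_bound_above_75k = 0.0
upper_bound_above_75k = income_counts[-1][1] / n

print(prop_below_10k, lower_bound_above_75k, upper_bound_above_75k)
```

All the grouped data can say is that the proportion above \$75,000 lies somewhere between the two bounds; narrowing it further would require the raw incomes or a finer grouping.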