In the previous examples, we mostly considered problems associated with questions that measure opinion. We need to discern what we want to measure and how we want to measure it in a wider variety of circumstances. We must also think carefully about the properties of the measurements we gather.
Data is a collection of pieces of information. Each specific piece of information is called an observation. The observations are measurements of certain characteristics, which we call "variables." The word "variable" is used because the pieces of information, the observations, vary from one person to the next.
Figure 1.5: Types of Data
Example 1.11 Variables
Consider the following variables:
Table 1.2: Classification of Variables
Number | Variable | Type of Variable |
---|---|---|
1 | Which are you? Near-sighted, far-sighted, neither | Categorical |
2 | What is your height? | Measurement and Continuous |
3 | How many phone calls did you make yesterday on a cell phone? | Measurement and Discrete |
4 | What is your cholesterol level? | Measurement and Continuous |
Hopefully, you find the classification of the first three variables easy to understand.
Variable #1 is a categorical variable because the possible choices are "words" or "categories."
Variable #2 is a measurement variable because the possible choices are "numbers." It is also called a continuous variable because it can assume any value within a range on a continuum. Determining the value of a continuous measurement variable often requires an instrument, such as the tape measure or ruler used to determine height. Continuous measurement variables can be subdivided into smaller and smaller fractional units of measurement. Typically, a continuous measurement variable is expressed as "an amount of" something.
Variable #3 is a measurement variable because the possible choices are numbers. It is also a discrete variable because one can simply count the number of phone calls made on a cell phone in any given day. The possible values are only whole numbers such as 0, 1, 2, ..., 50. (Some of you probably make a lot of cell phone calls.) Discrete measurement variables cannot be subdivided into smaller and smaller fractional units of measurement. Often, a discrete measurement variable is expressed as "a number of" something.
Variable #4 is somewhat ambiguous. What the variable measures (cholesterol level) can be expressed on a continuum of possible values, but subjects are likely to round off or to know their level only as a whole number. Cholesterol levels must be determined by a blood test, where an instrument is used to determine the final value. The reported value represents the concentration of cholesterol in the blood, in units of milligrams per deciliter (mg/dL), and it is typically rounded to the nearest whole number. Consequently, the cholesterol level might look like a discrete variable. However, the raw values are continuous and the amount of "discreteness" is small, so a variable like this would be treated as a continuous variable in any analysis.
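As a rough illustration of why Variable #4 is ambiguous, the Python sketch below guesses a variable's type purely from the form of its recorded values. The function name `naive_type` and all of the sample responses are invented for this example; they are not data from the text.

```python
def naive_type(values):
    """Guess a variable's type from its recorded values alone (a heuristic)."""
    if all(isinstance(v, str) for v in values):
        return "categorical"
    if all(isinstance(v, int) for v in values):
        return "measurement, discrete"
    return "measurement, continuous"

# Hypothetical responses, invented for illustration only.
vision = ["near-sighted", "neither", "far-sighted"]   # Variable #1
height_cm = [162.5, 175.3, 180.0]                     # Variable #2
calls = [0, 3, 12]                                    # Variable #3
cholesterol_raw = [187.4, 201.8, 174.6]               # Variable #4, mg/dL
# The same cholesterol values after rounding to whole numbers.
cholesterol_reported = [round(x) for x in cholesterol_raw]

samples = {
    "vision": vision,
    "height": height_cm,
    "calls": calls,
    "cholesterol (raw)": cholesterol_raw,
    "cholesterol (reported)": cholesterol_reported,
}
for name, values in samples.items():
    print(f"{name}: {naive_type(values)}")
```

The rounded cholesterol values are labeled discrete by this rule even though the underlying quantity is continuous, which is why the classification should rest on what is being measured and not only on how the values happen to be recorded.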
Example 1.12 Best Way to Determine Heart Rate
Consider an experiment where heart rate (heartbeats/minute) is measured by three different methods.
Method 1: Count heartbeats for 6 seconds and multiply by 10 to get heartbeats/minute
Method 2: Count heartbeats for 30 seconds and multiply by 2 to get heartbeats/minute
Method 3: Count heartbeats for 60 seconds
We collected six measurements on an individual for each of the three methods. These results are found in Table 1.3.
Table 1.3: Results from the Heart Rate Experiment
Method | Six Raw Counts (Heartbeats) | Heart Rate (Heartbeats/Minute) | Minimum and Maximum Heart Rate | Average Heart Rate |
---|---|---|---|---|
1 | 7, 7, 7, 7, 7, 7 | 70, 70, 70, 70, 70, 70 | 70, 70 | 70 |
2 | 36, 35, 37, 38, 37, 37 | 72, 70, 74, 76, 74, 74 | 70, 76 | 73.3 |
3 | 73, 76, 74, 75, 74, 75 | 73, 76, 74, 75, 74, 75 | 73, 76 | 74.5 |
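The arithmetic behind Table 1.3 can be reproduced with a short Python sketch. The raw counts and multipliers come from the table and the method descriptions; the names `methods` and `summarize` are chosen here for illustration.

```python
from statistics import mean

# Raw beat counts from Table 1.3, with the multiplier that converts
# each count to beats per minute (60 / seconds counted).
methods = {
    1: {"counts": [7, 7, 7, 7, 7, 7], "multiplier": 10},
    2: {"counts": [36, 35, 37, 38, 37, 37], "multiplier": 2},
    3: {"counts": [73, 76, 74, 75, 74, 75], "multiplier": 1},
}

def summarize(counts, multiplier):
    """Convert raw counts to beats/minute and summarize them."""
    rates = [c * multiplier for c in counts]
    return {
        "rates": rates,
        "min": min(rates),
        "max": max(rates),
        "mean": mean(rates),
    }

summary = {m: summarize(**spec) for m, spec in methods.items()}
for m, s in summary.items():
    print(f"Method {m}: min={s['min']}, max={s['max']}, mean={s['mean']:.1f}")
```

The unrounded average for method 2 is 440/6, or about 73.3 beats/minute.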
In this example, we will not explore whether or not heart rate is a valid measure of overall health and fitness. Obviously, it does provide some information about whether or not a person may have some health problems. But by itself, it usually does not provide a complete picture. The questions that we pose now are the following:
Question 1: Which method is the most reliable?
Question 2: Which method is the most biased?
What may surprise you is that the answer to both questions is method 1. Method 1 is the most reliable because every time we took the measurement we observed 7 beats in 6 seconds; the results are consistent. Method 1 is also the most biased because it consistently underestimates the individual's true heart rate. Results from method 3, which is the best of the three methods for determining heart rate, give an average heart rate of 74.5 beats/minute, and every result from method 1 fell below this value. Part of the problem is that each beat counted in 6 seconds stands for 10 beats/minute, so method 1 can only report multiples of 10. This means that even though method 1 is reliable, it can still have other problems, which in this case is bias.
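One way to put numbers on these two answers is sketched below in Python. Two choices here are assumptions made for illustration, not definitions from the text: reliability is summarized by the range (maximum minus minimum) of the six results, and the method 3 average is used as a stand-in for the true heart rate when computing bias.

```python
from statistics import mean

# Heart rates in beats/minute for each method, from Table 1.3.
rates = {
    1: [70, 70, 70, 70, 70, 70],
    2: [72, 70, 74, 76, 74, 74],
    3: [73, 76, 74, 75, 74, 75],
}

# Assumption: the method 3 average stands in for the true heart rate.
reference = mean(rates[3])

def spread(values):
    """Range of the results; a smaller range means more consistent results."""
    return max(values) - min(values)

def bias(values, reference):
    """Average result minus the reference; negative means underestimation."""
    return mean(values) - reference

results = {
    m: {"spread": spread(v), "bias": bias(v, reference)}
    for m, v in rates.items()
}
for m, r in results.items():
    print(f"Method {m}: spread={r['spread']}, bias={r['bias']:+.2f}")
```

Under these assumptions, method 1 has the smallest spread (0) and the largest bias in absolute value (4.5 beats/minute below the reference), which matches both answers.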
Example 1.13 Bias versus Reliability
Suppose you are interested in knowing whether the average price of homes in a certain county went up or down this year in comparison with last year. Would you be more interested in a measure of home prices with low bias or in a reliable one?
Ideally, you would like the measure to be both unbiased and reliable. However, a reliable measure that is biased can often still provide meaningful information. Since the goal is to compare the average price of homes across two years, the measure must be reliable. Even if the measure is biased, the bias affects both years in roughly the same way, so the amount of change from one year to the next may be sufficient information to make the comparison.
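A minimal Python sketch of this reasoning follows. All of the dollar figures are invented, and the sketch assumes the simplest possible case: a perfectly reliable measure whose bias is the same fixed amount in both years. If the bias changed from year to year, it would no longer cancel exactly.

```python
# Hypothetical true average home prices (invented for illustration).
true_last_year = 250_000
true_this_year = 262_000

# Assumption: the measure overstates every average by the same fixed amount.
BIAS = 15_000

def biased_measure(true_value, bias=BIAS):
    """A perfectly reliable but biased measure of the average price."""
    return true_value + bias

measured_last = biased_measure(true_last_year)
measured_this = biased_measure(true_this_year)

true_change = true_this_year - true_last_year
measured_change = measured_this - measured_last

print(f"Measured averages: {measured_last}, {measured_this}")
print(f"True change: {true_change}, measured change: {measured_change}")
```

Both measured averages are wrong by 15,000, yet the measured change equals the true change because the constant bias cancels in the subtraction.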