1(b).1 - What is Data


Anything that is observed or conceptualized falls under the purview of data. In a somewhat restricted view, data is something that can be measured. Data represent facts: something that has actually taken place and has been observed and measured. Data may come from passive observation or active collection. Each data point must be rooted in a physical, demographic, or behavioral phenomenon, and must be unambiguous and measurable. Data are observed for each unit under study and stored in an electronic device.

Data denotes a collection of objects and their attributes.
An attribute (feature, variable, or field) is a property or characteristic of an object.
A collection of attributes describes an object (individual, entity, case, or record).
Each row is an object and each column is an attribute.
ID   Sex     Education            Income
248  Male    High School          $100,000
249  Female  High School          $12,000
250  Male    College              $23,000
251  Male    Child                $0
252  Female  High School          $19,798
253  Male    High School          $40,100
254  Male    Less than 1st Grade  $2,691
255  Male    Child                $0
256  Male    11th Grade           $30,000
257  Male    Ph.D.                $30,686
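
The row-and-column view of the table can be sketched in code. The following is a minimal illustration, assuming a hypothetical `Person` record type whose field names mirror the table's columns; only the first three rows are reproduced.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """One object (row); each field is an attribute (column)."""
    id: int
    sex: str
    education: str
    income: int  # dollars, as listed in the table


# First three rows of the table, one object per row.
records = [
    Person(248, "Male", "High School", 100_000),
    Person(249, "Female", "High School", 12_000),
    Person(250, "Male", "College", 23_000),
]

# Column names are the attributes shared by every object.
attribute_names = [f.name for f in fields(Person)]

# Reading down one column collects a single attribute across objects.
incomes = [r.income for r in records]
```

Here `attribute_names` lists the four columns, and `incomes` is the Income column restricted to the three rows shown.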



Often these attributes are referred to as variables. Attributes contain information regarding each unit of observation. Depending on how many different types of information are collected from each unit, the data may be univariate (one attribute), bivariate (two attributes), or multivariate (more than two attributes).

Data can take varied forms and structures, but all data share one property: they contain information and characteristics that distinguish one unit or observation from the others.

Types of Attributes

Nominal: Qualitative variables that do not have a natural order, e.g. hair color, religion, residence ZIP code of a student
Ordinal: Qualitative variables that have a natural order, e.g. grades, rating of a service rendered on a scale of 1-5 (1 is terrible and 5 is excellent), street numbers in New York City
Interval: Measurements where the difference between two values is meaningful, e.g. calendar dates, temperature in Celsius or Fahrenheit
Ratio: Measurements where both difference and ratio are meaningful, e.g. temperature in Kelvin, length, counts
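
A short sketch may help show why ratios are meaningful for length but not for temperature in Celsius or Fahrenheit. It assumes three arbitrary example temperatures and two arbitrary lengths; the helper names are hypothetical. The idea is that a meaningful comparison should not change when the unit changes.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


def meters_to_feet(m: float) -> float:
    return m / 0.3048


# Interval scale: three temperatures, first in Celsius, then Fahrenheit.
low_c, mid_c, high_c = 10.0, 20.0, 40.0
low_f, mid_f, high_f = (celsius_to_fahrenheit(t) for t in (low_c, mid_c, high_c))

# Ratios of raw values depend on the unit, so they carry no meaning.
ratio_c = mid_c / low_c  # 2.0 in Celsius
ratio_f = mid_f / low_f  # about 1.36 in Fahrenheit

# Comparisons of differences survive the change of unit.
diff_ratio_c = (high_c - mid_c) / (mid_c - low_c)
diff_ratio_f = (high_f - mid_f) / (mid_f - low_f)

# Ratio scale: length has a true zero, so ratios survive as well.
length_ratio_m = 4.0 / 2.0
length_ratio_ft = meters_to_feet(4.0) / meters_to_feet(2.0)
```

"Twice as hot" holds only in one unit, while "twice as long" holds in both meters and feet.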

Discrete and Continuous Attributes

Discrete Attribute
A variable or attribute is discrete if it can take a finite or a countably infinite set of values. A discrete variable is often represented as an integer-valued variable. A binary variable is a special case where the attribute can assume only two values, usually represented by 0 and 1. Examples of discrete variables are the number of birds in a flock and the number of heads realized when a coin is flipped 10 times.
Continuous Attribute
A variable or attribute is continuous if it can take any value in a given range; the range may be infinite. Examples of continuous variables are the weights and heights of birds and the temperature of a day.
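
The distinction can be illustrated with a small simulation. This is only a sketch: the fixed seed and the 10 to 50 gram weight range are arbitrary assumptions chosen for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Binary attribute: each flip is 0 (tails) or 1 (heads).
flips = [rng.randint(0, 1) for _ in range(10)]

# Discrete attribute: the head count can only be an integer from 0 to 10.
heads = sum(flips)

# Continuous attribute: a weight may take any real value in its range.
weight_grams = rng.uniform(10.0, 50.0)
```

The head count has exactly eleven possible values, whereas the weight is limited only by the precision of the measurement.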

In the hierarchy of data types, nominal is at the lowest rank because it carries the least information. Ratio is the highest type since it carries the most information. When analyzing data, note that procedures applicable to a lower data type can be applied to a higher one, but the reverse is not true. An analysis procedure for nominal data can be applied to interval data, but this is not recommended because such a procedure ignores the additional information that interval data carry. Procedures developed for interval or ratio data, however, cannot be applied to nominal or ordinal data. A prudent analyst should recognize each data type and then decide which methods are applicable.
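
The cumulative nature of this hierarchy can be sketched as a lookup. The pairing of one summary statistic with each level (mode, median, mean, coefficient of variation) is a conventional choice assumed here for illustration, and `permitted_summaries` is a hypothetical helper. The sample values are the first four rows of the table earlier in this section.

```python
from statistics import mean, median, mode

# Scale types ordered from least to most information.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Summary that first becomes meaningful at each level (a conventional
# pairing used here as an assumption, not a list from the text).
FIRST_VALID_AT = {
    "nominal": "mode",
    "ordinal": "median",
    "interval": "mean",
    "ratio": "coefficient of variation",
}


def permitted_summaries(scale: str) -> list[str]:
    """Return summaries valid for `scale`, including all lower levels."""
    level = SCALES.index(scale)
    return [FIRST_VALID_AT[s] for s in SCALES[: level + 1]]


# Nominal data (sex): only the mode is meaningful.
sexes = ["Male", "Female", "Male", "Male"]
most_common_sex = mode(sexes)

# Ratio data (income): mode, median, and mean all apply.
incomes = [100_000, 12_000, 23_000, 0]
income_median = median(incomes)
income_mean = mean(incomes)
```

Calling `permitted_summaries("interval")` returns the nominal and ordinal summaries plus the mean, while `permitted_summaries("nominal")` returns only the mode, which mirrors the one-way applicability described above.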