# 1(b).1 - What is Data?


### Introduction

Anything that is observed or conceptualized falls under the purview of data. In a somewhat more restricted view, data is something that can be measured. Data represent facts: something that has actually taken place and has been observed and measured. Data may come from passive observation or from active collection. Each data point must be rooted in a physical, demographic, or behavioural phenomenon, and it must be unambiguous and measurable. Data are observed on each unit under study and are typically stored on an electronic device.

• Data denotes a collection of objects and their attributes.
• An attribute (feature, variable, or field) is a property or characteristic of an object.
• A collection of attributes describes an object (individual, entity, case, or record).
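The object/attribute vocabulary above maps naturally onto records and fields. The following is a minimal sketch, assuming a hypothetical `Bird` record type with made-up values. Each `Bird` instance is one object, each field is one attribute, and the list `data` is the collection of objects.

```python
from dataclasses import dataclass


# Hypothetical record type: one instance = one object (individual, case, record).
@dataclass
class Bird:
    species: str        # attribute: a category
    weight_g: float     # attribute: a measurement in grams
    wingspan_cm: float  # attribute: a measurement in centimetres


# The data set is a collection of objects, all described by the same attributes.
# The values are invented for illustration only.
data = [
    Bird("sparrow", 24.0, 21.0),
    Bird("robin", 77.0, 31.0),
    Bird("sparrow", 28.5, 22.5),
]

# One attribute of one object.
print(data[1].species)  # robin

# One attribute across all objects gives a variable (a column of the data set).
weights = [b.weight_g for b in data]
print(weights)  # [24.0, 77.0, 28.5]
```

Reading "down" an attribute across objects is what the next paragraph calls a variable.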

Attributes are often referred to as variables. They contain the information recorded for each unit of observation. Depending on how many attributes are recorded for each unit (one, two, or more than two), the data are univariate, bivariate, or multivariate.
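The univariate/bivariate/multivariate distinction depends only on how many attributes are recorded per unit. A minimal sketch, assuming each unit is stored as a Python `dict` of attribute names to values; the function name `variate_type` and the example attributes are hypothetical.

```python
def variate_type(record: dict) -> str:
    """Classify data by how many attributes were recorded for one unit."""
    n = len(record)
    if n == 0:
        raise ValueError("a record needs at least one attribute")
    if n == 1:
        return "univariate"
    if n == 2:
        return "bivariate"
    return "multivariate"


print(variate_type({"height_cm": 172.0}))                                # univariate
print(variate_type({"height_cm": 172.0, "weight_kg": 68.0}))             # bivariate
print(variate_type({"height_cm": 172.0, "weight_kg": 68.0, "age": 31}))  # multivariate
```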

Data can take varied forms and structures, but they all share one criterion: data contain information and characteristics that separate one unit or observation from the others.

#### Types of Attributes

Nominal: Qualitative variables that do not have a natural order, e.g. Hair color, Religion, Residence zipcode of a student.

Ordinal: Qualitative variables that have a natural order, e.g. Grades, Rating of a service rendered on a scale of 1-5 (1 is terrible and 5 is excellent), Street numbers in New York City.
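The difference between nominal and ordinal attributes shows up in which summaries make sense. The sketch below uses invented values. Zip codes are treated as nominal and stored as strings, because arithmetic on them is meaningless and integers would drop leading zeros, so only counting categories (the mode) is used. Ratings on the 1-5 scale are ordinal, so sorting and the median are also meaningful.

```python
from collections import Counter
from statistics import median

# Nominal: made-up residence zip codes. Only equality between values matters.
zipcodes = ["16802", "02139", "16802", "10001"]
mode_zip, count = Counter(zipcodes).most_common(1)[0]
print(mode_zip, count)  # 16802 2

# Ordinal: made-up service ratings (1 = terrible, 5 = excellent).
# Order is meaningful, so sorting and the median are valid summaries.
ratings = [4, 2, 5, 3, 4]
med = median(ratings)
print(sorted(ratings))  # [2, 3, 4, 4, 5]
print(med)              # 4
```

Averaging the ratings would treat the gap between "2" and "3" as equal to the gap between "4" and "5", which an ordinal scale does not guarantee.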

Interval: Measurements where the difference between two values is meaningful but the zero point is arbitrary, e.g. Calendar dates, Temperature in Celsius or Fahrenheit.

Ratio: Measurements with a true zero, so that both differences and ratios are meaningful, e.g. Temperature in Kelvin, Length, Counts.
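Temperature illustrates the interval/ratio distinction. The sketch below converts two arbitrary example temperatures from Celsius to Kelvin (K = °C + 273.15). The difference between them is the same on both scales. The ratio, however, is only meaningful in Kelvin, because 0 °C is not "no temperature" while 0 K is absolute zero.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15


warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are meaningful on an interval scale and agree across the two units.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k
print(diff_c, round(diff_k, 10))  # 10.0 10.0

# The Celsius ratio suggests "twice as hot", which is not physically true.
print(warm_c / cool_c)  # 2.0

# On the Kelvin (ratio) scale the ratio is meaningful: about 3.5% warmer.
print(round(warm_k / cool_k, 3))  # 1.035
```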

#### Discrete and Continuous Attributes

##### Discrete Attribute

A variable or attribute is discrete if it can take values from a finite or countably infinite set. A discrete variable is often represented as an integer-valued variable. A binary variable is a special case in which the attribute can assume only two values, usually represented by 0 and 1. Examples of discrete variables are the number of birds in a flock and the number of heads obtained when a coin is flipped 10 times.
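The coin-flip example combines both ideas in the paragraph above. Each flip is a binary attribute coded 0/1, and the total number of heads is a discrete attribute whose possible values form the finite set {0, 1, ..., 10}. A minimal simulation sketch, assuming a fair coin modelled with Python's `random` module; the seed is arbitrary and only makes the run reproducible.

```python
import random

rng = random.Random(42)  # arbitrary seed for reproducibility

# Binary attribute: each flip recorded as 1 (heads) or 0 (tails).
flips = [rng.randint(0, 1) for _ in range(10)]

# Discrete attribute: the number of heads is an integer count.
heads = sum(flips)

# Every value the count can take, listed explicitly: a finite set of 11 integers.
possible_values = list(range(0, 11))

print(flips, heads)
print(len(possible_values))  # 11
```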

##### Continuous Attribute

A variable or attribute is continuous if it can take any value in a given range, and that range may be infinite. Examples of continuous variables are the weights and heights of birds and the temperature on a given day.
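One way to see "any value in a given range" is that between any two distinct values of a continuous attribute there is always another possible value, for example their midpoint. A discrete count has no such in-between values. The sketch below uses two invented bird heights and repeatedly halves the interval toward the lower one.

```python
# Made-up heights (cm) of two birds.
low, high = 21.0, 22.0

# Continuous: another possible height always lies between two distinct heights.
midpoints = []
upper = high
for _ in range(5):
    upper = (low + upper) / 2  # midpoint of [low, upper]
    midpoints.append(upper)
print(midpoints)  # [21.5, 21.25, 21.125, 21.0625, 21.03125]

# Discrete contrast: no count of birds lies strictly between 3 and 4.
counts_between = [n for n in range(3, 5) if 3 < n < 4]
print(counts_between)  # []
```

In practice, measured values are recorded to finite precision (and floating-point numbers are themselves finite), so "continuous" describes the attribute being measured rather than the stored numbers.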

In the hierarchy of data, nominal is at the lowest rank because it carries the least information. The highest type is ratio, since it carries the most information. When analyzing data, note that procedures applicable to a lower data type can be applied to a higher one, but the reverse is not true. An analysis procedure for nominal data can be applied to interval data, but this is not recommended, since such a procedure completely ignores the information that interval data carry. Procedures developed for interval or ratio data, however, cannot be applied to either nominal or ordinal data. A prudent analyst should identify the type of each attribute before deciding which methods apply.
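The "lower procedures apply upward, not downward" rule amounts to a simple ordering check. A minimal sketch, assuming the four levels are ranked nominal < ordinal < interval < ratio. The mapping from each summary statistic to the lowest level at which it is meaningful is an illustrative assumption, not an exhaustive catalogue, and the names `Level`, `MIN_LEVEL`, `applicable`, and `usable_for` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most information."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Illustrative assumption: lowest level at which each summary is meaningful.
MIN_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "coefficient_of_variation": Level.RATIO,
}


def applicable(statistic: str, level: Level) -> bool:
    """A procedure built for a lower level also applies to any higher level."""
    return level >= MIN_LEVEL[statistic]


def usable_for(level: Level) -> list:
    """All procedures in MIN_LEVEL that are valid for data at `level`."""
    return [name for name, lowest in MIN_LEVEL.items() if level >= lowest]


print(applicable("mode", Level.INTERVAL))  # True (allowed, but wastes information)
print(applicable("mean", Level.NOMINAL))   # False
print(usable_for(Level.ORDINAL))           # ['mode', 'median']
```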