8 Importing Data
Overview
So far in our lessons, we’ve mostly entered our data directly into R or we’ve used built-in data sets that are part of R. This won’t take us very far for real work; to really use R for data analysis, we must get data into R! We’ll use Chapter 8 in Essential R Course Notes.
Objectives
- prepare a text file (.txt or .csv) for import into R,
- import the data using the function read.table(), and
- recognize and fix the most common problems that cause errors when importing data.
Data and R Code Files
The R code file and data files for this lesson can be found on the Essential R - Notes on learning R page.
The following data files should be in the “Data” folder inside your “Essential R” folder. Alternatively, you can save them to your working directory.
W101-2010.xls (Excel file) and W101-2010.csv (text file made from the Excel file). The example text files are: Ex1.txt, Ex2.txt, Ex3.txt, and Ex3.csv. The data file required for Exercise 2 is here: StatesData.xls.
8.1 Overview of Importing Text Files
Here we’ll introduce the most straightforward way to import data - basically:
1. Open the data in a spreadsheet to clean it up:
   - Copy and paste only the data into a fresh worksheet
   - Fix column names (no spaces or special characters, and short is good)
   - Check data types - numeric variables should contain only numbers
2. Save the data from the spreadsheet as a .csv file
3. Import the data to R using read.csv()
4. Repeat steps 1 and 2 until step 3 works
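The import step can be sketched in a few lines of R. This is only an illustration: the file is written to a temporary location so the example runs on its own, and the column names and values (plot, height, mass) are made up rather than taken from the course data.

```r
# Write a small .csv file to a temporary location (made-up data),
# standing in for the file you would save from your spreadsheet.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("plot,height,mass",
             "A,12.1,3.4",
             "B,10.7,2.9",
             "C,11.3,3.1"),
           csv_path)

# Import the file; read.csv() expects a header row and comma separators.
dat <- read.csv(csv_path)

# Inspect the result: are the column names and data types what you expect?
str(dat)
summary(dat)
```

If str() shows a numeric variable stored as text, that is usually the signal to go back to the spreadsheet and clean that column.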
8.2 Preparing a Spreadsheet for Import as Text
Here we’ll demonstrate the typical steps in creating a “clean” text file from a spreadsheet. It is worth noting the value of good variable names at this point. The balance between unambiguous and short is a call you have to make. You’ll be typing them a lot, so short has real value, but you should be able to remember what they mean as well.
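To see why clean column names matter, it can help to look at what R does with messy ones. The sketch below uses invented headers (they are not from the course data files) and the base R function make.names(), which applies the same kind of repair that read.csv() performs by default on column names.

```r
# Made-up column headers of the kind often found in spreadsheets.
messy <- c("Plant Height (cm)", "Dry-Mass")

# R turns these into syntactically valid names; the result is legal
# but awkward to type and to remember.
repaired <- make.names(messy)
print(repaired)

# A small made-up data frame carrying the repaired names.
dat <- data.frame(c(12.1, 10.7), c(3.4, 2.9))
names(dat) <- repaired

# Short, unambiguous names chosen by hand are easier to work with.
names(dat) <- c("height_cm", "dry_mass")
str(dat)
```

Fixing the names in the spreadsheet before saving avoids having to do this renaming in R at all.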
8.3 File Paths and the Working Directory
In order to import a file, we have to tell R where the file is. Here we’ll describe how file locations can be specified: as full paths, as paths relative to the working directory, or interactively with file.choose(). Note that interactive file choice will not work with compiled documents. Also note that from here on the course notes assume that your working directory is set to “Essential R”.
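The three ways of pointing R at a file might look like the following sketch. It assumes the working directory is the “Essential R” folder and that W101-2010.csv sits in its “Data” subfolder, as described above. The object names (rel_path, full_path, w101, chosen_path) are just illustrative.

```r
# Where is R currently looking for files?
getwd()

# A path relative to the working directory (assumed to be "Essential R").
rel_path <- file.path("Data", "W101-2010.csv")

# The equivalent full path, built from the working directory.
full_path <- file.path(getwd(), rel_path)

# Check that the file is where we think it is before importing it.
if (file.exists(rel_path)) {
  w101 <- read.csv(rel_path)
} else {
  message("File not found: ", full_path)
}

# Interactive file choice; skipped when R is not running interactively,
# for example while a document is being compiled.
if (interactive()) {
  chosen_path <- file.choose()
}
```

If the file is not found, setwd() can be used to point the working directory at the right folder.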
8.4 Common Data Import Problems: Part i
Here we will practice importing data from some small text files, paying special attention to the many things that can go wrong.
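As one illustration of the kind of problem to watch for, the sketch below shows a numeric column that contains a stray text entry. The data are made up and are not the contents of Ex1.txt, Ex2.txt, or Ex3.txt. The na.strings argument of read.table() is one possible fix. Correcting the entry in the spreadsheet is another.

```r
# A small space-separated text file with a non-numeric entry ("n/a")
# in what should be a numeric column (made-up data).
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id height",
             "1 12.1",
             "2 n/a",
             "3 11.3"),
           txt_path)

# Because of the "n/a" entry, height is not imported as a numeric column.
bad <- read.table(txt_path, header = TRUE)
str(bad)

# Telling R which strings mean "missing" lets the column import as numeric.
good <- read.table(txt_path, header = TRUE, na.strings = c("NA", "n/a"))
str(good)
```

Checking str() right after every import is a cheap way to catch this before it causes confusing errors later in an analysis.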
8.5 Common Data Import Problems: Part ii
Here we’ll continue our exploration of typical errors in the data import process.
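Two more problems that often come up are a wrong field separator and a header row that R does not recognize. The sketch below uses made-up data, not the course example files, to show both with read.table().

```r
# A small comma-separated file (made-up data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,count",
             "north,4",
             "south,7"),
           csv_path)

# Problem 1: read.table() splits on whitespace by default, so each
# comma-separated line is read as a single field.
wrong_sep <- read.table(csv_path, header = TRUE)
dim(wrong_sep)

# Problem 2: the separator is right, but the header row is treated as data,
# so the column names end up as the first row of values.
no_header <- read.table(csv_path, sep = ",")
str(no_header)

# Specifying both the separator and the header gives the intended result.
right <- read.table(csv_path, header = TRUE, sep = ",")
str(right)
```

Using read.csv() sets both of these arguments for you, which is one reason it is convenient for .csv files.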