Lesson 8: Importing Data
So far in our lessons, we've mostly entered our data directly into R or used built-in data sets that are part of R. This won't take us very far for real work; to really use R for data analysis we must get data into R! We'll use Chapter 8 in Essential R.
- Prepare a text file (.txt or .csv) for import into R
- Import the data using the function
- Recognize and fix the most common problems that cause errors when importing data
Data and R Code Files
The R code file and data files for this lesson can be found on the Essential R - Notes on learning R page.
The following data files should be in the "Data" folder in your "Essential R" folder; or you can save them to your working directory.
W101-2010.xls (Excel file) and W101-2010.csv (text file made from the Excel file). The example text files are: Ex1.txt, Ex2.txt, Ex3.txt, and Ex3.csv. The data file required for Exercise 2 is here: StatesData.xls.
A note on readr: RStudio's import tool may write code that uses the readr package, which replaces read.csv() with read_csv(). This will read in your data but may convert some of the variables differently than read.csv() would do. (read_csv() will create a "tibble", which is subtly different from a data.frame.) My suggestion to avoid the confusion is to edit the code that RStudio wrote in your console - replace read_csv with read.csv and rerun the code - you should be fine.
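Whichever function did the import, it helps to check how each column came in and convert where needed. The sketch below uses only base R and a tiny made-up data set (the column names site and yield are hypothetical, not from the course files); stringsAsFactors = FALSE is set explicitly to mimic an import that leaves text as character.

```r
# Write a tiny example file to a temporary location (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("site,yield", "A,10.2", "B,11.5", "A,9.8"), tmp)

# Import without converting text columns to factors.
dat <- read.csv(tmp, stringsAsFactors = FALSE)

# Check the class of every column.
sapply(dat, class)

# Convert the text column to a factor by hand.
dat$site <- factor(dat$site)
levels(dat$site)
```

If you do end up with a tibble, as.data.frame() will convert it back to an ordinary data.frame.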
8.1 - Overview of Importing Text Files
Here we'll introduce the most straightforward way to import data - basically:
1. Open the data in a spreadsheet to clean it up:
   - Copy and paste only the data into a fresh worksheet
   - Fix column names (no spaces or special characters, and short is good)
   - Check data types - numeric variables should contain only numbers
2. Save the data from a spreadsheet as a .csv file
3. Import data to R using
4. Repeat steps 1 and 2 until step 3 works
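The import and checking steps above might look something like the following. This is only a sketch: it assumes read.csv() as the import function, and the file is a made-up stand-in (written to a temporary folder) for something you would normally save from your spreadsheet.

```r
# Stand-in for a .csv file saved from a spreadsheet (made-up contents).
csv_path <- file.path(tempdir(), "example.csv")
writeLines(c("plot,trt,height",
             "1,control,12.1",
             "2,fert,15.4",
             "3,control,11.8"), csv_path)

# Import the file; read.csv() expects a header row by default.
plants <- read.csv(csv_path)

# Check that each column came in as the type you expected.
str(plants)
summary(plants)
```

If str() shows a numeric variable as character or factor, that is the signal to go back to the spreadsheet and look for stray text in that column.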
8.2 - Preparing a Spreadsheet for Import as Text
Here we'll demonstrate the typical steps in creating a "clean" text file from a spreadsheet. It is worth noting the value of good variable names at this point. The balance between unambiguous and short is a call you have to make. You'll be typing them a lot, so short has real value, but you should be able to remember what they mean also.
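To see why clean names matter, here is a small sketch of what happens when column names with spaces and symbols reach R. The header row is made up for illustration, and the replacement names are just one possible choice.

```r
# A header row with spaces and special characters (made-up example).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Plot ID,Height (cm),% Cover",
             "1,12.1,40",
             "2,15.4,55"), tmp)

messy <- read.csv(tmp)

# R has rewritten the names to make them syntactically valid;
# the results are usually awkward to type.
names(messy)

# Replace them with short, unambiguous names.
names(messy) <- c("plot", "ht.cm", "cover")
names(messy)
```

Fixing the names in the spreadsheet before saving avoids this extra step.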
8.3 - File Paths and the Working Directory
In order to import a file we have to be able to tell R where the file is. Here we'll describe how file locations can be specified: as full paths, as paths relative to the working directory, or by interactive file choice using file.choose(). Note that interactive file choice will not work with compiled documents. Also note that from here on out the course notes assume that your working directory is set to "Essential R".
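A minimal sketch of these options follows. The folder and file names ("Data", "Ex1.txt") follow the layout described earlier in this lesson, and the setwd() path is purely hypothetical; adjust both to match your own setup.

```r
# Where is R currently looking for files?
getwd()

# To change the working directory (hypothetical path, so left commented out):
# setwd("path/to/Essential R")

# Build a path relative to the working directory.
rel_path <- file.path("Data", "Ex1.txt")
rel_path

# Check that the file is where you think it is before importing.
file.exists(rel_path)

# Interactive choice (works in the console, not in compiled documents):
# my_file <- file.choose()
```

If file.exists() returns FALSE, the problem is the path or the working directory, not the import function.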
8.4 - Common Data Import Problems: Part i
Here we will practice importing data from some small text files, paying special attention to the many things that can go wrong.
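As one illustration (my own made-up example, not one of the course files), a single non-numeric entry is enough to keep a whole column from importing as numeric. The sketch below assumes a tab-delimited file in which someone typed "missing" in a numeric column.

```r
# A numeric column containing one non-numeric entry (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tmass",
             "1\t4.2",
             "2\tmissing",
             "3\t5.1"), tmp)

bad <- read.table(tmp, header = TRUE, sep = "\t")
class(bad$mass)    # not numeric, because of the "missing" entry

# Tell R which strings mean "no data" so the column imports as numeric.
good <- read.table(tmp, header = TRUE, sep = "\t",
                   na.strings = c("NA", "missing"))
class(good$mass)
good$mass
```

The other fix is to correct the entry in the spreadsheet and save the text file again.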
8.5 - Common Data Import Problems: Part ii
Here we'll continue our exploration of typical errors in the data import process.
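Another typical error, sketched here with a made-up file, is a mismatch between the separator used in the file and the one the import function expects. This example assumes a comma-separated file read with the defaults of read.table().

```r
# A small comma-separated file (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,mass", "1,4.2", "2,5.1"), tmp)

# read.table() splits on white space by default, so each line comes in
# as a single field and the header row is read as data.
wrong <- read.table(tmp)
dim(wrong)

# Specifying the separator and the header fixes both problems.
right <- read.table(tmp, header = TRUE, sep = ",")
dim(right)
```

Checking dim() or str() right after import is a quick way to catch this kind of problem.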
read.table() can also import data from ftp and http servers. Older versions of R (sadly) could not read from https servers, though current versions can.
- The "Import Dataset" icon in the "Environment" tab in RStudio - there is a nice tool here that lets you choose a file and set it up for import - it even writes the line of code into the console, so you can copy it to your editor. Note that this tool uses read_table() (from the package "readr") and not read.table(). In general, this is not a problem (and is even better for extremely large data sets) as long as you remember that read_table() does not automatically convert strings to factors (as read.table() does by default in versions of R before 4.0.0). This means that you need to either:
- Convert your factors manually after import
- Specify data type for those columns as "factor" in the GUI
- Copy the code into the editor and modify it to use