Lesson 8: Importing Data

Overview

So far in our lessons, we've mostly entered our data directly into R or used built-in data sets that are part of R. This won't take us very far for real work; to really use R for data analysis we must get data into R! We'll use Chapter 8 in Essential R.

Objectives

Upon completion of this lesson, you should be able to:

  • Prepare a text file (.txt or .csv) for import into R
  • Import the data using the function read.table()
  • Recognize and fix the most common problems that cause errors when importing data

Data and R Code Files

The R code file and data files for this lesson can be found on the Essential R - Notes on learning R page.

The following data files should be in the "Data" folder in your "Essential R" folder; or you can save them to your working directory.

The data files are W101-2010.xls (Excel file) and W101-2010.csv (a text file made from the Excel file). The example text files are: Ex1.txt, Ex2.txt, Ex3.txt, and Ex3.csv. The data file required for Exercise 2 is here: StatesData.xls.

Note! The videos don't cover the "Import Dataset" dialog in the Environment pane in RStudio, but it is discussed in Chapter 8 of Essential R. Be aware that current versions of RStudio have updated the import dialog to use the package readr, which replaces read.csv() with read_csv(). This will read in your data, but it may convert some variables differently than read.csv() would. (read_csv() creates a "tibble", which is subtly different from a data.frame.) To avoid the confusion, I suggest editing the code that RStudio writes in your console: replace read_csv() with read.csv() and rerun it. You should be fine.
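A minimal sketch of the difference, using a small temporary file in place of the lesson's data (the file contents and object names are invented for illustration). The readr part only runs if that package happens to be installed.

```r
# Write a tiny, made-up CSV file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("plot,trt,yield", "1,A,4.2", "2,B,5.1"), tmp)

# Base R: read.csv() returns a plain data.frame
base_df <- read.csv(tmp)
print(class(base_df))   # "data.frame"

# readr (only if installed): read_csv() returns a tibble,
# which also carries the "data.frame" class among others
if (requireNamespace("readr", quietly = TRUE)) {
  tib <- readr::read_csv(tmp)
  print(class(tib))
  # as.data.frame() drops the tibble classes if you need a plain data.frame
  tib_df <- as.data.frame(tib)
  print(class(tib_df))  # "data.frame"
}
```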

8.1 - Overview of Importing Text Files

Here we'll introduce the most straightforward way to import data. Basically:

  1. Open the data in a spreadsheet to clean it up:
    • Copy and paste only the data into a fresh worksheet
    • Fix column names (no spaces or special characters, and short is good)
    • Check data types - numeric variables should contain only numbers
  2. Save the data from a spreadsheet as a .csv file
  3. Import data to R using read.csv()
  4. Repeat steps 1 and 2 until step 3 works
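Steps 2 and 3 can be sketched in R. Here a temporary file stands in for the .csv you would save from the spreadsheet (the data are invented), and str() is one quick way to confirm that each column came in with the type you expected.

```r
# Stand-in for a .csv saved from a spreadsheet (made-up data)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("site,temp,rain", "north,12.5,40", "south,15.1,22"), csv_file)

# Step 3: import with read.csv() (header = TRUE and sep = "," are its defaults)
dat <- read.csv(csv_file)

# Check the result: numeric variables should show up as num or int
str(dat)
sapply(dat, class)
```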

8.2 - Preparing a Spreadsheet for Import as Text

Here we'll demonstrate the typical steps in creating a "clean" text file from a spreadsheet. It is worth noting the value of good variable names at this point. The balance between unambiguous and short is a call you have to make: you'll be typing the names a lot, so short has real value, but you also need to be able to remember what they mean.
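One concrete reason to fix column names before import: by default (check.names = TRUE), read.csv() silently turns names that are not valid in R into valid ones using make.names(). A minimal sketch with invented column names:

```r
# Column names with spaces and special characters (made-up example)
f <- tempfile(fileext = ".csv")
writeLines(c("Plot Number,Yield (kg/ha)", "1,4200", "2,5100"), f)

imported <- read.csv(f)
names(imported)   # "Plot.Number"   "Yield..kg.ha."

# Short, clean names are easier to type; you can also rename after import
clean <- imported
names(clean) <- c("plot", "yield")
```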


8.3 - File Paths and the Working Directory

In order to import a file we have to be able to tell R where the file is. Here we'll describe how file locations can be specified: as a full path, as a path relative to the working directory, or interactively using file.choose(). Note that interactive file choice will not work in compiled documents (for example, when knitting an R Markdown file). Also note that from here on out the course notes assume that your working directory is set to "Essential R".
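The three ways of pointing R at a file can be sketched as follows. A throwaway folder under tempdir() stands in for your "Essential R" folder; the folder name "EssentialR_demo" and file "demo.csv" are hypothetical.

```r
# Build a throwaway folder that stands in for "Essential R" (hypothetical)
old_wd <- getwd()
proj <- file.path(tempdir(), "EssentialR_demo")
dir.create(file.path(proj, "Data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("x,y", "1,2", "3,4"), file.path(proj, "Data", "demo.csv"))

# 1. Full (absolute) path: works regardless of the working directory
d_full <- read.csv(file.path(proj, "Data", "demo.csv"))

# 2. Relative path: resolved starting from the working directory
setwd(proj)
d_rel <- read.csv("Data/demo.csv")
setwd(old_wd)  # put the working directory back

# 3. Interactive choice (commented out: it needs a person to pick the file,
#    so it won't work when a document is compiled)
# d_pick <- read.csv(file.choose())
```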


8.4 - Common Data Import Problems: Part i

Here we will practice importing data from some small text files, paying special attention to the many things that can go wrong.
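The lesson's example files (Ex1.txt and so on) aren't reproduced here, so this sketch uses an invented tab-delimited file to show two classic mistakes: reading it with the wrong separator, and forgetting that read.table() defaults to header = FALSE.

```r
# A small tab-delimited file (made-up data)
txt <- tempfile(fileext = ".txt")
writeLines(c("id\tmass", "1\t2.5", "2\t3.1"), txt)

# Mistake 1: read.csv() expects commas, so everything lands in one column
wrong_sep <- read.csv(txt)
ncol(wrong_sep)   # 1

# Mistake 2: read.table() defaults to header = FALSE,
# so the column names become the first row of data
no_header <- read.table(txt, sep = "\t")
nrow(no_header)   # 3

# Correct: tell read.table() about both the header and the separator
good <- read.table(txt, header = TRUE, sep = "\t")
str(good)
```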


8.5 - Common Data Import Problems: Part ii

Here we'll continue our exploration of typical errors in the data import process.
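Another frequent problem is a single non-numeric entry (for example, a "." typed for a missing value) in a column that should be numeric; R then reads the whole column as text. A sketch with invented data, showing one possible fix via the na.strings argument:

```r
# A "numeric" column with a stray "." standing in for a missing value
f2 <- tempfile(fileext = ".csv")
writeLines(c("plot,yield", "1,4.2", "2,.", "3,5.0"), f2)

# The "." forces the whole yield column to be read as text
# (character in R >= 4.0.0, factor in earlier versions)
d_text <- read.csv(f2)
is.numeric(d_text$yield)   # FALSE

# Declaring "." as a missing-value code lets the column import as numeric
d_num <- read.csv(f2, na.strings = c("NA", "."))
is.numeric(d_num$yield)    # TRUE
summary(d_num$yield)
```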

NOTE!
  1. read.table() can also import data directly from ftp and http servers. (Older versions of R could not read from https servers; recent versions generally can.)
  2. The "Import Dataset" icon in the "Environment" tab in RStudio opens a nice tool that lets you choose a file and set it up for import. It even writes the line of code into the console, so you can copy it to your editor. Note that this tool uses functions from the package "readr" (such as read_csv() or read_table()) and not read.table(). In general, this is not a problem (and is even better for extremely large data sets) as long as you remember that these readr functions do not automatically convert strings to factors (as read.table() did by default before R 4.0.0). This means that you need to either:
    1. Convert your factors manually after import
    2. Specify data type for those columns as "factor" in the GUI
    3. Copy the code into the editor and modify it to use read.table() (adding stringsAsFactors = TRUE if you are using R 4.0.0 or later).
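Options 1 and 3 above can be sketched in base R (invented data again). Since R 4.0.0, read.table() and read.csv() also leave strings as character unless you ask for factors, so stringsAsFactors is set explicitly here to make the example behave the same on any version.

```r
# Made-up data with a treatment column that should be a factor
f3 <- tempfile(fileext = ".csv")
writeLines(c("trt,yield", "A,4.2", "B,5.1", "A,4.8"), f3)

# Option 1: import, then convert the column to a factor yourself
d1 <- read.csv(f3, stringsAsFactors = FALSE)
d1$trt <- factor(d1$trt)
levels(d1$trt)     # "A" "B"

# Option 3 (base R import): request factors at import time
d3 <- read.table(f3, header = TRUE, sep = ",", stringsAsFactors = TRUE)
is.factor(d3$trt)  # TRUE
```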
