9  Manipulating Data and Repetitive Tasks

Overview

Frequently we find that our data is not quite in the form we want it to be. Maybe we need just the means and SEs of some treatments, or we want to calculate a summary variable from several others. One of the strengths of R is the ease with which data can be manipulated. In fact, I sometimes refer to this chapter as “things I used to do in Excel that are easier in R”. Here we’ll introduce several useful tools for manipulating data, using Chapter 9 of Essential R Course Notes.

Objectives

Upon completion of this lesson, you should be able to:


  1. extract summary statistics for subsets or groups of data defined by factors,
  2. change factor levels,
  3. calculate new variables from existing variables,
  4. stack and unstack data, and
  5. merge two data frames.
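Objectives 2 and 3 can be previewed with a short sketch. It uses the PlantGrowth data set that ships with R; the copy named pg, the new level labels, and the log_weight column are assumptions made up for illustration and do not come from the course notes.

```r
# Work on a copy so the built-in data set is left untouched
pg <- PlantGrowth

# Change factor levels: new labels are assigned in the existing level order
# (ctrl, trt1, trt2), so the order of this vector matters
levels(pg$group) <- c("Control", "Treatment 1", "Treatment 2")

# Calculate a new variable from an existing one
pg$log_weight <- log(pg$weight)

head(pg)
```

Note that assigning to levels() renames the levels but does not reorder them.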

Data and R Code Files

The R code file and data files for this lesson can be found on the Essential R - Notes on learning R page.

9.1 Summarizing Data - apply()

In this video we will introduce the apply() family of functions which allow us to apply a function to parts of an array of data.
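As a minimal sketch of the idea (not necessarily the example used in the video), apply() takes an array, a margin (1 for rows, 2 for columns), and a function. The matrix m below is made up for illustration.

```r
# A small made-up matrix: 3 rows by 4 columns, filled column by column
m <- matrix(1:12, nrow = 3, ncol = 4)

# MARGIN = 1 applies the function to each row
row_means <- apply(m, 1, mean)

# MARGIN = 2 applies the function to each column
col_means <- apply(m, 2, mean)

row_means
col_means
```

Any function that accepts a vector can be used in place of mean, for example sd or max.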

Video - STAT 484 Lesson: 9.1

9.2 Summarizing Data - tapply() and aggregate()

In the last video we saw how we could use apply() to extract column or row means (though colMeans() would work for that also). Here we’ll explore tapply() and aggregate(), which can be used to apply a function to subsets of the data; for example to extract group means.
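A short sketch of both functions, assuming the built-in iris data set rather than the data used in the video: tapply() returns a named array of group means, while aggregate() returns a data frame.

```r
# Group means of one variable, split by a factor
grp_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# The same summary with the formula interface of aggregate();
# the result is a data frame with one row per group
agg_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

grp_means
agg_means
```

The data frame returned by aggregate() is often more convenient when the summary will be plotted or merged with other data.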

Video - STAT 484 Lesson: 9.2

9.3 Summarizing Data - Custom Functions for aggregate()

Here we’ll extract the standard error of the mean for groups from the data to explore how we can define other functions in a call to aggregate(). Note that the function could equivalently be defined as a stand-alone function and called from within aggregate() - either way works.
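Both approaches can be sketched as follows, assuming the built-in PlantGrowth data set. The function name se is our own choice, and this version does not handle missing values.

```r
# Stand-alone function: standard error of the mean (no NA handling)
se <- function(x) sd(x) / sqrt(length(x))

# Option 1: pass the named function to aggregate()
se_named <- aggregate(weight ~ group, data = PlantGrowth, FUN = se)

# Option 2: define the function inside the call to aggregate()
se_inline <- aggregate(weight ~ group, data = PlantGrowth,
                       FUN = function(x) sd(x) / sqrt(length(x)))

se_named
```

The two results are the same; a named function is easier to reuse, while the inline form keeps everything in one call.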

Video - STAT 484 Lesson: 9.3

9.4 Sorting or Re-ordering Data

Here we’ll extract some group means and plot them. We’ll then consider how we would re-order or sort a data frame - for example, to change the order of bars in the plot.
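One way to do this, sketched with the built-in PlantGrowth data set (an assumption, not necessarily the data in the video), is to use order() to get the row indices that sort the data frame and then index with them.

```r
# Group means as a data frame
means <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)

# order() returns the row indices that put weight in increasing order
ord <- order(means$weight)
means_sorted <- means[ord, ]

# Bars now appear from the smallest to the largest mean
barplot(means_sorted$weight,
        names.arg = as.character(means_sorted$group),
        ylab = "Mean weight")
```

Using order(means$weight, decreasing = TRUE) would reverse the order of the bars.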

Video - STAT 484 Lesson: 9.4

9.5 Stacking and Unstacking Data

In this video we will demonstrate how to “stack” and “unstack” data to move from “wide” to “long” formats or vice-versa.
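A minimal sketch with a made-up two-column data frame (the names wide, long, and back and all values are assumptions for illustration): stack() turns the columns into one column of values plus an indicator factor, and unstack() reverses the operation.

```r
# "Wide" format: one column per treatment (values are made up)
wide <- data.frame(ctrl = c(4.2, 5.1, 4.8),
                   trt  = c(5.9, 6.3, 6.1))

# "Long" format: a 'values' column and an 'ind' factor naming the source column
long <- stack(wide)

# Back to wide; by default unstack() uses the formula values ~ ind
back <- unstack(long)

long
back
```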

Video - STAT 484 Lesson: 9.5

9.6 Data Manipulation Power Tools: Part I

In this screencast we will introduce the function reshape(), which permits extensive manipulation of data by “melting” and “casting” the data.
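The sketch below uses the reshape() function from base R on a made-up data frame; the video may use different tools or arguments, so treat this only as an illustration of moving between wide and long layouts. The column names id, height.1, and height.2 are assumptions.

```r
# Wide format: one row per subject, one column per measurement occasion
wide <- data.frame(id = 1:3,
                   height.1 = c(10, 12, 11),
                   height.2 = c(15, 18, 16))

# Wide to long: with sep = ".", reshape() splits "height.1" into the
# variable name "height" and the time value 1
long <- reshape(wide, direction = "long",
                varying = c("height.1", "height.2"),
                idvar = "id", sep = ".")

# Long back to wide, using the information reshape() stored on 'long'
back <- reshape(long, direction = "wide")

long
```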

Video - STAT 484 Lesson: 9.6

9.7 Data Manipulation Power Tools: Part II

Here we continue our overview of the power of reshape().

Video - STAT 484 Lesson: 9.7

9.8 Data Manipulation Power Tools: Part III

We conclude our overview of reshape().

Video - STAT 484 Lesson: 9.8

9.9 Merging Two Datasets

In this video, we’ll explore the function merge(), which allows merging data from two data frames (a bit like using VLOOKUP in Excel, but much easier).
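A small sketch of merge() on two made-up data frames that share a key column; the names yields, soils, and plot and all values are assumptions for illustration.

```r
# Two made-up data frames sharing the key column 'plot'
yields <- data.frame(plot  = c("A", "B", "C", "D"),
                     yield = c(3.1, 2.8, 3.6, 3.0))
soils  <- data.frame(plot = c("A", "B", "C"),
                     soil = c("clay", "loam", "sand"))

# Default merge keeps only the plots found in both data frames
both <- merge(yields, soils, by = "plot")

# all.x = TRUE keeps every row of the first data frame,
# filling in NA where the second has no match (plot D here)
all_yields <- merge(yields, soils, by = "plot", all.x = TRUE)

both
all_yields
```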

Video - STAT 484 Lesson: 9.9

9.10 A Bit More About Loops

We wind up this chapter with a bit more detail on loops via an example where we use a loop to make three barplots of different variables. The result still looks a bit rough, but with a few conditional statements to change the margins and disable the x-axis in all but the third plot, it would look very nice.
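A rough sketch of such a loop, including the conditional refinements, might look like the following. It assumes the built-in mtcars data set and three arbitrarily chosen variables; the data and variables in the video differ.

```r
# Three variables to plot, each summarized by number of cylinders
vars <- c("mpg", "hp", "wt")
group_means <- list()

# Stack three plots vertically and remember the old settings
op <- par(mfrow = c(3, 1))

for (i in seq_along(vars)) {
  v <- vars[i]
  means <- tapply(mtcars[[v]], mtcars$cyl, mean)
  group_means[[v]] <- means

  # Only the last (bottom) plot gets a large bottom margin and axis labels
  last <- i == length(vars)
  par(mar = c(if (last) 4 else 1, 4, 1, 1))
  barplot(means, ylab = v, axisnames = last,
          xlab = if (last) "Cylinders" else "")
}

# Restore the previous graphics settings
par(op)
```

Looping over seq_along(vars) rather than over the names themselves makes it easy to test whether the current plot is the last one.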

Video - STAT 484 Lesson: 9.10