9 Manipulating Data and Repetitive Tasks
Overview
Frequently we find that our data is not quite in the form we want it to be. Maybe we need just the means and SEs of some treatments, or we want to calculate a summary variable from several others. One of the strengths of R is the ease with which data can be manipulated. In fact, I sometimes refer to this chapter as “things I used to do in Excel but are easier in R”. Here we’ll introduce several useful tools for manipulating data, using Chapter 9 of Essential R Course Notes.
Objectives
Upon completion of this lesson, you should be able to:
- extract summary statistics for subsets or groups of data defined by factors,
- change factor levels,
- calculate new variables from existing variables,
- stack and unstack data, and
- merge two data frames.
Data and R Code Files
The R code file and data files for this lesson can be found on the Essential R - Notes on learning R page.
9.1 Summarizing Data - apply()
In this video we will introduce the apply() family of functions, which allow us to apply a function to parts of an array of data.
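A minimal sketch of apply(), assuming the built-in iris data as a stand-in for the course data files (which are not reproduced here). The MARGIN argument selects whether the function is applied to each row (1) or each column (2):

```r
# A small numeric matrix from the built-in iris data (first 6 rows, 4 measurements).
m <- as.matrix(iris[1:6, 1:4])

# MARGIN = 1 applies the function to each row; MARGIN = 2 to each column.
row_means <- apply(m, 1, mean)
col_means <- apply(m, 2, mean)
print(col_means)

# colMeans() gives the same column means with less typing.
all.equal(col_means, colMeans(m))

# apply() accepts any function, e.g. the range (min and max) of each column.
col_ranges <- apply(m, 2, range)
print(col_ranges)
```

Because range() returns two values per column, the result is a 2 x 4 matrix rather than a vector.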
9.2 Summarizing Data - tapply() and aggregate()
In the last video we saw how we could use apply() to extract column or row means (though colMeans() would work for that also). Here we’ll explore tapply() and aggregate(), which can be used to apply a function to subsets of the data, for example to extract group means.
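A short sketch of both functions, using the built-in InsectSprays data (insect counts for six sprays, A to F) rather than the course files. tapply() returns a named vector, while aggregate() returns a data frame:

```r
# tapply(): apply mean() to count within each level of the factor spray.
spray_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
print(spray_means)

# aggregate(): the same group means, returned as a data frame, which is
# often easier to plot, sort, or merge with other data.
spray_df <- aggregate(count ~ spray, data = InsectSprays, FUN = mean)
print(spray_df)
```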
9.3 Summarizing Data - Custom Functions for aggregate()
Here we’ll extract the standard error of the mean for groups from the data to explore how we can define our own functions in a call to aggregate(). Note that the function could equivalently be defined as a stand-alone function and called from within aggregate(); either way works.
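A sketch of both approaches to the standard error, SE = sd(x)/sqrt(n), again assuming the built-in InsectSprays data. The stand-alone helper name se is hypothetical, chosen only for illustration:

```r
# Option 1: define the function inline, as an anonymous function.
se_inline <- aggregate(count ~ spray, data = InsectSprays,
                       FUN = function(x) sd(x) / sqrt(length(x)))

# Option 2: define a stand-alone helper (hypothetical name 'se') and pass it by name.
se <- function(x) sd(x) / sqrt(length(x))
se_named <- aggregate(count ~ spray, data = InsectSprays, FUN = se)

print(se_named)
identical(se_inline, se_named)  # both approaches give the same result
```

The stand-alone version is handy when the same function is reused in several calls.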
9.4 Sorting or Re-ordering Data
Here we’ll extract some group means and plot them. We’ll then consider how we would re-order or sort a data frame - for example, to change the order of bars in the plot.
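One way to do this in base R, sketched with the built-in InsectSprays data: order() returns the row indices that sort a column, and reorder() changes the order of a factor's levels. The object names are illustrative:

```r
ins <- InsectSprays  # local copy of the built-in data

# Group means as a data frame.
means <- aggregate(count ~ spray, data = ins, FUN = mean)

# order() gives the row indices that would sort the column; using them as a
# row index re-orders the whole data frame (here, largest mean first).
means_sorted <- means[order(means$count, decreasing = TRUE), ]
print(means_sorted)

# barplot() draws bars in row order, so the sorted frame gives sorted bars.
barplot(means_sorted$count, names.arg = as.character(means_sorted$spray),
        ylab = "Mean count")

# Alternatively, reorder the factor levels by group mean (ascending), so
# later summaries and plots follow that order automatically.
ins$spray_by_mean <- reorder(ins$spray, ins$count, FUN = mean)
levels(ins$spray_by_mean)
```

Sorting the data frame affects only that object, whereas re-ordering the factor levels carries the new order into any later analysis that uses the factor.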
9.5 Stacking and Unstacking Data
In this video we will demonstrate how to “stack” and “unstack” data to move from “wide” to “long” formats or vice-versa.
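A minimal sketch with the built-in PlantGrowth data, which is already in long format (a weight column plus a group factor with levels ctrl, trt1 and trt2, 10 plants each):

```r
head(PlantGrowth)

# unstack(): long -> wide. The formula weight ~ group puts each group's
# weights in its own column.
wide <- unstack(PlantGrowth, weight ~ group)
head(wide)

# stack(): wide -> long. The result has a 'values' column and an 'ind'
# factor naming the column each value came from.
long <- stack(wide)
head(long)
```

unstack() only returns a data frame like this when every group has the same number of values; with unequal group sizes it returns a list instead.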
9.6 Data Manipulation Power Tools: Part I
In this screencast we will introduce the reshape package, whose melt() and cast() functions permit extensive manipulation of data by “melting” and “casting” it.
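The melt/cast workflow from the screencasts is not reproduced here. As a rough comparison that needs no extra packages, the base-R function stats::reshape() can perform the same kind of long-to-wide and wide-to-long conversion. This sketch uses the built-in Indometh data (6 subjects, 11 sampling times) and is not necessarily the method the videos use:

```r
# Indometh is long: one row per Subject x time, with a 'conc' measurement.
# Long -> wide: one row per Subject, one conc.<time> column per time point.
wide <- reshape(Indometh, v.names = "conc", idvar = "Subject",
                timevar = "time", direction = "wide")
dim(wide)

# Wide -> long again; reshape() stores how 'wide' was built, so
# direction = "long" is enough to reverse it.
long <- reshape(wide, direction = "long")
dim(long)
```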
9.7 Data Manipulation Power Tools: Part II
Here we continue our overview of the power of reshape.
9.8 Data Manipulation Power Tools: Part III
We conclude our overview of reshape.
9.9 Merging Two Datasets
In this video, we’ll explore the function merge(), which allows merging data from two data frames (a bit like using VLOOKUP in Excel, but much easier).
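A minimal sketch of merge(). The two data frames and their values are hypothetical, made up only to show how rows are matched on a shared key column (here, plot):

```r
# Hypothetical measurements and a hypothetical lookup table of treatments.
yields <- data.frame(plot = c(101, 102, 103, 104),
                     yield = c(5.2, 4.8, 6.1, 5.5))
plots <- data.frame(plot = c(101, 102, 103, 105),
                    trt = c("ctrl", "A", "B", "ctrl"))

# Default: keep only plots present in both data frames (an inner join).
inner <- merge(yields, plots, by = "plot")
print(inner)

# all.x = TRUE keeps every row of 'yields'; plot 104 has no match, so trt is NA.
left <- merge(yields, plots, by = "plot", all.x = TRUE)
print(left)
```

Unlike VLOOKUP, merge() brings across every non-key column of the second data frame in one step, and all = TRUE would keep unmatched rows from both sides.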
9.10 A Bit More About Loops
We wind up this chapter with a bit more detail on loops via an example where we use a loop to make three barplots of different variables. It still looks a bit rough, but with a few conditional statements to change the margins and suppress the x-axis in all but the third plot, it would look very nice.
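A sketch of that refinement, assuming the built-in mtcars data rather than the course data: three group-mean barplots drawn in a loop, with an if() so that only the bottom panel gets a full bottom margin and bar labels. The variable choices are illustrative:

```r
# Group means of three variables by number of cylinders (built-in mtcars).
vars <- c("mpg", "hp", "wt")
grp_means <- aggregate(cbind(mpg, hp, wt) ~ cyl, data = mtcars, FUN = mean)

# Three panels stacked vertically; save the old settings to restore later.
op <- par(mfrow = c(3, 1), mar = c(0.5, 4, 1, 1))
for (i in seq_along(vars)) {
  last <- i == length(vars)
  # Conditional margins: only the bottom panel gets room below it.
  par(mar = if (last) c(4, 4, 1, 1) else c(0.5, 4, 1, 1))
  barplot(grp_means[[vars[i]]],
          names.arg = as.character(grp_means$cyl),
          axisnames = last,                        # bar labels only on the last plot
          xlab = if (last) "Cylinders" else NULL,
          ylab = vars[i])
}
par(op)  # restore the original layout and margins
```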