Lesson 9: Manipulating Data and Repetitive Tasks
Frequently we find that our data is not quite in the form we want. Maybe we need just the means and SEs of some treatments, or we want to calculate a summary variable from several others. One of the strengths of R is the ease with which data can be manipulated. In fact, I sometimes refer to this chapter as "things I used to do in Excel but are easier in R". Here we'll introduce several useful tools for manipulating data, using Chapter 9 of Essential R. In this lesson we will:
- Extract summary statistics for subsets or groups of data defined by factors
- Change factor levels
- Calculate new variables from existing variables
- Stack and unstack data
- Merge two data frames
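Two of these tasks, changing factor levels and calculating new variables, take only a line each. The following is a minimal sketch rather than the lesson's own code; the data frame `dat` and its columns are hypothetical.

```r
# Hypothetical data: three treatments with two measurements each
dat <- data.frame(
  trt    = factor(c("a", "a", "b", "b", "c", "c")),
  length = c(10, 12, 15, 14, 20, 22),
  width  = c(2, 3, 3, 2, 4, 5)
)

# Change factor levels: the new names must be given in the same
# order as the existing levels, i.e. as shown by levels(dat$trt)
levels(dat$trt) <- c("control", "low", "high")

# Calculate a new variable from existing variables
dat$area <- dat$length * dat$width

str(dat)
```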
Data and R Code Files
The R code file and data files for this lesson can be found on the Essential R - Notes on learning R page.
9.1 - Summarizing Data – apply()
In this video we will introduce the apply() family of functions, which allow us to apply a function to parts of an array of data.
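As a rough illustration of the idea (not the code used in the video), here is apply() on a small hypothetical matrix `m`, along with sapply(), another member of the family.

```r
# Hypothetical 3 x 4 matrix; matrix() fills column by column
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c", "d")))

# MARGIN = 1 applies the function to each row,
# MARGIN = 2 applies it to each column
row_means <- apply(m, 1, mean)
col_max   <- apply(m, 2, max)

# sapply() applies a function to each element of a list;
# a data frame is a list of columns, so this gives one sd per column
col_sd <- sapply(as.data.frame(m), sd)

row_means
col_max
col_sd
```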
9.2 - Summarizing Data – tapply() and aggregate()
In the last video we saw how we could use apply() to extract column or row means (though colMeans() and rowMeans() would also work for that). Here we'll explore aggregate(), which can be used to apply a function to subsets of the data, for example to extract group means.
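A minimal sketch of extracting group means both ways, assuming a hypothetical data frame `dat` with a factor `trt` and a response `yield` (these names are not from the lesson's data files).

```r
# Hypothetical data: three treatments, four observations each
dat <- data.frame(
  trt   = factor(rep(c("a", "b", "c"), each = 4)),
  yield = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

# tapply() returns the group means as a named vector
means_vec <- tapply(dat$yield, dat$trt, mean)

# aggregate() returns a data frame with one row per group
means_df <- aggregate(yield ~ trt, data = dat, FUN = mean)

means_vec
means_df
```

The data frame returned by aggregate() is usually the more convenient form if the summary will be plotted or merged with other data.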
9.3 - Summarizing Data – Custom Functions for aggregate()
Here we'll extract the standard error of the mean for groups from the data to explore how we can define other functions in a call to aggregate(). Note that the function could equivalently be defined as a stand-alone function and called from within aggregate(); either way works.
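Both approaches can be sketched as follows. The data frame `dat` and the function name `se` are hypothetical, and the standard error is computed as sd(x) / sqrt(length(x)), which assumes there are no missing values.

```r
# Hypothetical data: three treatments, four observations each
dat <- data.frame(
  trt   = factor(rep(c("a", "b", "c"), each = 4)),
  yield = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

# Option 1: define the function inside the call to aggregate()
se_inline <- aggregate(yield ~ trt, data = dat,
                       FUN = function(x) sd(x) / sqrt(length(x)))

# Option 2: define a stand-alone function, then pass it by name
se <- function(x) sd(x) / sqrt(length(x))
se_named <- aggregate(yield ~ trt, data = dat, FUN = se)

se_inline
se_named
```

The stand-alone version is worth the extra line if the same function will be reused elsewhere in a script.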
9.4 - Sorting or Re-ordering Data
Here we'll extract some group means and plot them. We'll then consider how we would re-order or sort a data frame, for example to change the order of bars in the plot.
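One way to do this is with order(), sketched here on hypothetical data (the names `dat`, `grp`, and `grp_sorted` are illustrative only). Since barplot() draws bars in row order, sorting the data frame changes the order of the bars.

```r
# Hypothetical data: three treatments, two observations each
dat <- data.frame(
  trt   = factor(rep(c("a", "b", "c"), each = 2)),
  yield = c(21, 22, 11, 12, 31, 32)
)

# Group means, one row per treatment
grp <- aggregate(yield ~ trt, data = dat, FUN = mean)

# order() returns the row indices that would sort the column;
# using them as a row index re-orders the whole data frame
grp_sorted <- grp[order(grp$yield, decreasing = TRUE), ]

# Bars are drawn in the order of the rows
barplot(grp_sorted$yield, names.arg = as.character(grp_sorted$trt),
        ylab = "Mean yield")
```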
9.5 - Stacking and Unstacking Data
In this video we will demonstrate how to "stack" and "unstack" data to move from "wide" to "long" formats or vice-versa.
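A minimal sketch using the base R functions stack() and unstack(), assuming a hypothetical "wide" data frame with one column per treatment.

```r
# Hypothetical wide data: one column per treatment
wide <- data.frame(trtA = c(1, 2, 3),
                   trtB = c(4, 5, 6),
                   trtC = c(7, 8, 9))

# stack() gives the long format: a 'values' column and an
# 'ind' factor recording which column each value came from
long <- stack(wide)

# unstack() reverses the operation; for the output of stack()
# it uses the formula values ~ ind by default
wide_again <- unstack(long)

long
wide_again
```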
9.6 - Data Manipulation Power Tools: Part i
In this screencast we will introduce the reshape package, whose melt() and cast() functions permit extensive manipulation of data by "melting" and "casting" it.
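As a dependency-free sketch of the same wide-to-long idea, the example below uses the base R function reshape() with direction = "long". This is not the melt() and cast() approach, since those functions live in the add-on reshape package and are not shown here. The data frame `wide` and its column names are hypothetical.

```r
# Hypothetical wide data: one row per plant, heights at two times
wide <- data.frame(id = 1:3,
                   h1 = c(5, 6, 7),
                   h2 = c(8, 9, 10))

# Convert to long format: one row per plant per time
long <- reshape(wide, direction = "long",
                varying = c("h1", "h2"),  # columns to stack
                v.names = "height",       # name for the stacked values
                timevar = "time",         # name for the new index column
                times   = 1:2,            # index value for each stacked column
                idvar   = "id")           # column identifying each plant

long
```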
9.7 - Data Manipulation Power Tools: Part ii
Here we continue our overview of the power of
9.8 - Data Manipulation Power Tools: Part iii
We conclude our overview of
9.9 - Merging Two Datasets
In this screencast we'll explore the function merge(), which allows merging data from two data frames (a bit like using vlookup in Excel, but much easier).
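A minimal sketch of merge() on two hypothetical data frames, `plots` and `sites`, that share a key column `plot`.

```r
# Hypothetical data: yields by plot, and soil type by plot
plots <- data.frame(plot  = c(1, 2, 3, 4),
                    yield = c(10, 12, 15, 11))
sites <- data.frame(plot = c(1, 2, 3, 5),
                    soil = c("clay", "sand", "loam", "silt"))

# By default only rows whose key appears in both data frames are kept
both <- merge(plots, sites, by = "plot")

# all.x = TRUE keeps every row of the first data frame,
# filling in NA where the second has no match
all_plots <- merge(plots, sites, by = "plot", all.x = TRUE)

both
all_plots
```

Plot 4 has no entry in `sites`, so it is dropped from `both` and gets an NA for `soil` in `all_plots`.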
9.10 - A Bit More About Loops
We wind up this chapter with a bit more detail on loops via an example where we use a loop to make three barplots of different variables. The result still looks a bit rough, but with a few conditional statements to change the margins and suppress the x-axis in all but the third plot, it would look very nice.
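The refinement described above might look something like the sketch below. This is not the code from the video: the data frame `dat`, its columns, and the particular margin values are all assumptions chosen for illustration.

```r
# Hypothetical data: three variables measured on three treatments
dat <- data.frame(trt    = c("a", "b", "c"),
                  height = c(10, 14, 12),
                  mass   = c(2.1, 3.4, 2.8),
                  leaves = c(5, 9, 7))
vars <- c("height", "mass", "leaves")

# Three plots stacked in one column; save the old settings to restore later
op <- par(mfrow = c(3, 1), mar = c(1, 4, 1, 1))

mids <- list()  # bar midpoints returned by each call to barplot()
for (i in seq_along(vars)) {
  last <- i == length(vars)
  # Conditional: leave room for bar labels only under the last plot
  par(mar = c(if (last) 4 else 1, 4, 1, 1))
  # Conditional: draw the bar labels only on the last plot
  mids[[vars[i]]] <- barplot(dat[[vars[i]]], names.arg = dat$trt,
                             axisnames = last, ylab = vars[i])
}

par(op)  # restore the previous graphics settings
```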