Lesson 9: Manipulating Data and Repetitive Tasks

Overview

Frequently we find that our data is not quite in the form we want it to be. Maybe we need just the means and SEs of some treatments, or we want to calculate a summary variable from several others. One of the strengths of R is the ease with which data can be manipulated. In fact, I sometimes refer to this chapter as "things I used to do in Excel but are easier in R". Here we'll introduce several useful tools for manipulating data, using Chapter 9 of Essential R.

Objectives

Upon completion of this lesson, you should be able to:

• Extract summary statistics for subsets or groups of data defined by factors
• Change factor levels
• Calculate new variables from existing variables
• Stack and unstack data
• Merge two data frames

Data and R Code Files

The R code file and data files for this lesson can be found on the Essential R - Notes on learning R page.

9.1 - Summarizing Data – apply()

In this video we will introduce the apply() family of functions, which allow us to apply a function to parts of an array of data (for example, to each row or each column of a matrix).
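As a minimal sketch of apply(), using a small made-up matrix (the object names m, row_means, col_means, and col_max are illustrative only): the second argument, MARGIN, chooses whether the function is applied to rows (1) or columns (2).

```r
# Illustrative 3 x 2 matrix (filled column by column): a = 1:3, b = 4:6
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

# MARGIN = 1 applies the function to each row
row_means <- apply(m, 1, mean)   # 2.5 3.5 4.5

# MARGIN = 2 applies the function to each column
col_means <- apply(m, 2, mean)   # a = 2, b = 5

# Arguments after FUN are passed on to it (here na.rm for max())
col_max <- apply(m, 2, max, na.rm = TRUE)   # a = 3, b = 6

# colMeans() is a shortcut for the column-mean case
all.equal(col_means, colMeans(m))   # TRUE
```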

9.2 - Summarizing Data – tapply() and aggregate()

In the last video we saw how to use apply() to extract column or row means (though colMeans() and rowMeans() would also work for that). Here we'll explore tapply() and aggregate(), which apply a function to subsets of the data, for example to extract group means.
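A minimal sketch of both functions on a small made-up data frame (the names d, trt, yield, grp_means, and agg are illustrative only). tapply() returns a named vector of group results, while aggregate() with a formula returns a data frame with one row per group.

```r
# Illustrative data: a response measured under two treatments
d <- data.frame(
  trt   = factor(c("A", "A", "B", "B", "B")),
  yield = c(10, 12, 20, 22, 24)
)

# tapply(X, INDEX, FUN): apply FUN to X split by the levels of INDEX
grp_means <- tapply(d$yield, d$trt, mean)
grp_means   # A = 11, B = 22

# aggregate() with a formula: response ~ grouping factor
agg <- aggregate(yield ~ trt, data = d, FUN = mean)
agg         # data frame with columns trt and yield
```

The data frame returned by aggregate() is often the more convenient form for further work such as plotting or merging.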

9.3 - Summarizing Data – Custom Functions for aggregate()

Here we'll calculate the standard error of the mean for groups in the data to explore how we can define other functions within a call to aggregate(). Note that the function could equivalently be defined as a stand-alone function and called from within aggregate(); either way works.
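A minimal sketch of both approaches, assuming the standard error is computed as sd(x) / sqrt(length(x)). Base R has no built-in standard error function, so the helper se() here is a hypothetical name, and the data are made up.

```r
# Illustrative data: three observations per treatment
d <- data.frame(
  trt   = factor(c("A", "A", "A", "B", "B", "B")),
  yield = c(10, 12, 14, 20, 21, 25)
)

# Option 1: an anonymous function defined inside the call
se_inline <- aggregate(yield ~ trt, data = d,
                       FUN = function(x) sd(x) / sqrt(length(x)))

# Option 2: a stand-alone function (hypothetical name se) passed by name
se <- function(x) sd(x) / sqrt(length(x))
se_named <- aggregate(yield ~ trt, data = d, FUN = se)

se_named   # one standard error per treatment
```

The stand-alone version is handy when the same summary is needed in several places.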

9.4 - Sorting or Re-ordering Data

Here we'll extract some group means and plot them. We'll then consider how we would re-order or sort a data frame, for example to change the order of bars in the plot.
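One common way to sort a data frame is with order(), which returns the row indices that would sort a column. Those indices are then used to re-order every row of the data frame. A minimal sketch with made-up group means (the names means, means_sorted, and means_desc are illustrative):

```r
# Illustrative group means, e.g. as returned by aggregate()
means <- data.frame(
  trt   = factor(c("A", "B", "C")),
  yield = c(15, 9, 21)
)

# order() gives the row order that sorts yield (smallest first)
means_sorted <- means[order(means$yield), ]

# decreasing = TRUE sorts from largest to smallest
means_desc <- means[order(means$yield, decreasing = TRUE), ]

# barplot() draws bars in row order, so the plot follows the sorted data
barplot(means_desc$yield, names.arg = as.character(means_desc$trt))
```

Note that sort() works on a single vector, whereas order() can re-order a whole data frame. For plots that arrange groups by factor level rather than by row, re-ordering the levels with factor(x, levels = ...) has the same effect.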

9.5 - Stacking and Unstacking Data

In this video we will demonstrate how to "stack" and "unstack" data to move from "wide" to "long" format or vice versa.
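A minimal sketch with a made-up wide data frame (the names wide, long, and wide_again are illustrative). stack() collects the columns into a single values column plus an ind factor recording which column each value came from. unstack() spreads the values back out by ind.

```r
# "Wide" data: one column per treatment (illustrative values)
wide <- data.frame(A = c(10, 12), B = c(20, 22))

# Wide -> long: columns "values" and "ind"
long <- stack(wide)
long   # values: 10 12 20 22; ind: A A B B

# Long -> wide: the default formula is values ~ ind
wide_again <- unstack(long)
```

If the groups have unequal numbers of values, unstack() returns a list rather than a data frame.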

9.6 - Data Manipulation Power Tools: Part i

In this screencast we will introduce the reshape package, whose functions melt() and cast() permit extensive manipulation of data by "melting" and "casting" it.

9.7 - Data Manipulation Power Tools: Part ii

Here we continue our overview of the power of the reshape package.

9.8 - Data Manipulation Power Tools: Part iii

We conclude our overview of the reshape package.

9.9 - Merging Two Datasets

In this screencast we'll explore the function merge(), which combines two data frames by matching rows on shared key columns (a bit like using VLOOKUP in Excel, but much easier).
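A minimal sketch with two made-up data frames that share a key column, plot (all names and values are illustrative). The rows of soils are deliberately out of order to show that merge() matches rows by key, not by position.

```r
# Two illustrative data frames sharing the key column "plot"
yields <- data.frame(plot = c(1, 2, 3), yield = c(10, 12, 15))
soils  <- data.frame(plot = c(3, 1, 2), soil = c("clay", "loam", "sand"))

# Match rows on "plot" (merge() would also default to the common column)
combined <- merge(yields, soils, by = "plot")
combined   # plot 1: loam, plot 2: sand, plot 3: clay
```

By default merge() keeps only rows whose key appears in both data frames. Setting all.x = TRUE (or all = TRUE) keeps unmatched rows and fills the missing values with NA.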

9.10 - A Bit More About Loops

We wind up this chapter with a bit more detail on loops, via an example where we use a loop to make three barplots of different variables. The result still looks a bit rough, but adding a few conditional statements to change the margins and disable the x-axis labels in all but the third plot would make it very nice.
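The following is a minimal sketch of what those conditional statements might look like. It is not the code from the screencast. The data frame d, its variables, and the margin values are made up for illustration. Inside the loop, an if/else shrinks the bottom margin, and axisnames = FALSE suppresses the bar labels on every plot except the last one.

```r
# Illustrative data: three variables measured on three treatments
d <- data.frame(
  trt     = c("A", "B", "C"),
  height  = c(30, 42, 37),
  biomass = c(5.1, 6.3, 5.8),
  yield   = c(10, 14, 12)
)
vars <- c("height", "biomass", "yield")

# Stack three plots vertically; save old settings to restore later
op <- par(mfrow = c(3, 1), mar = c(1, 4, 1, 1))
for (i in seq_along(vars)) {
  last <- i == length(vars)
  # Larger bottom margin only on the last (bottom) plot
  par(mar = c(if (last) 4 else 1, 4, 1, 1))
  # Bar labels are drawn only on the last plot
  barplot(d[[vars[i]]], names.arg = d$trt, axisnames = last,
          ylab = vars[i])
}
par(op)  # restore the previous mfrow and mar settings
```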
