6 Working with Two Variables
Overview
In this lesson, we’ll start working with more than one variable, using Chapter 6 in Essential R Course Notes, and it will (hopefully) start to feel more like we are actually doing statistics. Note that while we’ll introduce plotting functions a bit more here, we’re just scratching the surface - in later chapters we’ll go into plotting in much greater depth.
Objectives
Upon completion of this lesson, you should be able to:
- make a frequency table and carry out a chi-squared test for two factors,
- make barplots for two factors,
- explore correlation between two numeric variables,
- fit a regression between two numeric variables, and
- compare group means for a continuous variable over levels of a factor.
Data and R Code Files
As always, you can access these files in the “Code Files” folders available from the Essential R Course Notes, or here: Chapter 6 R Script
6.1 Two Factors: Frequency Tables
In this video, we’ll demonstrate frequency tables and proportion tables for analyzing the relationship between two factors (qualitative or categorical variables).
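As a minimal sketch of the idea (not the data used in the video), the example below builds a two-way frequency table and proportion tables from the built-in mtcars data set. Treating cyl and gear as factors, and the object names cars, freq, prop_all and prop_row, are assumptions made for illustration.

```r
# mtcars ships with R; cyl and gear are converted to factors here
# purely for illustration (an assumption, not the course data).
cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

# Two-way frequency table: rows are cylinders, columns are gears
freq <- table(cyl = cars$cyl, gear = cars$gear)
print(freq)

# Proportions of the grand total
prop_all <- prop.table(freq)

# Proportions within each row (margin = 1); each row sums to 1
prop_row <- prop.table(freq, margin = 1)
print(round(prop_row, 2))

# Counts with row and column totals appended
print(addmargins(freq))
```

Using margin = 2 instead would give proportions within each column.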
6.2 Two Factors: Chi-squared Tests
Here we’ll build on the last video by showing how a frequency table can be used to calculate a test for independence between two variables.
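A hedged sketch of a test for independence, assuming the built-in HairEyeColor table as stand-in data (the video may use something else). The three-way table is first collapsed to a two-way hair-by-eye table, which is then passed to chisq.test().

```r
# HairEyeColor is a built-in 3-way table (Hair x Eye x Sex).
# Collapse over Sex to get a 2-way frequency table (illustrative data only).
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))
print(hair_eye)

# Pearson chi-squared test of independence between hair and eye color
test <- chisq.test(hair_eye)
print(test)

# Expected counts under independence; the chi-squared approximation
# is usually considered reasonable when these are not too small
print(round(test$expected, 1))
```

The degrees of freedom are (rows - 1) x (columns - 1), so a 4 x 4 table gives 9.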
6.3 Two Factors: Barplots and Mosaic Plots
Here we will consider a couple of ways to visualize the relationship between two factors.
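One possible way to draw these plots with base R graphics, sketched with the built-in mtcars data as an assumed example; the table freq, the labels and the titles are all illustrative choices.

```r
# Two-way table of cylinders by gears from the built-in mtcars data
# (stand-in data, chosen for illustration)
freq <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Grouped barplot: one cluster per gear value, one bar per cylinder count.
# barplot() returns the bar midpoints, kept here in `mids`.
mids <- barplot(freq, beside = TRUE, legend.text = TRUE,
                xlab = "Gears", ylab = "Count",
                main = "Cylinders by gears (grouped)")

# Stacked barplot: the same counts stacked within each gear value
barplot(freq, legend.text = TRUE,
        xlab = "Gears", ylab = "Count",
        main = "Cylinders by gears (stacked)")

# Mosaic plot: tile areas are proportional to the cell counts
mosaicplot(freq, color = TRUE, main = "Cylinders by gears (mosaic)")
```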
6.4 Two Numeric Variables: Scatterplot
Here we’ll begin with visualizing the relationship between two continuous variables.
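A minimal scatterplot sketch, assuming the built-in mtcars data (fuel economy against weight) rather than the data shown in the video. The lowess smoother is an optional extra added here for illustration.

```r
# Scatterplot of miles per gallon against weight (built-in mtcars data,
# used here only as an example)
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     pch = 19, main = "Fuel economy vs. weight")

# Optional: overlay a lowess smoother to show the overall trend
sm <- lowess(mtcars$wt, mtcars$mpg)
lines(sm, col = "blue", lwd = 2)
```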
6.5 Two Numeric Variables: Correlation
Now we’ll introduce the function cor() for correlations, and show how to derive both Pearson and Spearman correlations, and how to specify how missing values should be treated.
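The calls below sketch these three uses of cor() on the built-in mtcars data (an assumed example). A missing value is inserted artificially to show the effect of the use argument.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Pearson correlation is the default method
r_pearson <- cor(x, y)

# Spearman (rank-based) correlation
r_spearman <- cor(x, y, method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))

# Introduce one missing value artificially to illustrate NA handling
y_na <- y
y_na[1] <- NA

# With the default use = "everything", any NA makes the result NA
r_default <- cor(x, y_na)

# Drop incomplete pairs before computing the correlation
r_complete <- cor(x, y_na, use = "complete.obs")
print(c(default = r_default, complete = r_complete))

# For a correlation matrix, "pairwise.complete.obs" drops NAs pair by pair
print(cor(mtcars[, c("mpg", "wt", "hp")], use = "pairwise.complete.obs"))
```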
6.6 Two Numeric Variables: Regression
Now that we’ve explored correlation, we’ll take a brief look at using the “linear model” function lm() to fit regressions. We’ll also look at how we can examine the residuals (stored inside the “lm object” created by lm()) to see if their distribution is approximately normal. Note that we’ll explore regression in much more detail in a later chapter.
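A short sketch of fitting a simple regression and inspecting its residuals, again assuming the built-in mtcars data. The Shapiro-Wilk test at the end is one common normality check and is an addition for illustration, not necessarily what the video uses.

```r
# Fit a simple linear regression of mpg on weight (built-in mtcars data,
# assumed here as an example)
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))

# Estimated intercept and slope
print(coef(fit))

# Residuals are stored in the lm object; residuals(fit) and
# fit$residuals give the same values
res <- residuals(fit)

# Visual checks for approximate normality
hist(res, main = "Histogram of residuals", xlab = "Residual")
qqnorm(res)
qqline(res)

# A formal check (illustrative addition): Shapiro-Wilk normality test
print(shapiro.test(res))
```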
6.7 Two Numeric Variables: Regression Diagnostics
Here we’ll briefly introduce the built-in regression diagnostics available by calling plot() on an lm object. The residual plot and the normal Q-Q plot are the most helpful.
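As a rough sketch, assuming the same kind of mtcars model used for illustration (not the model from the video), the diagnostics can be drawn all at once or one at a time with the which argument.

```r
# Example model on the built-in mtcars data (an assumption for illustration)
fit <- lm(mpg ~ wt, data = mtcars)

# Show the four default diagnostic plots in a 2 x 2 grid,
# then restore the previous graphics settings
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Draw individual plots: which = 1 is residuals vs. fitted values,
# which = 2 is the normal Q-Q plot of the standardized residuals
plot(fit, which = 1)
plot(fit, which = 2)
```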
6.8 Comparing Group Means: T-tests and One-way Tests
We’ll wind up this chapter by demonstrating comparisons of group means using t-tests, one-way tests of means, and ANOVA.
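The three approaches can be sketched as follows, assuming the built-in mtcars data with am (transmission) and cyl (cylinders) converted to factors. The data set, the factor labels and the object names are illustrative assumptions.

```r
# Convert grouping variables to factors; in mtcars, am = 0 is automatic
# and am = 1 is manual (stand-in data for illustration)
cars <- transform(mtcars,
                  am  = factor(am, labels = c("automatic", "manual")),
                  cyl = factor(cyl))

# Two groups: Welch two-sample t-test (unequal variances is the default)
t_res <- t.test(mpg ~ am, data = cars)
print(t_res)

# Three groups: one-way test of means, not assuming equal variances
oneway_res <- oneway.test(mpg ~ cyl, data = cars)
print(oneway_res)

# Assuming equal variances, oneway.test() matches the classical ANOVA F test
oneway_eq <- oneway.test(mpg ~ cyl, data = cars, var.equal = TRUE)
aov_fit <- aov(mpg ~ cyl, data = cars)
print(summary(aov_fit))

# Boxplot of the response by group as a visual companion to the tests
boxplot(mpg ~ cyl, data = cars,
        xlab = "Cylinders", ylab = "Miles per gallon")
```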