# Lesson 10: Multivariate Data

## Overview

So far we've worked with two variables at a time, but often we have more, sometimes many more. Here we'll introduce several useful tools for working with multivariate data, using Chapter 10 of EssentialR. Note that this is a very brief overview: we'll touch on PCA and clustering, but we won't discuss many other multivariate tools, such as ordination.

## Objectives

- Make 3-way frequency tables
- Make pairs plots using the function `pairs()`
- Use lattice graphics to make plots conditioned on a third variable
- Carry out Principal Components Analysis (PCA)
- Carry out hierarchical clustering and k-means clustering


## Data and R Code Files

The R code file and data files for this lesson can be found on the Essential R - Notes on learning R page.

# 10.1 - Multiple Variables

In this screencast we'll demonstrate the use of `table()` for 3-way frequency tables and the use of `pairs()` to create scatterplot matrices, which are very useful in exploratory analysis for checking which variables are correlated.
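A minimal sketch of both functions, using R's built-in `mtcars` and `iris` datasets rather than the lesson's data files (the screencast's own code and data may differ):

```r
# 3-way frequency table: counts of cars by cylinders, gears, and transmission (am)
tab3 <- table(cyl = mtcars$cyl, gear = mtcars$gear, am = mtcars$am)
tab3          # printed as one cyl x gear slice for each level of am
ftable(tab3)  # the same counts "flattened" into a single 2-D layout

# Scatterplot matrix of the four numeric iris measurements,
# with points colored by species
pairs(iris[, 1:4], col = as.integer(iris$Species),
      main = "iris measurements, colored by species")
```

`ftable()` is optional; it prints the same counts as one flat table, which is often easier to read than the slice-by-slice default.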

# 10.2 - Lattice Graphics

Here we'll introduce some functions from the `lattice` package that make groups of plots (panels), with the groups defined by a conditioning variable.
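As a rough illustration (not necessarily the functions used in the screencast), the `|` in a lattice formula names the conditioning variable, and one panel is drawn for each of its levels. This sketch uses the built-in `iris` data:

```r
library(lattice)  # installed with R as a recommended package

# One scatterplot panel per species: y ~ x | g conditions on g
p <- xyplot(Sepal.Length ~ Petal.Length | Species, data = iris,
            layout = c(3, 1),
            xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
print(p)  # lattice functions return plot objects; print() draws them

# The same conditioning idea with histograms, one panel per species
h <- histogram(~ Sepal.Width | Species, data = iris, layout = c(1, 3))
print(h)
```

At the console these plots draw automatically when auto-printed; the explicit `print()` matters inside functions and in scripts run with `source()`.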

# 10.3 - An Example with Data Import, pairs(), and by()

Here I work through a brief example, beginning with importing data, checking for correlation with `pairs()`, and demonstrating the use of `by()` to extract some group means. Note that here I didn't exclude one of the factor variables, and you can see how it is displayed in the last row and column of the scatterplot matrix.
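A minimal sketch of the same workflow. Because the lesson's data file isn't reproduced here, it writes the built-in `iris` data to a temporary CSV file and reads it back, so the import step runs anywhere; the file path and object names are placeholders.

```r
# Import: write iris to a temporary CSV, then read it back as we would a data file
path <- tempfile(fileext = ".csv")
write.csv(iris, path, row.names = FALSE)
dat <- read.csv(path, stringsAsFactors = TRUE)  # keep Species as a factor
str(dat)

# Check for correlation; Species is not excluded, so its integer codes (1-3)
# show up as the last row and column of the scatterplot matrix
pairs(dat)

# Group means of each numeric column, one set per species
means <- by(dat[, 1:4], dat$Species, colMeans)
means
```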

# 10.4 - Principal Components Analysis

Here we will demonstrate Principal Components Analysis (PCA), which can be a useful way to get some idea of which variables are contributing the most variability to a data set. Note that the biplot may be a bit small to see easily in the plot pane. If you are following along in RStudio, click the "Zoom" button above the plot pane to see a larger version.
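A minimal sketch using base R's `prcomp()` on the four `iris` measurements (an assumption; the screencast may use a different dataset or `princomp()`). Setting `scale. = TRUE` standardizes each variable first, so a variable measured on a larger scale doesn't dominate simply because of its units:

```r
# PCA on the four numeric iris columns, each standardized to unit variance
pca <- prcomp(iris[, 1:4], scale. = TRUE)

summary(pca)    # standard deviation and proportion of variance for each PC
pca$rotation    # loadings: how strongly each variable contributes to each PC

# Biplot: observations plotted by their PC1/PC2 scores, variables as arrows
biplot(pca, cex = 0.6)
```

Roughly, `summary()` shows how much of the total variance each component captures, and the loadings in `pca$rotation` (drawn as arrows in the biplot) show which variables drive those components.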

# 10.5 - Hierarchical Clustering and Dendrograms

Hierarchical clustering groups observations by finding those that are "nearest" to each other, and so defines clusters. While there are different ways to define "nearest" and different ways to define clusters, the idea is the same. Here we work with the root anatomy data, and it appears that the sample locations (L1 vs L2) are a bit more clustered than the genotypes.

**NOTE!** The current version of EssentialR uses a modified form of the data, so the genotypes in this data file are now named A through L, and the sample locations are named L1 and L2 rather than 5-8 and 20-28.
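The root anatomy data aren't reproduced here, so this minimal sketch applies the same idea to the built-in `iris` data, labeling the leaves by species and then cross-tabulating the clusters against species. Standardizing with `scale()`, Euclidean distance, and complete linkage are choices made for illustration, not necessarily those used in the screencast:

```r
# Pairwise Euclidean distances between observations on standardized variables
d <- dist(scale(iris[, 1:4]))

# Agglomerative clustering; "complete" linkage measures the distance between
# clusters by their farthest members ("average" and "ward.D2" are other options)
hc <- hclust(d, method = "complete")

# Dendrogram with species as leaf labels (150 leaves, so use small text)
plot(hc, labels = as.character(iris$Species), cex = 0.4,
     main = "Complete-linkage clustering of iris")

# Cut the tree into 3 groups and compare them with the species
groups <- cutree(hc, k = 3)
table(cluster = groups, species = iris$Species)
```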

# 10.6 - K-means Clustering

k-means clustering is a clustering method that looks for k clusters in the data, which means we must tell it how many groups to look for. Nonetheless, it can still be very useful. Here we ask how well the three species in the `iris` dataset can be separated based on their morphology (as captured by the four quantitative variables in the dataset).
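A minimal sketch of that question in base R. The `set.seed()` value and `nstart = 25` are arbitrary choices: k-means starts from random centers, so several starts make a poor local solution less likely, and the seed makes the result reproducible. Cluster numbers are arbitrary labels, so read the cross-tabulation by looking for one dominant species per cluster:

```r
set.seed(42)  # k-means uses random starting centers

# Look for k = 3 clusters in the four measurements (all in cm, so left unscaled)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# How well do the clusters line up with the three species?
table(cluster = km$cluster, species = iris$Species)

km$centers  # mean of each variable within each cluster
km$size     # number of observations in each cluster
```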

## STAT 485: Topics in R Statistical Language

This course will continue into STAT 485: Topics in R Statistical Language