Lesson 27: Correlation and Simple Regression

Overview

In this lesson, we look at statistical analyses that are typically performed on two or more continuous numeric variables. Specifically, we investigate:

  • the GPLOT procedure, to create publication-quality x-y scatter plots of any two numeric variables in a SAS data set
  • the CORR procedure, to compute various correlation coefficients between two or more numeric variables in a SAS data set
  • the REG procedure, to perform a regression analysis on any subset of numeric variables in a SAS data set

Objectives

Upon completion of this lesson, you should be able to:

  • use the CORR procedure to tell SAS to calculate Pearson correlation coefficients among a set of numeric variables
  • use the CORR procedure's SPEARMAN, KENDALL, and HOEFFDING options to tell SAS to calculate alternative measures of association (Spearman's rank-order correlation, Kendall's tau-b, and Hoeffding's D statistic)
  • read typical CORR procedure output to extract the calculated correlations and their associated P-values
  • use the CORR procedure's WITH statement to tell SAS to calculate only the correlation coefficients between each variable in the WITH statement and each variable in the VAR statement
  • understand how sample size can affect the significance of a correlation coefficient
  • interpret a correlation coefficient
  • use the CORR procedure's PARTIAL statement to tell SAS to calculate partial correlations among variables
  • use the CORR procedure's BEST=n option to tell SAS to print, for each variable, only the n correlations that are largest in absolute value
  • use the REG procedure to compute a regression equation between two numeric variables
  • use the REG procedure's MODEL statement to tell SAS which variable to treat as the response variable and which variable or variables to treat as predictors
  • read the typical SAS output from a regression analysis to extract key information, such as parameter estimates, confidence intervals, and P-values
  • use the REG procedure's PLOT statement to request residual diagnostic plots
  • use the GPLOT procedure to request plots containing estimated regression equations, 95% confidence intervals for the mean of y, and 95% prediction intervals for individual y-values
  • use the REG procedure to conduct a regression analysis involving quadratic terms
  • use the REG procedure to conduct a regression analysis involving transformed variables

Textbook Reference

 Chapter 5 of the textbook.