3.5 - R Scripts


1) Acquire Data

Diabetes data

The diabetes data set is taken from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes.

  • 768 samples in the dataset
  • 8 quantitative variables
  • 2 classes: with or without signs of diabetes

Save the data into your working directory for this course as "diabetes.data". Then load the data into R as follows:

# set the working directory
setwd("C:/STAT 897D data mining")
# comma delimited data and no header for each variable
RawData <- read.table("diabetes.data", sep = ",", header = FALSE)

In RawData, the last column is the response variable and the remaining columns are the predictor variables.

# the last column is the response
responseY <- RawData[, ncol(RawData)]
# all other columns are predictors
predictorX <- RawData[, 1:(ncol(RawData) - 1)]
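
The split into response and predictors can be tried without the data file. The following is a minimal sketch that assumes a toy stand-in: the three rows in toyText are made-up values, not rows from the diabetes data, and the names toyText, toyData, toyX, and toyY are introduced only for this illustration.

```r
# Toy stand-in for diabetes.data: 3 made-up rows, 8 predictors + 1 class label.
toyText <- "1,90,60,20,0,25.0,0.30,30,0
4,150,80,30,100,35.5,0.60,45,1
2,110,70,25,50,28.1,0.25,27,0"

# read.table() can read literal text through its 'text' argument.
toyData <- read.table(text = toyText, sep = ",", header = FALSE)

# With header = FALSE, R names the columns V1, V2, ..., V9.
print(names(toyData))

# The last column is the response; every column before it is a predictor.
p <- ncol(toyData)
toyY <- toyData[, p]
toyX <- toyData[, -p]

print(dim(toyX))  # 3 rows, 8 predictor columns
print(toyY)       # class labels: 0 1 0
```

Applied to the real file, the same check should report 768 rows and 8 predictor columns, matching the description of the data set above.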

2) Fitting a Linear Model

Linear regression models are fit in R with the function lm, which takes a model specified symbolically. A typical model has the form response ~ predictors, where response is the (numeric) response vector and predictors is a series of predictor variables joined by +.

Take the full model and the base model (no predictors used) as examples:

fullModel <- lm(responseY~predictorX[,1]+predictorX[,2]
    +predictorX[,3]+predictorX[,4]+predictorX[,5]
    +predictorX[,6]+predictorX[,7]+predictorX[,8])

baseModel <- lm(responseY~1)
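
As an aside, the full-model formula does not have to list every column by hand. The sketch below uses simulated data (not the diabetes data) and hypothetical names simX, simY, longFit, shortFit, and nullFit to suggest that the compact form response ~ . with a data argument gives the same estimates, and that the intercept-only model simply estimates the mean of the response.

```r
set.seed(1)  # reproducible simulated data
n <- 50
simX <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
simY <- 1 + 2 * simX$a - simX$b + rnorm(n)

# Spelled-out formula, in the style used above.
longFit <- lm(simY ~ simX[, 1] + simX[, 2] + simX[, 3])

# Compact form: "." stands for every column of the data frame passed as 'data'.
shortFit <- lm(simY ~ ., data = simX)

# Same estimates; only the coefficient names differ.
print(all.equal(unname(coef(longFit)), unname(coef(shortFit))))

# The intercept-only model estimates the sample mean of the response.
nullFit <- lm(simY ~ 1)
print(all.equal(unname(coef(nullFit)), mean(simY)))
```

With the compact form the coefficients are labelled by column name (here a, b, c) rather than by an indexing expression such as simX[, 1], which tends to make the output easier to read.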

For the full model, $coefficients holds the least squares estimates \(\hat{\beta}\), and $fitted.values holds the fitted values of the response variable.

fullModel$coefficients
fullModel$fitted.values

The results for the coefficients should be as follows:

    (Intercept) predictorX[, 1] predictorX[, 2] predictorX[, 3] predictorX[, 4] predictorX[, 5] 
  -0.8538942665    0.0205918715    0.0059202729   -0.0023318790    0.0001545198   -0.0001805345 
predictorX[, 6] predictorX[, 7] predictorX[, 8] 
   0.0132440315    0.1472374386    0.0026213938 

The fitted values should start with 0.6517572852.
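
To see where these two quantities come from, the sketch below recomputes them from the least squares formula \(\hat{\beta} = (X^T X)^{-1} X^T y\) and \(\hat{y} = X\hat{\beta}\). It uses simulated data rather than the diabetes data, and the names x1, x2, y, fit, X, and betaHat are hypothetical. Note that lm itself uses a QR decomposition internally, so this is a check of the formula, not a description of how lm computes its result.

```r
set.seed(2)  # reproducible simulated data
n <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.5 + 1.5 * x1 - 2 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Design matrix used by the fit, including the column of ones for the intercept.
X <- model.matrix(fit)

# Solve the normal equations (X'X) b = X'y for b.
betaHat <- solve(crossprod(X), crossprod(X, y))

# The hand-computed estimates agree with the coefficients from lm().
print(all.equal(as.vector(betaHat), unname(coef(fit))))

# The fitted values are the design matrix times the estimates.
print(all.equal(as.vector(X %*% betaHat), unname(fitted(fit))))
```

The accessor functions coef(fit) and fitted(fit) return the same values as fit$coefficients and fit$fitted.values.
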
After completing the reading for this lesson, please finish the Quiz and R Lab on ANGEL (check the course schedule for due dates).