## Overview

Exploratory Data Analysis (EDA) may also be described as *data-driven hypothesis generation*. Given a complex set of observations, EDA often provides the initial pointers towards various learning techniques. The data are examined for structures that may indicate deeper relationships among cases or variables.

In this lesson, we will focus on both aspects of EDA:

- Numerical summarization
- Data Visualization
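As a rough illustration of these two aspects, the sketch below uses `mtcars`, a dataset that ships with base R. The dataset and the variables chosen are assumptions made for this example and are not taken from the lesson itself.

```r
# Built-in dataset shipped with base R (an illustrative choice, not from the lesson)
data(mtcars)

# Numerical summarization: minimum, quartiles, median, mean and maximum of one variable
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Spread of one variable and linear association between two variables
mpg_sd <- sd(mtcars$mpg)
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)
print(c(sd = mpg_sd, cor_with_weight = mpg_wt_cor))

# Data visualization: distribution of one variable, then a two-variable relationship
hist(mtcars$mpg, main = "Fuel efficiency", xlab = "Miles per gallon")
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

The numerical summaries give a quick sense of location and spread, while the plots may reveal features such as skewness or a trend that the numbers alone do not show.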

This course is based on R. Several attractive features make R a software of choice in both academia and industry.

- R is open-source software and is free to download.
- R is supported by 3,000+ packages for dealing with large volumes of data in a wide variety of applications. For instance, the `svd()` function performs the singular value decomposition in a single line of code, which takes considerably more effort to implement from scratch in C, Java or Python.
- R is quite versatile. After an algorithm is developed in R, the program may be sped up by translating the R code into other languages.
- R is a mainstream analytical tool.
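To make the `svd()` remark concrete, here is a minimal sketch. The matrix `A` is a made-up example; only the `svd()` call and its `d`, `u` and `v` components come from base R.

```r
# Small illustrative matrix (hypothetical values chosen for this example)
A <- matrix(c(2, 0, 1,
              1, 3, 0,
              0, 1, 4,
              1, 1, 1), nrow = 4, byrow = TRUE)

# The decomposition itself is a single call
s <- svd(A)

# s$d holds the singular values; s$u and s$v hold the left and right singular vectors
print(s$d)

# Rebuild A from its factors: A = U diag(d) V^T
A_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)

# Largest absolute reconstruction error, expected to be close to machine precision
max_err <- max(abs(A - A_rebuilt))
print(max_err)
```

Rebuilding the matrix from its factors is a simple way to check that the decomposition behaves as expected.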

References:

- *The Popularity of Data Analysis Software* by R.A. Muenchen
- *R You Ready for R?* by Ashlee Vance
- *R Programming for Data Science* by Roger Peng

The following diagram shows that R has been gaining popularity in recent times: monthly programming discussion traffic shows explosive growth in discussions about R.

R has a vibrant user community. As a result, R has the most website links pointing to it.

R can be installed from the CRAN website (R-Project) by following the instructions given there. Installing RStudio is also strongly recommended. To develop familiarity with R, work through the material in Introduction to R. For further information, refer to the Course Syllabus. Other useful websites on R are Stack Overflow R Questions and R Seek.

One of the objectives of this course is to strengthen the basics in R. The R-Labs given in the textbook are followed closely. Along with the material in the text, two other features in R are introduced.

- R Markdown: This allows users to knit R code and output directly into a document.
- R library `ggplot2`: A very useful and sophisticated set of plotting functions for producing high-quality graphs.
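A minimal `ggplot2` sketch is shown below. It assumes the package is already installed (for example via `install.packages("ggplot2")`) and again uses the built-in `mtcars` data purely for illustration; the variables and labels are choices made for this example.

```r
library(ggplot2)

# Map weight to the x-axis and fuel efficiency to the y-axis,
# then add a layer of points and readable labels
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       title = "Fuel efficiency versus weight")

# Printing the plot object draws it on the active graphics device
print(p)
```

Placed inside a code chunk of an R Markdown file, the same code would have its plot knitted into the output document alongside the surrounding text.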

## Objectives

- Develop familiarity with R software.
- Apply numerical and visual summarization to data.
- Illustrate the importance of EDA before embarking on sophisticated model building.