1(a).1 - What is Data Mining?

Data mining refers to a set of methods applied to large and complex databases to separate systematic structure from random noise and to discover hidden patterns. These methods are almost always computationally intensive. Data mining comprises the tools, methodologies, and theories for revealing patterns in data, which is a critical step in knowledge discovery. Several driving forces explain why data mining has become such an important area of study:

  1. The explosive growth of data in a great variety of fields in industry and academia, supported by:
    • Cheaper storage devices with very large, easily expanded capacities, such as cloud storage
    • Faster communication, with higher connection speeds
    • Better database management systems and software support
  2. Rapidly increasing computing power.

With such a high volume of varied data available, data mining techniques help extract useful information from it.

Statistical learning methods span a wide range, from linear regression to recently developed, complex, and computationally intensive pattern recognition methods with roots in computer science. The main objective of learning methods is prediction, though it need not be the only objective. In this course, however, only prediction methods are considered.
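
To make the idea of prediction concrete, here is a minimal sketch of the simplest method named above, linear regression with a single predictor. It is an illustration only, not part of the course material: the function names fit_line and predict and the data values are assumptions made up for this example.

```python
# Minimal sketch: fit y = intercept + slope * x by ordinary least squares,
# then use the fitted line to predict the response at a new x.
# The function names and the data below are hypothetical, for illustration only.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Predict the response at x from a fitted (intercept, slope) pair."""
    intercept, slope = model
    return intercept + slope * x


# Made-up training data that roughly follow a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_line(xs, ys)
print(predict(model, 6.0))  # prediction for an x value not in the training data
```

The fitted line is learned from observed data and then applied to an unseen input, which is the pattern shared by the more complex prediction methods as well.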