Lesson 1(a): Introduction to Data Mining
Key Learning Goals for this Lesson:
Textbook reading: Consult the Course Schedule
With rapid advances in information technology, data generation and data collection capabilities have grown explosively across all domains. In the business world, retailers and e-commerce companies have generated very large databases of commercial transactions. Huge amounts of scientific data have been generated in various fields as well. One case in point is the Human Genome Project, which has aggregated gigabytes of data on the human genetic code. The World Wide Web provides another example, with billions of web pages of textual and multimedia information used by millions of people. Analyzing such huge bodies of data so that they can be understood and used efficiently remains a challenging problem. Data mining addresses this problem by providing techniques and software that automate the analysis and exploration of large and complex data sets. Research on data mining is being pursued in a wide variety of fields, including statistics, computer science, machine learning, database management and data visualization, to name a few.
This course on data mining will cover commonly used techniques and applications in this field. Though the focus is on applying the methods with the software R, considerable effort is devoted to developing their mathematical basis. Data mining and learning techniques developed in fields other than statistics, e.g., machine learning and signal processing, are also introduced. After completing the course, students should be able to identify situations in which the techniques are applicable, employ the techniques to derive results, interpret those results and understand any limitations of the final outcome.