CS 235: Data Mining Techniques
Data mining has emerged as one of the most exciting fields in Computer Science. Today many organizations and commercial enterprises maintain large online archives of data, and these archives may contain unknown, yet useful, information. Data mining refers to a set of techniques designed to find interesting pieces of information or knowledge in large amounts of data. There is currently considerable commercial interest in the area, both in developing data mining software and in offering consulting services on data mining, with the market for the former estimated at over 5 billion dollars.
In this course we explore how this interdisciplinary field brings together techniques from databases, statistics, and machine learning. We will discuss the main data mining methods currently in use, including clustering, classification, association rule mining, time series clustering, and web mining. Designing algorithms for these tasks is difficult because the input data sets are very large and the tasks themselves can be very complex. One of the main focuses of the field is integrating these algorithms with relational databases and examining the additional complications that arise in that setting.
About the Instructor
Eamonn Keogh’s research areas include data mining, machine learning, and information retrieval, specializing in techniques for solving similarity and indexing problems in time-series datasets. He has authored more than 120 papers. He received the IEEE ICDM 2007 best paper award, the SIGMOD 2001 best paper award, and a runner-up best paper award at KDD 1997. He has given over two dozen well-received tutorials at the premier conferences in data mining and databases.