Fall 2006
Dec 7, 2006:
Instructor: Manjunath, 805 893 7112
Data mining refers to tools and techniques for processing and managing large collections of data, with the main objective of detecting significant "patterns" or associations in such data sets. As such, it has a wide range of applications to problems in the natural and social sciences, medicine, finance, and marketing. This introductory course covers some of the basic principles of data mining, with emphasis on data mining tasks and algorithms. These include, for example, tools for classification and clustering, data structures for organizing high-dimensional data, association rule mining, and retrieval by content.
Note that the lecture slides and project links require authentication, as they contain copyrighted or limited-access materials.
- Introduction to Data Mining, Tan, Steinbach and Kumar, Addison-Wesley, 2005. The authors' web page has links to chapter PowerPoint slides and sample chapters from the book. See http://www-users.cs.umn.edu/~kumar/dmbook/index.php.
- Data Mining: Concepts and Techniques, Han and Kamber, Morgan Kaufmann, 2001. Another well-written book, from a database point of view. The author has slides (ppt) for each chapter available on his web site: http://www.cs.sfu.ca/~han/dmbook.
- Note: the second edition of this book was released earlier this year. See http://www-faculty.cs.uiuc.edu/~hanj/bk2/
H/W + Class participation/discussions: 25%; Project: 40%; Final exam (take home): 35%.
Paper Presentations
Project Proposals:
Oct 19: Retinal Detachment (Joshi, Mangiat, Ni and Sargin)
Oct 24: Netflix project (Ranu, K Singh and Sakarya), Delay Testing Project (CY Chen)
Oct 26: Social Networks (P Wu, S Wu and WY Chen)
Oct 31: Cortina Project (De Guzman, Moxley and Xu), Folksonomy project (Petko and Imran), PersonalAlbums (Yeh, Zhu)
Nov 02: Multisensor EEG (Choi, Kleban, Sarkar, Rahimi), Atomic Motifs (Sturm, Gauglitz)
Slides01 (09/28/2006): Introduction (~4MB)
Slides02 (10/02/2006): Data (~4MB)
Classification Methods
Slides03 (10/17/2006): Decision Trees (~2MB)
Slides04 (11/09/2006): Other classification methods (~2MB)
Association Mining
Slides05 (11/09/2006): Association mining methods (~3MB) (FINAL)
Clustering Methods
Slides06 (12/02/2006): Clustering (~3MB)
Final Notes (12/07/2006)
HW #1: Due on Oct 17. (solutions)
HW #2: Due on Oct 26. (solutions)
HW #3: Due on Nov 9.
HW #4: Due on Nov 21. (solutions)