Course: Data Mining, MSc in Mathematics
Data Mining (6 CFU)
Semester: Spring


Overview

The course provides a modern introduction to data mining, which spans techniques, algorithms and methodologies for discovering structure, patterns and relationships in data sets (typically large ones) and for making predictions. Applications of data mining are all around us and, when done well, sometimes even go unnoticed. For instance, how does Google web search work? How does Shazam recognize a song? How does Netflix recommend movies to its users? The principles of data mining provide answers to these and other questions. Data mining lies at the intersection of computer science, statistical machine learning and databases. The course aims to provide students with the knowledge required to explore, analyze and leverage available data, turning it into valuable and actionable information, for instance to support a company's decision-making process.


Learning outcomes

After the course the student should be able to:

• describe and use the main data mining techniques;
• understand the differences among several algorithms solving the same problem and recognize which one is better under different conditions;
• tackle new data mining problems by selecting the appropriate methods and justifying his/her choices;
• tackle new data mining problems by designing suitable algorithms and evaluating the results;
• explain experimental results to people outside of statistical machine learning or computer science.


Course Content


• Introduction. MapReduce (2 hours)
• Mining data streams. Frequent items (6 hours)
• Frequent itemsets and association rules (4 hours)
• Mining similar items and Locality-Sensitive Hashing (2 hours)
• Graph analysis. Link analysis and PageRank (2 hours)
• Clustering (4 hours)
• Recommendation systems (4 hours)
• Mining social-network graphs (4 hours)
• Dimensionality reduction (2 hours)
• Classification (6 hours)
• Drills (6 hours)


Prerequisites

Calculus. Probability and Statistics. Linear Algebra. Programming skills.


Examination

Oral exam. The student is asked to present selected theoretical topics, so that his/her knowledge and understanding of them can be assessed.


Office Hours

By appointment; contact the instructor by email or at the end of class meetings.


References

Data Mining and Analysis
M. J. Zaki and W. Meira Jr.
Freely available online: http://dataminingbook.info

Mining of Massive Datasets
J. Leskovec, A. Rajaraman and J. Ullman
Freely available online: http://www.mmds.org