11-10-2017 The new semester has begun :)

The aim and the scope of the course

The aim of the course: to get to know the latest technologies and algorithms for mining massive datasets.

The scope of the course: We will learn about MapReduce and scalable algorithms for:

The course is mainly based on parts of the Mining of Massive Datasets book.

Main information about the course

Time and place

Schedule of lectures

11-10-2017 Mining massive data sets [pdf]
18-10-2017 Classification and regression I [pdf]
25-10-2017 Classification and regression II [pdf]
08-11-2017 Classification and regression III [pdf]

Schedule of labs

11-10-2017 Bonferroni's principle [pdf]
18-10-2017 Solving problems by simulations [pdf]
25-10-2017 Classification and regression - Naive Bayes I [pdf]
08-11-2017 Classification and regression - Naive Bayes II [pdf] [code]
15-11-2017 Classification and regression - Naive Bayes III [pdf]


Test: 75% (min. 50%)
Labs: 25% (min. 50%)
Regular tasks and homework: 100% (min. 50%)

Bonus points for all: up to 10 percentage points.


90% 5.0
80% 4.5
70% 4.0
60% 3.5
50% 3.0
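
The weights and the scale above can be combined into a single calculation. The following is only a sketch under stated assumptions, not the official grading procedure: it assumes the final score is the weighted sum of the test and lab percentages, that each component must reach its 50% minimum on its own, and that bonus points (capped at 10) are added to the weighted total. The function name `final_grade` and its parameters are hypothetical.

```python
from typing import Optional

# Grade thresholds taken from the scale above: (minimum total %, grade).
GRADE_SCALE = [(90, 5.0), (80, 4.5), (70, 4.0), (60, 3.5), (50, 3.0)]


def final_grade(test_pct: float, labs_pct: float,
                bonus_pts: float = 0.0) -> Optional[float]:
    """Return the grade, or None if the course is not passed."""
    # Each component has its own 50% minimum.
    if test_pct < 50 or labs_pct < 50:
        return None
    # Assumption: the bonus is capped at 10 percentage points
    # and added to the weighted total.
    bonus = max(0.0, min(bonus_pts, 10.0))
    total = 0.75 * test_pct + 0.25 * labs_pct + bonus
    # The scale is sorted from the highest threshold down.
    for threshold, grade in GRADE_SCALE:
        if total >= threshold:
            return grade
    return None
```

For example, under these assumptions 80% on the test and 60% on the labs give a total of 75%, which falls in the 4.0 band; a full 10-point bonus would raise it to 85% and the 4.5 band.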


J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014.

H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book. Second Edition. Pearson Prentice Hall, 2009.

J. Lin, Ch. Dyer, Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers, 2010.

T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning. Second Edition. Springer, 2009.

Ch. Lam, Hadoop in Action, Manning Publications Co., 2011.