08-02-2017 Table with final scores [results]
08-02-2017 Table with lab scores [results]
08-02-2017 Table with test scores [results]
30-11-2016 This week we will have a joint lab meeting for both groups in lecture room L125 BT on Friday, December 2, at 13:30
27-10-2016 Lecture and labs are canceled on Friday, October 28, 2016 (because of injury :( )
12-10-2016 The new semester has begun :)

The aim and the scope of the course

The aim of the course: To get to know the latest technologies and algorithms for mining massive datasets.

The scope of the course: We will learn about MapReduce and scalable algorithms for:

The course is mainly based on parts of the Mining of Massive Datasets book.

Main information about the course

Time and place

Schedule of lectures

12-10-2016 Mining massive data sets [pdf]
14-10-2016 Evolution of database systems [pdf]
19-10-2016 MapReduce I [pdf]
26-10-2016 MapReduce II [pdf]
28-10-2016 Canceled - injury :(
02-11-2016 Classification and regression I [pdf]
16-11-2016 Classification and regression II [pdf]
23-11-2016 Classification and regression III [pdf]
30-11-2016 Classification and regression IV [pdf]
07-12-2016 Classification and regression V [pdf]
14-12-2016 Multi-dimensional Index Structures [pdf]
21-12-2016 Finding Similar Items I [pdf]
17-01-2017 Finding Similar Items II [pdf]
25-01-2017 Recommendation Systems I [pdf]
27-01-2017 Recommendation Systems II [pdf]

Schedule of labs

14-10-2016 Bonferroni's principle [pdf]
22-10-2016 Dimensional modeling and data transformation [pdf] [report-1.pdf] [report-1.tex]
28-10-2016 Canceled - injury :(
04-11-2016 MapReduce in Hadoop [pdf] [data-all-bible]
18-11-2016 MapReduce - first programs [pdf] [eclipse project] [source code templates] [data-all-shakespeare]
25-11-2016 MapReduce - matrix multiplication [pdf] [code and data]
02-12-2016 Classification and regression - Weka API [pdf] [code] [data]
09-12-2016 Classification and regression - naive Bayes classifier [pdf]
16-12-2016 Classification and Regression - Testing Classifiers [pdf] [data] [code]
23-12-2016 Classification and Regression - Cross-Validation [pdf] [code]
13-01-2017 Nearest Neighbor Search [pdf] [result.txt]
14-01-2017 Approximate Nearest Neighbor Search [pdf]

Virtual machine with Hadoop

Virtual machine containing a single-node Apache Hadoop cluster - link to Cloudera:


Test: 75% (min. 50%)
Labs: 25% (min. 50%)
Regular tasks and homework: 100% (min. 50%)

Bonus points for all: up to 10 percentage points.


90% 5.0
80% 4.5
70% 4.0
60% 3.5
50% 3.0


J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014.

H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book. Second Edition. Pearson Prentice Hall, 2009.

J. Lin, Ch. Dyer, Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers, 2010.

T. Hastie, R. Tibshirani, J. Friedman, Elements of Statistical Learning: Second Edition. Springer, 2009.

Ch. Lam, Hadoop in Action, Manning Publications Co., 2011.