02-03-2018 The new semester has begun :)

The aim and the scope of the course

The aim of the course: To learn the latest technologies and algorithms for processing massive datasets in intelligent decision support systems.

The scope of the course: We will learn how to organize, store, access, and process massive datasets.

Information about the Course

Time and Place

Schedule of Lectures

02-03-2018 Processing of massive data sets [pdf]
16-03-2018 Evolution of database systems [pdf]
23-03-2018 Dimensional modeling [pdf]
06-04-2018 ETL and OLAP systems [pdf]
13-04-2018 MapReduce in Spark I [pdf]
20-04-2018 MapReduce in Spark II [pdf]
27-04-2018 Processing of very large data [pdf]
11-05-2018 Approximate query processing I [pdf]
18-05-2018 Approximate query processing II [pdf]
25-05-2018 Multi-dimensional index structures [pdf]
08-06-2018 Finding similar items I [pdf]
15-06-2018 Finding similar items II [pdf]

Schedule of Labs

02-03-2018 Bonferroni's principle [pdf]
16-03-2018 Solving problems by simulations [pdf]
23-03-2018 Data transformation [pdf] [report.pdf] [report.tex]
06-04-2018 Dimensional modeling [pdf]
13-04-2018 Data transformation in bash [pdf]
20-04-2018 MapReduce in Spark I [pdf] [matrix M] [vector x] [vector v]
27-04-2018 MapReduce in Spark II [pdf] [data for matrix multiplication]
11-05-2018 Implementation of the bit-sliced index [pdf]
18-05-2018 Bloom filters [pdf] [code]
25-05-2018 Probabilistic data structures for distinct count [pdf] [code] [data]
08-06-2018 Nearest neighbor search [pdf] [result.txt]
15-06-2018 Approximate nearest neighbor search [pdf]


Test: 75% (min. 50%)
Labs: 25% (min. 50%)
Regular exercises and homework: 100% (min. 50%)

Bonus points for all: up to 10 points.


Grading scale:

90%  5.0
80%  4.5
70%  4.0
60%  3.5
50%  3.0


J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014.

R. Kimball, M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, John Wiley & Sons, 2002.

H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book, Second Edition, Pearson Prentice Hall, 2009.

J. Lin, Ch. Dyer, Data-Intensive Text Processing with MapReduce, Morgan and Claypool Publishers, 2010.

Ch. Lam, Hadoop in Action, Manning Publications Co., 2011.