dr hab. inż. Krzysztof Dembczyński (kdembczynski cs put poznan pl)
Thursday, 10:00-12:00, office room 2 CW (Institute of Computing Science)
05-09-2019 | The last chance to pass the course: Thursday, September 12, 11:00-12:00, office room 2 CW (Institute of Computing Science). |
29-06-2019 | The last chance for students below 50 points is on Thursday, July 4, 10:00-12:00, office room 2 CW (Institute of Computing Science). |
29-06-2019 | The sheets below with scores have been updated. Please verify your final score. |
19-06-2019 | A sheet with (current) final scores is available here. |
A sheet with scores from the first quiz is available here. |
A sheet with current scores from labs is available here. |
28-05-2019 | The first quiz will take place on Saturday, June 15, 15:10, room CW 8 |
The second quiz will take place on Saturday, June 29, 9:00, room CW 13 |
The last lab meeting will take place on Saturday, June 29, 14:00, room CW 45 |
Quiz questions for practicing can be found here. |
23-02-2019 | The new semester has finally begun :) |
The aim of the course: To get to know technologies and algorithms for processing massive datasets.
The scope of the course: We will learn how to organize, store, access, and process massive datasets.
The course is mainly based on the first 4 chapters of the Mining of Massive Datasets book.
23-02-2019 | Introduction to massive datasets [pdf] |
02-03-2019 | Processing of massive datasets [pdf] |
02-03-2019 | Distributed processing and MapReduce [pdf] |
30-03-2019 | MapReduce in Spark [pdf] |
30-03-2019 | Approximate query processing [pdf] |
25-05-2019 | Finding similar items I [pdf] |
25-05-2019 | Finding similar items II [pdf] |
02-03-2019 | Dimensional modeling and data transformation in bash [pdf] [unique_tracks.zip] [triplets_sample_20p.zip] |
30-03-2019 | MapReduce in Spark [pdf] [all-shakespeare.zip] [matrix M] [vector x] [vector v] [data for matrix multiplication] |
12-05-2019 | Hash functions and Bloom filters [pdf] [facts-nns.csv.gz] |
15-06-2019 | Nearest neighbor search [pdf] [facts-nns.csv.gz] [nearestneighbors-100-2018.result] |
Test: | 75% | (min. 50%) |
Labs: | 25% | (min. 50%) |
Regular exercises and homework: | 100% | (min. 50%) |
Bonus points for all: up to 10 points.
Scale:
90% | 5.0 |
80% | 4.5 |
70% | 4.0 |
60% | 3.5 |
50% | 3.0 |
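The weights and the scale above can be read as a small calculation. The sketch below is only an illustration under assumptions that this page does not state: that the two components are combined as a weighted sum, that bonus points count as percentage points added to the total, and that the bonus does not help to reach the 50% minimum of a component. The function name `final_grade` and its parameters are hypothetical.

```python
from typing import Optional

# Grade thresholds taken from the scale above: (minimum percentage, grade).
SCALE = [(90, 5.0), (80, 4.5), (70, 4.0), (60, 3.5), (50, 3.0)]


def final_grade(test_pct: float, labs_pct: float, bonus: float = 0.0) -> Optional[float]:
    """Return the grade, or None if the course is not passed.

    Assumptions (not confirmed by the course page): weighted sum of the
    components, bonus added as percentage points, minimums checked before
    the bonus is applied.
    """
    # Each component has its own 50% minimum.
    if test_pct < 50 or labs_pct < 50:
        return None
    # Bonus is capped at 10 points.
    bonus = min(max(bonus, 0.0), 10.0)
    total = 0.75 * test_pct + 0.25 * labs_pct + bonus
    # Thresholds are sorted from highest to lowest, so the first match wins.
    for threshold, grade in SCALE:
        if total >= threshold:
            return grade
    return None


if __name__ == "__main__":
    print(final_grade(80, 60))      # test 80%, labs 60%, no bonus
    print(final_grade(40, 100))     # test below the 50% minimum
```

Under these assumptions, a student with 80% from the test and 60% from labs has a total of 75% and falls in the 70% band of the scale.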
J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014, http://www.mmds.org.
R. Kimball, M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, John Wiley & Sons, 2002.
H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book. Second Edition. Pearson Prentice Hall, 2009.