dr hab. inż. Krzysztof Dembczyński (kdembczynski cs put poznan pl)
mgr inż. Kalina Jasinska (kjasinska cs put poznan pl)
mgr inż. Marek Wydmuch (mwydmuch cs put poznan pl)
29-01-2019 | The quiz retake will take place on Wednesday, 14:00, room 13CW. |
29-01-2019 | The results from the first quiz with the final evaluation can be found here. |
29-01-2019 | The last labs with Krzysztof Dembczyński will take place on Wednesday, January 30, 12:00 (note that the time has changed!), lab 43. |
15-01-2019 | The final quiz will take place on Monday, January 21, 13:30, room 2CW (quiz retake on Wednesday, January 30). Remember that 1/20 is not 0.2 :) |
14-01-2019 | Quiz questions for practicing can be found here. |
11-01-2019 | Additional lecture preparing for the final quiz will take place on Tuesday, January 15, 18:30 room L122BT. |
28-11-2018 | Additional lecture will take place on Tuesday, December 18, 18:30 room L122BT. |
26-11-2018 | Because of the NIPS conference, the lecture and all labs are cancelled next week (from 03-12-2018 to 07-12-2018). |
29-10-2018 | A sheet with current scores is available here. |
23-10-2018 | The next lecture has been moved from Monday, October 29, to Tuesday, October 30, 18:30, room L122BT. |
17-10-2018 | Office hours on Thursday, October 18, are cancelled because of the PP-RAI conference (please email me if you want to meet). |
08-10-2018 | The new semester has finally begun :) |
The aim of the course: To get to know technologies and algorithms for processing massive datasets.
The scope of the course: We will learn how to organize, store, access, and process massive datasets.
The course is based on parts of the Mining of Massive Datasets book.
08-10-2018 | Processing of massive data sets [pdf] |
15-10-2018 | Evolution of database systems [pdf] |
22-10-2018 | Dimensional modeling [pdf] |
29-10-2018 | Processing of massive data sets I [pdf] |
05-11-2018 | Processing of massive data sets II [pdf] |
19-11-2018 | MapReduce in Spark I [pdf] |
26-11-2018 | MapReduce in Spark II [pdf] |
10-12-2018 | Approximate query processing I [pdf] |
17-12-2018 | Approximate query processing II [pdf] |
18-12-2018 | Finding similar items I [pdf] |
07-01-2019 | Finding similar items II [pdf] |
14-01-2019 | Finding similar items III [pdf] |
08-10-2018 | Bonferroni's principle [pdf] |
15-10-2018 | Solving problems by simulations [pdf] |
22-10-2018 | Data transformation [pdf] [unique_tracks.zip] [triplets_sample_20p.zip] [docker example] |
29-10-2018 | Dimensional modeling [pdf] |
05-11-2018 | Data transformation in bash [pdf] |
19-11-2018 | MapReduce in Spark I [pdf] [all-shakespeare.zip] [matrix M] [vector x] [vector v] |
26-11-2018 | MapReduce in Spark II [pdf] [data for matrix multiplication] |
26-11-2018 | MapReduce in Spark III - A bonus exercise [pdf] |
10-12-2018 | Bloom filters [pdf] [code] |
17-12-2018 | Nearest neighbor search [pdf] [facts-nns.csv.gz] [nearestneighbors-100-2018.result] |
07-01-2019 | Minhash signatures [pdf] |
14-01-2019 | Approximate nearest neighbor search [pdf] |
Test: | 75% | (min. 50%) |
Labs: | 25% | (min. 50%) |
Regular exercises and homework: | 100% | (min. 50%) |
Bonus points for all: up to 10 points.
A sheet with current scores is available here.
Scale:
90% | 5.0 |
80% | 4.5 |
70% | 4.0 |
60% | 3.5 |
50% | 3.0 |
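The weights and the scale above can be combined into a small calculation. The sketch below is only an illustration under stated assumptions, not the official grading rule: it assumes the total is the weighted sum of the test (75%) and labs (25%) percentages, that each component must reach its 50% minimum, and that anything below the listed scale maps to 2.0 (a fail grade that this page does not list). Bonus points are left out because the page does not say how they convert to percentages. The names `final_grade`, `GRADE_SCALE`, and `FAIL_GRADE` are hypothetical.

```python
# Hypothetical helper: the weighting and pass rules are assumptions,
# not the official course policy.

# Thresholds (in percent) and grades, copied from the scale on this page.
GRADE_SCALE = [(90, 5.0), (80, 4.5), (70, 4.0), (60, 3.5), (50, 3.0)]

# Assumption: the page lists no grade below 50%, so 2.0 stands in for "fail".
FAIL_GRADE = 2.0


def final_grade(test_pct, labs_pct):
    """Map test and lab scores (both in percent, 0-100) to a grade."""
    # Assumed rule: each component must reach its 50% minimum.
    if test_pct < 50 or labs_pct < 50:
        return FAIL_GRADE
    # Assumed rule: weighted sum with the 75% / 25% weights listed above.
    total = 0.75 * test_pct + 0.25 * labs_pct
    # Return the grade of the highest threshold that the total reaches.
    for threshold, grade in GRADE_SCALE:
        if total >= threshold:
            return grade
    return FAIL_GRADE


if __name__ == "__main__":
    print(final_grade(80, 80))
    print(final_grade(40, 100))
```

Under these assumptions, a student with 80% on both the test and the labs has a total of 80% and gets 4.5, while a student with 40% on the test fails regardless of the lab score.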
J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014, http://www.mmds.org.
R. Kimball, M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, John Wiley & Sons, 2002.
H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book, Second Edition, Pearson Prentice Hall, 2009.
J. Lin, Ch. Dyer, Data-Intensive Text Processing with MapReduce, Morgan and Claypool Publishers, 2010, http://lintool.github.com/MapReduceAlgorithms/.