dr hab. inż. Krzysztof Dembczyński (kdembczynski cs put poznan pl)
mgr inż. Kalina Jasinska (kjasinska cs put poznan pl)
mgr inż. Marek Wydmuch (mwydmuch cs put poznan pl)
| 29-01-2019 | The quiz retake will take place on Wednesday, 14:00, room 13CW. |
| 29-01-2019 | The results from the first quiz with the final evaluation can be found here. |
| 29-01-2019 | The last lab with Krzysztof Dembczyński will take place on Wednesday, January 30, 12:00 (the time has changed!), lab 43. |
| 15-01-2019 | The final quiz will take place on Monday, January 21, 13:30, room 2CW (quiz retake on Wednesday, January 30). |
| Remember that 1/20 is not 0.2 :) | |
| 14-01-2019 | Quiz questions for practicing can be found here. |
| 11-01-2019 | An additional lecture in preparation for the final quiz will take place on Tuesday, January 15, 18:30, room L122BT. |
| 28-11-2018 | An additional lecture will take place on Tuesday, December 18, 18:30, room L122BT. |
| 26-11-2018 | Because of the NIPS conference, the lecture and all labs are cancelled next week (from 03-12-2018 to 07-12-2018). |
| 29-10-2018 | A sheet with current scores is available here. |
| 23-10-2018 | The next lecture has been moved from Monday, October 29, to Tuesday, October 30, 18:30, room L122BT. |
| 17-10-2018 | Office hours are cancelled on Thursday, October 18, because of the PP-RAI conference (please email me if you want to meet). |
| 08-10-2018 | The new semester has finally begun :) |
The aim of the course: To get to know technologies and algorithms for processing massive datasets.
The scope of the course: We will learn how to organize, store, access, and process massive datasets.
The course is based on parts of the Mining of Massive Datasets book.
| 08-10-2018 | Processing of massive data sets [pdf] |
| 15-10-2018 | Evolution of database systems [pdf] |
| 22-10-2018 | Dimensional modeling [pdf] |
| 29-10-2018 | Processing of massive data sets I [pdf] |
| 05-11-2018 | Processing of massive data sets II [pdf] |
| 19-11-2018 | MapReduce in Spark I [pdf] |
| 26-11-2018 | MapReduce in Spark II [pdf] |
| 10-12-2018 | Approximate query processing I [pdf] |
| 17-12-2018 | Approximate query processing II [pdf] |
| 18-12-2018 | Finding similar items I [pdf] |
| 07-01-2019 | Finding similar items II [pdf] |
| 14-01-2019 | Finding similar items III [pdf] |
| 08-10-2018 | Bonferroni's principle [pdf] |
| 15-10-2018 | Solving problems by simulations [pdf] |
| 22-10-2018 | Data transformation [pdf] [unique_tracks.zip] [triplets_sample_20p.zip] [docker example] |
| 29-10-2018 | Dimensional modeling [pdf] |
| 05-11-2018 | Data transformation in bash [pdf] |
| 19-11-2018 | MapReduce in Spark I [pdf] [all-shakespeare.zip] [matrix M] [vector x] [vector v] |
| 26-11-2018 | MapReduce in Spark II [pdf] [data for matrix multiplication] |
| 26-11-2018 | MapReduce in Spark III - A bonus exercise [pdf] |
| 10-12-2018 | Bloom filters [pdf] [code] |
| 17-12-2018 | Nearest neighbor search [pdf] [facts-nns.csv.gz] [nearestneighbors-100-2018.result] |
| 07-01-2019 | Minhash signatures [pdf] |
| 14-01-2019 | Approximate nearest neighbor search [pdf] |
| Test : | 75% | (min. 50%) |
| Labs : | 25% | (min. 50%) |
| Regular exercises and homework : | 100% | (min. 50%) |
Bonus points for all: up to 10 points.
A sheet with current scores is available here
Scale:
| 90% | 5.0 |
| 80% | 4.5 |
| 70% | 4.0 |
| 60% | 3.5 |
| 50% | 3.0 |
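The weights and the scale above can be combined into a small calculation. The following is only a sketch of one plausible reading of the rules, not the official grading procedure: it assumes that both components are given as percentages, that the thresholds are inclusive, and that bonus points are left out because the page does not say how they are applied. The name `final_grade` is hypothetical.

```python
from typing import Optional

# Thresholds copied from the scale table: (minimum percent, grade).
SCALE = [(90, 5.0), (80, 4.5), (70, 4.0), (60, 3.5), (50, 3.0)]


def final_grade(test_pct: float, labs_pct: float) -> Optional[float]:
    """Return the grade, or None when the course is not passed.

    Assumptions (not stated on the page): thresholds are inclusive
    and bonus points are ignored.
    """
    # Each component has its own 50% minimum.
    if test_pct < 50 or labs_pct < 50:
        return None
    # Weighted total: test 75%, labs 25%.
    total = 0.75 * test_pct + 0.25 * labs_pct
    for threshold, grade in SCALE:
        if total >= threshold:
            return grade
    return None
```

For example, under these assumptions a test score of 80% and a lab score of 100% give a total of 85%, which maps to 4.5.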
J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014, http://www.mmds.org.
R. Kimball, M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, John Wiley & Sons, 2002.
H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book, Second Edition, Pearson Prentice Hall, 2009.
J. Lin, C. Dyer, Data-Intensive Text Processing with MapReduce, Morgan and Claypool Publishers, 2010, http://lintool.github.com/MapReduceAlgorithms/.