dr hab. inż. Krzysztof Dembczyński (kdembczynski cs put poznan pl)
| 17-01-2019 | The final exam will take place on Monday 13:00, January 28, room L122 BT. |
| The last lab meeting will take place on Monday 17:00, January 28, lab 43. | |
| 19-11-2018 | A sheet with current scores is available here. |
| 08-01-2019 | We swap two lectures with the course "Enterprise distributed systems": |
| - First change: 22-11-2018, 13:30 swapped with 13-12-2018, 11:45 | |
| - Second change: 06-12-2018, 13:30 swapped with 20-12-2018, 11:45 | |
| 19-11-2018 | We swap two lectures with the course "Enterprise distributed systems": |
| 31-10-2018 | The lab tasks from today are mandatory for both groups; however, they will not be evaluated during the next meeting. |
| Instead, the tasks will be extended during the next meeting and the final evaluation will concern both meetings. | |
| 17-10-2018 | Office hours cancelled on Thursday, October 18, because of the PP-RAI conference (please email me if you want to meet). |
| 04-10-2018 | The new semester has begun :) |
The aim of the course: To get to know the latest technologies and algorithms for mining massive datasets.
The scope of the course: We will learn about scalable algorithms for:
The course is mainly based on parts of the Mining of Massive Datasets book.
| 04-10-2018 | Mining massive data sets [pdf] |
| 11-10-2018 | Classification and regression I [pdf] |
| 17-10-2018 | Classification and regression II [pdf] |
| 08-11-2018 | Classification and regression III [pdf] |
| 15-11-2018 | Classification and regression IV [pdf] |
| 29-11-2018 | Classification and regression V [pdf] |
| 13-12-2018 | Evolution of database systems [pdf] |
| 13-12-2018 | MapReduce [pdf] |
| 20-12-2018 | MapReduce in Spark [pdf] |
| 10-01-2019 | Finding Similar Items I [pdf] |
| 16-01-2019 | Finding Similar Items II [pdf] |
| 17-01-2019 | Finding Similar Items III [pdf] |
| 10-10-2018 | Bonferroni's principle [pdf] |
| 18-10-2018 | Solving problems by simulations [pdf] |
| 25-10-2018 | Classification and regression - Introduction to scikit-learn [pdf] |
| 31-10-2018 | Classification and regression - Naive Bayes I [pdf] [code] |
| 07-11-2018 | Classification and regression - Naive Bayes II [pdf] |
| 14-11-2018 | Classification and regression - Testing classifiers [pdf] [code] |
| 29-11-2018 | Classification and regression - Decision boundary [pdf] |
| 06-12-2018 | Classification and regression - Cross-validation [pdf] [code] |
| 13-12-2018 | Processing of massive datasets [pdf] [unique_tracks.zip] [triplets_sample_20p.zip] |
| 19-12-2018 | Multidimensional modeling and data transformation in bash [pdf] |
| 20-12-2018 | MapReduce in Spark I [pdf] [all-shakespeare.zip] [matrix M] [vector x] [vector v] |
| 03-01-2019 | MapReduce in Spark II [pdf] [data for matrix multiplication] |
| 10-01-2019 | Exact and approximate nearest neighbor search [pdf] [facts-nns.csv.gz] [result.txt] |
| Test: | 75% | (min. 50%) |
| Labs: | 25% | (min. 50%) |
| Regular tasks and homework: | 100% | (min. 50%) |
Bonus points for all: up to 10 percentage points.
A sheet with current scores is available here.
Scale:
| 90% | 5.0 |
| 80% | 4.5 |
| 70% | 4.0 |
| 60% | 3.5 |
| 50% | 3.0 |
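The weights, minimums, and scale above can be combined as in the following sketch. It is only an illustration: the function name `final_grade` is hypothetical, and it assumes that each component must reach its 50% minimum on its own, that the bonus is added to the weighted total, and that the bonus is capped at 10 percentage points. The page itself does not state how these pieces are combined.

```python
# Sketch only: the combination rule below is an assumption, not the official policy.

# Thresholds and grades taken from the scale above, highest first.
SCALE = [(90, 5.0), (80, 4.5), (70, 4.0), (60, 3.5), (50, 3.0)]


def final_grade(test_pct, labs_pct, bonus_pp=0.0):
    """Return a grade from the scale, or None if a required minimum is not met."""
    # Assumption: the test and the labs must each reach 50% independently.
    if test_pct < 50 or labs_pct < 50:
        return None
    # Weights from the table: test 75%, labs 25%.
    total = 0.75 * test_pct + 0.25 * labs_pct
    # Assumption: the bonus is capped at 10 points and added to the weighted total.
    total += min(max(bonus_pp, 0.0), 10.0)
    for threshold, grade in SCALE:
        if total >= threshold:
            return grade
    return None


print(final_grade(80, 60))              # weighted total 75
print(final_grade(80, 60, bonus_pp=5))  # weighted total 80
```

Under these assumptions, a student with 80% on the test and 60% in the labs has a weighted total of 75%, which falls in the 70% band.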
J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014, http://infolab.stanford.edu/~ullman/mmds.html.
H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book. Second Edition. Pearson Prentice Hall, 2009.
J. Lin, Ch. Dyer, Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers, 2010, http://lintool.github.com/MapReduceAlgorithms/.
T. Hastie, R. Tibshirani, J. Friedman, Elements of Statistical Learning: Second Edition. Springer, 2009, http://www-stat.stanford.edu/~tibs/ElemStatLearn/.
Ch. Lam, Hadoop in Action, Manning Publications Co., 2011.