dr hab. inż. Krzysztof Dembczyński (kdembczynski cs put poznan pl)
17-01-2019 | The final exam will take place on Monday, January 28, at 13:00 in room L122 BT. |
The last lab meeting will take place on Monday, January 28, at 17:00 in lab 43. | |
19-11-2018 | A sheet with current scores is available here. |
08-01-2019 | We swap two lectures with the course "Enterprise distributed systems": |
- First change: 22-11-2018, 13:30 swapped with 13-12-2018, 11:45 | |
- Second change: 06-12-2018, 13:30 swapped with 20-12-2018, 11:45 | |
31-10-2018 | The lab tasks from today are mandatory for both groups; however, they will not be evaluated during the next meeting. |
Instead, the tasks will be extended during the next meeting and the final evaluation will concern both meetings. | |
17-10-2018 | Office hours cancelled on Thursday, October 18, because of the PP-RAI conference (please email me if you want to meet). |
04-10-2018 | The new semester has begun :) |
The aim of the course: To get to know the latest technologies and algorithms for mining massive datasets.
The scope of the course: We will learn about scalable algorithms for:
The course is mainly based on parts of the Mining of Massive Datasets book.
04-10-2018 | Mining massive data sets [pdf] |
11-10-2018 | Classification and regression I [pdf] |
17-10-2018 | Classification and regression II [pdf] |
08-11-2018 | Classification and regression III [pdf] |
15-11-2018 | Classification and regression IV [pdf] |
29-11-2018 | Classification and regression V [pdf] |
13-12-2018 | Evolution of database systems [pdf] |
13-12-2018 | MapReduce [pdf] |
20-12-2018 | MapReduce in Spark [pdf] |
10-01-2019 | Finding Similar Items I [pdf] |
16-01-2019 | Finding Similar Items II [pdf] |
17-01-2019 | Finding Similar Items III [pdf] |
10-10-2018 | Bonferroni's principle [pdf] |
18-10-2018 | Solving problems by simulations [pdf] |
25-10-2018 | Classification and regression - Introduction to scikit-learn [pdf] |
31-10-2018 | Classification and regression - Naive Bayes I [pdf] [code] |
07-11-2018 | Classification and regression - Naive Bayes II [pdf] |
14-11-2018 | Classification and regression - Testing classifiers [pdf] [code] |
29-11-2018 | Classification and regression - Decision boundary [pdf] |
06-12-2018 | Classification and regression - Cross-validation [pdf] [code] |
13-12-2018 | Processing of massive datasets [pdf] [unique_tracks.zip] [triplets_sample_20p.zip] |
19-12-2018 | Multidimensional modeling and data transformation in bash [pdf] |
20-12-2018 | MapReduce in Spark I [pdf] [all-shakespeare.zip] [matrix M] [vector x] [vector v] |
03-01-2019 | MapReduce in Spark II [pdf] [data for matrix multiplication] |
10-01-2019 | Exact and approximate nearest neighbor search [pdf] [facts-nns.csv.gz] [result.txt] |
Test : | 75% | (min. 50%) |
Labs : | 25% | (min. 50%) |
Regular tasks and homework : | 100% | (min. 50%) |
Bonus points for all: up to 10 percentage points.
A sheet with current scores is available here.
Scale:
90% | 5.0 |
80% | 4.5 |
70% | 4.0 |
60% | 3.5 |
50% | 3.0 |
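The weights, minimums, and scale above can be combined into a small calculation. The sketch below is only an illustration, not the official grading procedure. It assumes that the bonus is added to the weighted total, that each threshold on the scale is an inclusive lower bound, and that missing either 50% minimum means no passing grade. The function name final_grade and its parameters are hypothetical.

```python
def final_grade(test_pct, labs_pct, bonus_pp=0.0):
    """Return a grade on the 3.0-5.0 scale, or None if the course is not passed.

    Assumptions (not stated on the course page): the bonus is added to the
    weighted total, and thresholds are inclusive lower bounds.
    """
    # Each component has its own 50% minimum.
    if test_pct < 50 or labs_pct < 50:
        return None

    # Bonus is capped at 10 percentage points.
    bonus_pp = min(max(bonus_pp, 0.0), 10.0)

    # Test counts for 75% of the total, labs for 25%.
    total = 0.75 * test_pct + 0.25 * labs_pct + bonus_pp

    scale = [(90, 5.0), (80, 4.5), (70, 4.0), (60, 3.5), (50, 3.0)]
    for threshold, grade in scale:
        if total >= threshold:
            return grade
    return None


if __name__ == "__main__":
    # 0.75 * 80 + 0.25 * 100 = 85, which falls in the 80% band.
    print(final_grade(80, 100))
```

For example, under these assumptions a test score of 80% and a lab score of 100% give a total of 85%, and 5 bonus points would raise it to 90%.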
J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014, http://infolab.stanford.edu/~ullman/mmds.html.
H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book, Second Edition, Pearson Prentice Hall, 2009.
J. Lin, Ch. Dyer, Data-Intensive Text Processing with MapReduce, Morgan and Claypool Publishers, 2010, http://lintool.github.com/MapReduceAlgorithms/.
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Second Edition, Springer, 2009, http://www-stat.stanford.edu/~tibs/ElemStatLearn/.
Ch. Lam, Hadoop in Action, Manning Publications Co., 2011.