dr hab. inż. Krzysztof Dembczyński (kdembczynski cs put poznan pl)
|17-01-2019||The final exam will take place on Monday, January 28, at 13:00 in room L122 BT.|
|The last lab meeting will take place on Monday, January 28, at 17:00 in lab 43.|
|19-11-2018||A sheet with current scores is available here.|
|08-01-2019||We swap two lectures with the course "Enterprise distributed systems":|
|- First change: 22-11-2018, 13:30 swapped with 13-12-2018, 11:45|
|- Second change: 06-12-2018, 13:30 swapped with 20-12-2018, 11:45|
|19-11-2018||We swap two lectures with the course "Enterprise distributed systems":|
|31-10-2018||Today's lab tasks are mandatory for both groups; however, they will not be evaluated during the next meeting.|
|Instead, the tasks will be extended during the next meeting, and the final evaluation will cover both meetings.|
|17-10-2018||Office hours cancelled on Thursday, October 18, because of the PP-RAI conference (please email me if you want to meet).|
|04-10-2018||The new semester has begun :)|
The aim of the course: To get to know the latest technologies and algorithms for mining massive datasets.
The scope of the course: We will learn about scalable algorithms for:
The course is mainly based on parts of the Mining of Massive Datasets book.
|04-10-2018||Mining massive data sets [pdf]|
|11-10-2018||Classification and regression I [pdf]|
|17-10-2018||Classification and regression II [pdf]|
|08-11-2018||Classification and regression III [pdf]|
|15-11-2018||Classification and regression IV [pdf]|
|29-11-2018||Classification and regression V [pdf]|
|13-12-2018||Evolution of database systems [pdf]|
|20-12-2018||MapReduce in Spark [pdf]|
|10-01-2019||Finding Similar Items I [pdf]|
|16-01-2019||Finding Similar Items II [pdf]|
|17-01-2019||Finding Similar Items III [pdf]|
|10-10-2018||Bonferroni's principle [pdf]|
|18-10-2018||Solving problems by simulations [pdf]|
|25-10-2018||Classification and regression - Introduction to scikit-learn [pdf]|
|31-10-2018||Classification and regression - Naive Bayes I [pdf] [code]|
|07-11-2018||Classification and regression - Naive Bayes II [pdf]|
|14-11-2018||Classification and regression - Testing classifiers [pdf] [code]|
|29-11-2018||Classification and regression - Decision boundary [pdf]|
|06-12-2018||Classification and regression - Cross-validation [pdf] [code]|
|13-12-2018||Processing of massive datasets [pdf] [unique_tracks.zip] [triplets_sample_20p.zip]|
|19-12-2018||Multidimensional modeling and data transformation in bash [pdf]|
|20-12-2018||MapReduce in Spark I [pdf] [all-shakespeare.zip] [matrix M] [vector x] [vector v]|
|03-01-2019||MapReduce in Spark II [pdf] [data for matrix multiplication]|
|10-01-2019||Exact and approximate nearest neighbor search [pdf] [facts-nns.csv.gz] [result.txt]|
|Test :||75%||(min. 50%)|
|Labs :||25%||(min. 50%)|
|Regular tasks and homework :||100%||(min. 50%)|
Bonus points for all: up to 10 percentage points.
A sheet with current scores is available here.
Scale:
J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2014, http://infolab.stanford.edu/~ullman/mmds.html.
H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book. Second Edition. Pearson Prentice Hall, 2009.
J. Lin, Ch. Dyer, Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers, 2010, http://lintool.github.com/MapReduceAlgorithms/.
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning. Second Edition. Springer, 2009, http://www-stat.stanford.edu/~tibs/ElemStatLearn/.
Ch. Lam, Hadoop in Action, Manning Publications Co., 2011.