Information Retrieval

Assign to the course: here

Marks: link

Prerequisites:

  • Python 3.8 + Jupyter notebook
  • Java 12 (the scripts should also work with Java 8 or later)
  • We suggest using your own laptops.

Grading: link

Important note: Your solutions will be checked during the laboratories. However, you are also asked to send your code one day before the laboratory meeting (only Java or Python files; do not send IDE projects, etc.), because your solutions will occasionally be checked for plagiarism. Some requirements:

  • When sending an email, do not forget to put the “[IR]” prefix in the subject line.
  • Remember to include your names, student IDs, and the task number in the message.
  • Send your scripts packed in a zip file named “ID1_ID2.zip”, where ID1 and ID2 are your student IDs (if you are not working in a pair, name it “ID1.zip” using your own student ID).
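The packing step described above can be sketched in Python as follows. This is only an illustrative helper, not part of the course materials: the function names submission_name and pack_sources are hypothetical, and the sketch assumes your solution files sit under a single source directory and that only .py and .java files belong in the archive (IDE project files are skipped).

```python
import zipfile
from pathlib import Path


def submission_name(student_ids):
    """Build the archive name: 'ID1_ID2.zip' for a pair, 'ID1.zip' for one student."""
    if not 1 <= len(student_ids) <= 2:
        raise ValueError("expected one or two student IDs")
    return "_".join(str(s) for s in student_ids) + ".zip"


def pack_sources(src_dir, student_ids, out_dir=".", extensions=(".py", ".java")):
    """Zip only source files (by extension) found under src_dir.

    Hypothetical helper: anything else (e.g. .idea/ or .vscode/ project
    files, compiled classes) is left out of the archive.
    """
    src = Path(src_dir)
    archive = Path(out_dir) / submission_name(student_ids)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file() and path.suffix in extensions:
                # Store paths relative to src_dir, so the archive holds no absolute paths.
                zf.write(path, path.relative_to(src).as_posix())
    return archive
```

For example, pack_sources("lab1", ["123456", "654321"]) would write 123456_654321.zip to the current directory. If your solution is a Jupyter notebook, you could pass extensions=(".py", ".ipynb", ".java"); whether notebooks are accepted is an assumption to confirm with the instructor.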

No. | Topic                                           | Assignment | Materials
1.  | Data collection: Web crawling                   | Python     | Files
2.  | Data extraction: Apache Tika                    | Java       | Files
3.  | Text processing: Apache OpenNLP                 | Java       | Files
4.  | Indexing + document representation + similarity |            | Files
5.  | Search engine: Apache Lucene                    | Java       | Files
6.  | HITS + PageRank                                 | Python     | Files
7.  | Log Analysis                                    | Python     | Files
8.  | Consultations                                   |            |