Summary of the tutorial

Extreme Multi-label Classification (XMLC) refers to supervised multi-label learning in which instances are labeled with a few relevant labels from a very large set, consisting of potentially millions of all possible target labels. It has been shown in recent works that, apart from automatic tagging, this framework can be leveraged to effectively address problems in ranking, recommendation systems and web-advertising. In this tutorial we will motivate extreme classification as an active and rapidly growing research area with many potential applications in information retrieval and introduce the research challenges. We will also present a detailed review of the three main strands for addressing these challenges which include (i) label embedding methods, (ii) tree-based methods, and (iii) smart one-vs-rest approaches. Finally, we will highlight a dozen open benchmark datasets derived from sources such as Wikipedia, Amazon and Delicious, and also live demonstrate open source code by running it on these benchmark datasets.



The slides from the tutorial can be found here.


  R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In WWW, May 2013.
  Alex Auvolat, Hugo Larochelle, Sarath Chandar, Pascal Vincent, and Yoshua Bengio. Clustering is efficient for approximate maximum inner product search. CoRR, abs/1507.05910, 2015.
  Rohit Babbar and Bernhard Schölkopf. Dismec - distributed sparse machines for extreme multi-label classification. Web Search and Data Mining, 2017.
  S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In NIPS, pages 163-171. Curran Associates, Inc., 2010.
  A. Beygelzimer, J. Langford, Y. Lifshits, G. B. Sorkin, and A. L. Strehl. Conditional probability tree estimation analysis and algorithms. In UAI, pages 51-58, 2009.
  A. Beygelzimer, J. Langford, and P. D. Ravikumar. Error-correcting tournaments. In ALT, pages 247-262, 2009.
  Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain. Sparse local embeddings for extreme multi-label classification. In NIPS, 2015.
  L. Bottou. Large-scale machine learning with stochastic gradient descent. In Yves Lechevallier and Gilbert Saporta, editors, COMPSTAT, pages 177-187, Paris, France, August 2010. Springer.
  Yao-Nan Chen and Hsuan-Tien Lin. Feature-aware label space dimension reduction for multi-label classification. In NIPS, pages 1529-1537. Curran Associates, Inc., 2012.
  Anna Choromanska and John Langford. Logarithmic time online multiclass prediction. In NIPS 29, 2015.
  Moustapha Cissé, Nicolas Usunier, Thierry Artières, and Patrick Gallinari. Robust bloom filters for large multilabel classification tasks. In NIPS, pages 1851-1859, 2013.
  K. Dembczyński, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, pages 279-286, 2010.
  K. Dembczyński, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, pages 279-286. Omnipress, 2010.
  K. Dembczyński, W. Waegeman, W. Cheng, and E. Hüllermeier. An analysis of chaining in multi-label classification. In ECAI, 2012.
  K. Dembczyński, W. Waegeman, W. Cheng, and E. Hüllermeier. On loss minimization and label dependence in multi-label classification. Machine Learning, 88:5-45, 2012.
  Krzysztof Dembczyński, Wojciech Kotlowski, Willem Waegeman, Róbert Busa-Fekete, and Eyke Hüllermeier. Consistency of probabilistic classifier trees. In ECMLPKDD. Springer, 2016.
  J. Deng, S. Satheesh, A. C. Berg, and Fei Fei F. Li. Fast and balanced: Efficient label tree learning for large scale object recognition. In NIPS, pages 567-575. 2011.
  Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. In PODS '01, pages 102-113. ACM, New York, NY, USA, 2001.
  R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
  Miroslav Fiedler. Algebraic connectivity of graphs. Czechoslovak mathematical journal, 23(2):298-305, 1973.
  J. Fox. Applied regression analysis, linear models, and related methods. Sage, 1997.
  E. Frank and S. Kramer. Ensembles of nested dichotomies for multi-class problems. In ICML, 2004.
  J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209-226, 1977.
  D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In NIPS, 2009.
  Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In ACM Symposium on Theory of Computing, STOC '98, pages 604-613, New York, NY, USA, 1998. ACM.
  K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hüllermeier. Extreme F-measure maximization using sparse probability estimates. In ICML, 2016.
  Kalina Jasinska and Nikos Karampatziakis. Log-time and log-space extreme classication. In Extreme Classification workshop at NIPS, 2016.
  Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with gpus. CoRR, abs/1702.08734, 2017.
  Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. Bag of tricks for efficient text classification. CoRR, abs/1607.01759, 2016.
  O. Koyejo, N. Natarajan, P. Ravikumar, and I. Dhillon. Consistent multilabel classification. In NIPS, 2015.
  A. Kumar, S. Vembu, A.K. Menon, and C. Elkan. Beam search algorithms for multilabel learning. In Machine Learning, 2013.
  J. Langford, A. Strehl, and L. Li. Vowpal wabbit, 2007.
  Chun-Liang Li and Hsuan-Tien Lin. Condensed filter tree for cost-sensitive multi-label classification. In ICML, pages 423-431, 2014.
  Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In AISTATS, pages 246-252, 2005.
  Jinseok Nam, Eneldo Loza Mencía, and Johannes Fürnkranz. All-in text: Learning document, label, and word representations jointly. In AAAI Conference on Artificial Intelligence, pages 1948-1954, 2016.
  Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma. Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In WWW. ACM, 2018.
  Yashoteja Prabhu and Manik Varma. Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. In KDD, pages 263-272. ACM, 2014.
  A. Shrivastava and P. Li. Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (mips). In UAI, 2015.
  F. Tai and H.-T. Lin. Multi-label classification with principal label space transformation. In Neural Computat., volume 9, pages 2508-2542, 2012.
  Farbound Tai and Hsuan-Tien Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508-2542, 2012.
  G. Tsoumakas, I. Katakis, and I. Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional Data, 2008.
  C. J. van Rijsbergen. Foundation of evaluation. Journal of Documentation, 30(4):365-373, 1974.
  Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, and Jay Yagnik. Deep networks with large output spaces. CoRR, abs/1412.7479, 2014.
  W. Waegeman, K. Dembczynski, W. Cheng A. Jachnik, and E. Hüllermeier. On the Bayes-optimality of F-measure maximizers. Minor revision, 2014.
  K.Q. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In ICML, pages 1113-1120. ACM, 2009.
  J. Weston, A. Makadia, and H. Yee. Label partitioning for sublinear ranking. In ICML, 2013.
  Jason Weston, Samy Bengio, and Nicolas Usunier. Wsabie: Scaling up to large vocabulary image annotation. In IJCAI, pages 2764-2770, 2011.
  J. Yagnik, D. Strelow, D. A. Ross, and R. s. Lin. The power of comparative reasoning. In International Conference on Computer Vision, pages 2431-2438, Nov 2011.
  Ian E.H. Yen, Xiangru Huang, Kai Zhong, Pradeep Ravikumar, and Inderjit S. Dhillon. PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. In ICML, 2016.
  Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, and Inderjit S. Dhillon. Large-scale Multi-label Learning with Missing Labels. In ICML, 2014.