
R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma.
Multi-label learning with millions of labels: Recommending advertiser
bid phrases for web pages.
In WWW, May 2013.


Alex Auvolat, Hugo Larochelle, Sarath Chandar, Pascal Vincent, and Yoshua
Bengio.
Clustering is efficient for approximate maximum inner product search.
CoRR, abs/1507.05910, 2015.


Rohit Babbar and Bernhard Schölkopf.
DiSMEC: Distributed sparse machines for extreme multi-label
classification.
In Web Search and Data Mining, 2017.


S. Bengio, J. Weston, and D. Grangier.
Label embedding trees for large multiclass tasks.
In NIPS, pages 163–171. Curran Associates, Inc., 2010.


A. Beygelzimer, J. Langford, Y. Lifshits, G. B. Sorkin, and A. L. Strehl.
Conditional probability tree estimation analysis and algorithms.
In UAI, pages 51–58, 2009.


A. Beygelzimer, J. Langford, and P. D. Ravikumar.
Error-correcting tournaments.
In ALT, pages 247–262, 2009.


Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain.
Sparse local embeddings for extreme multi-label classification.
In NIPS, 2015.


L. Bottou.
Large-scale machine learning with stochastic gradient descent.
In Yves Lechevallier and Gilbert Saporta, editors, COMPSTAT,
pages 177–187, Paris, France, August 2010. Springer.


Yao-Nan Chen and Hsuan-Tien Lin.
Feature-aware label space dimension reduction for multi-label
classification.
In NIPS, pages 1529–1537. Curran Associates, Inc., 2012.


Anna Choromanska and John Langford.
Logarithmic time online multiclass prediction.
In NIPS 29, 2015.


Moustapha Cissé, Nicolas Usunier, Thierry Artières, and Patrick
Gallinari.
Robust Bloom filters for large multilabel classification tasks.
In NIPS, pages 1851–1859, 2013.


K. Dembczyński, W. Cheng, and E. Hüllermeier.
Bayes optimal multilabel classification via probabilistic classifier
chains.
In ICML, pages 279–286. Omnipress, 2010.


K. Dembczyński, W. Waegeman, W. Cheng, and E. Hüllermeier.
An analysis of chaining in multi-label classification.
In ECAI, 2012.


K. Dembczyński, W. Waegeman, W. Cheng, and E. Hüllermeier.
On loss minimization and label dependence in multi-label
classification.
Machine Learning, 88:5–45, 2012.


Krzysztof Dembczyński, Wojciech Kotlowski, Willem Waegeman, Róbert
Busa-Fekete, and Eyke Hüllermeier.
Consistency of probabilistic classifier trees.
In ECML-PKDD. Springer, 2016.


J. Deng, S. Satheesh, A. C. Berg, and L. Fei-Fei.
Fast and balanced: Efficient label tree learning for large scale
object recognition.
In NIPS, pages 567–575, 2011.


Ronald Fagin, Amnon Lotem, and Moni Naor.
Optimal aggregation algorithms for middleware.
In PODS '01, pages 102–113. ACM, New York, NY, USA, 2001.


R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A library for large linear classification.
Journal of Machine Learning Research, 9:1871–1874, 2008.


Miroslav Fiedler.
Algebraic connectivity of graphs.
Czechoslovak Mathematical Journal, 23(2):298–305, 1973.


J. Fox.
Applied regression analysis, linear models, and related
methods.
Sage, 1997.


E. Frank and S. Kramer.
Ensembles of nested dichotomies for multiclass problems.
In ICML, 2004.


J. H. Friedman, J. L. Bentley, and R. A. Finkel.
An algorithm for finding best matches in logarithmic expected time.
ACM Transactions on Mathematical Software, 3(3):209–226, 1977.


D. Hsu, S. Kakade, J. Langford, and T. Zhang.
Multi-label prediction via compressed sensing.
In NIPS, 2009.


Piotr Indyk and Rajeev Motwani.
Approximate nearest neighbors: Towards removing the curse of
dimensionality.
In ACM Symposium on Theory of Computing, STOC '98, pages
604–613, New York, NY, USA, 1998. ACM.


K. Jasinska, K. Dembczyński, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and
E. Hüllermeier.
Extreme F-measure maximization using sparse probability estimates.
In ICML, 2016.


Kalina Jasinska and Nikos Karampatziakis.
Log-time and log-space extreme classification.
In Extreme Classification workshop at NIPS, 2016.


Jeff Johnson, Matthijs Douze, and Hervé Jégou.
Billion-scale similarity search with GPUs.
CoRR, abs/1702.08734, 2017.


Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov.
Bag of tricks for efficient text classification.
CoRR, abs/1607.01759, 2016.


O. Koyejo, N. Natarajan, P. Ravikumar, and I. Dhillon.
Consistent multilabel classification.
In NIPS, 2015.


A. Kumar, S. Vembu, A.K. Menon, and C. Elkan.
Beam search algorithms for multilabel learning.
Machine Learning, 2013.


J. Langford, A. Strehl, and L. Li.
Vowpal Wabbit, 2007.


Chun-Liang Li and Hsuan-Tien Lin.
Condensed filter tree for cost-sensitive multi-label classification.
In ICML, pages 423–431, 2014.


Frederic Morin and Yoshua Bengio.
Hierarchical probabilistic neural network language model.
In AISTATS, pages 246–252, 2005.


Jinseok Nam, Eneldo Loza Mencía, and Johannes Fürnkranz.
All-in text: Learning document, label, and word representations
jointly.
In AAAI Conference on Artificial Intelligence, pages
1948–1954, 2016.


Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma.
Parabel: Partitioned label trees for extreme classification with
application to dynamic search advertising.
In WWW. ACM, 2018.


Yashoteja Prabhu and Manik Varma.
FastXML: A fast, accurate and stable tree-classifier for extreme
multi-label learning.
In KDD, pages 263–272. ACM, 2014.


A. Shrivastava and P. Li.
Improved asymmetric locality sensitive hashing (ALSH) for maximum
inner product search (MIPS).
In UAI, 2015.


Farbound Tai and Hsuan-Tien Lin.
Multilabel classification with principal label space transformation.
Neural Computation, 24(9):2508–2542, 2012.


G. Tsoumakas, I. Katakis, and I. Vlahavas.
Effective and efficient multilabel classification in domains with
large number of labels.
In Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional
Data, 2008.


C. J. van Rijsbergen.
Foundation of evaluation.
Journal of Documentation, 30(4):365–373, 1974.


Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, and Jay Yagnik.
Deep networks with large output spaces.
CoRR, abs/1412.7479, 2014.


W. Waegeman, K. Dembczyński, W. Cheng, A. Jachnik, and E. Hüllermeier.
On the Bayes-optimality of F-measure maximizers.
Minor revision, 2014.


K.Q. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg.
Feature hashing for large scale multitask learning.
In ICML, pages 1113–1120. ACM, 2009.


J. Weston, A. Makadia, and H. Yee.
Label partitioning for sublinear ranking.
In ICML, 2013.


Jason Weston, Samy Bengio, and Nicolas Usunier.
WSABIE: Scaling up to large vocabulary image annotation.
In IJCAI, pages 2764–2770, 2011.


J. Yagnik, D. Strelow, D. A. Ross, and R.-S. Lin.
The power of comparative reasoning.
In International Conference on Computer Vision, pages
2431–2438, Nov 2011.


Ian E.H. Yen, Xiangru Huang, Kai Zhong, Pradeep Ravikumar, and Inderjit S.
Dhillon.
PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass
and Multilabel Classification.
In ICML, 2016.


Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, and Inderjit S. Dhillon.
Large-scale Multi-label Learning with Missing Labels.
In ICML, 2014.
