README Istella LETOR file
Non-Commercial Use Only
<<  LETOR: Benchmark Datasets for Learning to Rank >>

_____________________________________________________________________


Istella is glad to release the Istella Learning to Rank (LETOR) dataset to the public. 
This dataset has been used in the past to learn one of the stages of the Istella production
ranking pipeline. 
To the best of our knowledge, this is the largest publicly available LETOR dataset, 
particularly useful for large-scale experiments on the efficiency and scalability of LETOR solutions.

To use the dataset, you must read and accept the Istella LETOR Licence Agreement.
By using the dataset, you agree to be bound by the terms of the licence.

The Istella dataset is solely for NON-COMMERCIAL USE.

Istella LETOR comprises 33,018 queries, with each query-document pair represented by 220 features.
Istella-S LETOR consists of 3,408,630 pairs produced by sampling irrelevant pairs to an average of
103 examples per query. It has been split into train, validation and test sets according to a
60%-20%-20% scheme.
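
As an illustration of how one query-document pair with 220 features might be read,
here is a minimal Python sketch. It assumes the files use the SVMlight-style line
layout common to LETOR datasets ("<label> qid:<id> <index>:<value> ..."), which this
README does not state, and that labels are integer relevance grades. The function
name parse_letor_line and the sample line are hypothetical.

```python
from typing import List, Tuple

NUM_FEATURES = 220  # number of features per query-document pair, as stated above


def parse_letor_line(line: str, num_features: int = NUM_FEATURES) -> Tuple[int, str, List[float]]:
    """Parse one '<label> qid:<id> <index>:<value> ...' line (assumed layout)."""
    # Drop an optional trailing comment introduced by '#'.
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    label = int(tokens[0])  # assumption: integer relevance grade
    if not tokens[1].startswith("qid:"):
        raise ValueError("expected 'qid:<id>' as the second token")
    qid = tokens[1][len("qid:"):]
    # Features absent from the line stay at 0.0 (sparse encoding).
    features = [0.0] * num_features
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index) - 1] = float(value)  # feature indices are 1-based
    return label, qid, features


# Made-up example line, not taken from the dataset.
label, qid, features = parse_letor_line("2 qid:17 1:0.5 3:12 220:-1.25")
print(label, qid, len(features), features[0], features[2], features[219])
```

Grouping the parsed rows by qid would then give the per-query lists of examples
that a ranker is trained and evaluated on.
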
If you use the full dataset in your research, we kindly ask you to acknowledge
Istella and cite the following publications:

  [1] D. Dato, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, R. Venturini:
      "Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees",
      ACM Trans. Inf. Syst. 35, 2, Article 15 (December 2016), 31 pages.
      DOI: https://doi.org/10.1145/2987380

  [2] C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, F. Silvestri, S. Trani:
      "Post-Learning Optimization of Tree Ensembles for Efficient Ranking",
      Proceedings of the 39th International ACM Conference on Research and Development
      in Information Retrieval (SIGIR), July 2016.
      DOI: https://dx.doi.org/10.1145/2911451.2914763


May 15, 2016

