LETOR Dataset
Istella is glad to release the Istella Learning to Rank (LETOR) dataset to the public. This dataset has been used in the past to learn one of the stages of the Istella production ranking pipeline. To the best of our knowledge, this is the largest publicly available LETOR dataset, particularly useful for large-scale experiments on the efficiency and scalability of LETOR solutions.
To use the dataset, you must read and accept the Istella LETOR Licence Agreement. By using the dataset, you agree to be bound by the terms of the licence: the Istella dataset is solely for non-commercial use.
The full Istella LETOR dataset comprises 33,018 queries, with 220 features representing each query-document pair. It consists of 10,454,629 examples labeled with relevance judgments ranging from 0 (irrelevant) to 4 (perfectly relevant). The average number of examples per query is 316. It has been split into train and test sets according to an 80%-20% scheme.
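As a rough illustration of how such data could be inspected, the sketch below parses examples and counts them per query. It assumes the files use the SVMlight-style ranking format common to LETOR datasets (`label qid:ID index:value ...`, with 1-based feature indices); this page does not state the file format, so check it against the downloaded files. The function names `parse_line` and `per_query_counts` and the sample lines are hypothetical, made up for the example.

```python
from collections import Counter

NUM_FEATURES = 220  # number of features per query-document pair, as stated above


def parse_line(line, num_features=NUM_FEATURES):
    """Parse one line of the ASSUMED format 'label qid:ID idx:val ...'.

    Returns (label, qid, features), where features is a dense list and
    any feature index missing from the line is filled with 0.0.
    """
    tokens = line.split("#", 1)[0].split()  # drop an optional trailing comment
    label = int(float(tokens[0]))           # relevance judgment, 0 to 4
    qid = tokens[1].split(":", 1)[1]        # query identifier, kept as a string
    features = [0.0] * num_features
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index) - 1] = float(value)  # indices assumed 1-based
    return label, qid, features


def per_query_counts(lines):
    """Count how many examples belong to each query id."""
    counts = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            _, qid, _ = parse_line(line)
            counts[qid] += 1
    return counts


# Made-up sample lines, NOT taken from the dataset.
sample = [
    "3 qid:1 1:0.5 2:1.0 220:2.0",
    "0 qid:1 1:0.1",
    "4 qid:2 3:7.5",
]
counts = per_query_counts(sample)
average = sum(counts.values()) / len(counts)
print(counts, average)
```

Under the same format assumption, passing an open file object to `per_query_counts` instead of `sample` would give the per-query example counts for a whole split.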
If you want to use the dataset in your research, you can download Istella LETOR here. If you use it, we kindly ask you to acknowledge Istella SpA and cite the following publication in your research: Domenico Dato, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees. ACM Trans. Inf. Syst. 35, 2, Article 15 (December 2016), 31 pages. DOI: https://doi.org/10.1145/2987380