Crowdsourced Top-k Algorithms: An Experimental Evaluation

Xiaohang Zhang

Guoliang Li

Jianhua Feng

Crowdsourced top-k computation has attracted significant attention recently, thanks to emerging crowdsourcing platforms, e.g., Amazon Mechanical Turk and CrowdFlower. Crowdsourced top-k algorithms ask the crowd to compare objects and infer the top-k objects from the crowdsourced comparison results. The crowd may return incorrect answers, and traditional top-k algorithms cannot tolerate such errors. To address this problem, the database and machine-learning communities have independently studied the crowdsourced top-k problem: the database community proposes heuristic-based solutions, while the machine-learning community proposes learning-based methods (e.g., maximum likelihood estimation). However, these two types of techniques have not been compared systematically under the same experimental framework, so it is rather difficult for a practitioner to decide which algorithm to adopt. Furthermore, the experimental evaluations of existing studies have several weaknesses: some methods assume the crowd returns high-quality results, and some algorithms are tested only in simulation. To alleviate these limitations, in this paper we present a comprehensive comparison of crowdsourced top-k algorithms. Using various synthetic and real datasets, we evaluate each algorithm in terms of result quality and efficiency on real crowdsourcing platforms. We reveal the characteristics of different techniques and provide guidelines on selecting appropriate algorithms for various scenarios.
Notes on the experimental setup:
  • The input datasets are aggregated ratings or pairwise comparisons collected from the CrowdFlower platform.
  • All experiments were conducted on a MacBook Pro with a 2.3 GHz Intel Core i7 CPU and 16 GB RAM.
  • All experiments were run in a Python 2.7.6 environment.
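
For reference, the sketch below shows one plausible way to represent the two input formats used by the inference methods listed next: a raw list of pairwise comparisons and an aggregated "graph model" (win-count matrix), together with a BordaCount-style win count. The function names and data layout are illustrative assumptions, not the actual data format or code used in the study.

# Illustrative sketch (not the study's loader): representing pairwise
# comparisons and the aggregated win-count "graph" derived from them.
from collections import defaultdict

def build_win_matrix(comparisons, objects):
    """Aggregate raw (winner, loser) answers into a win-count graph.

    comparisons : iterable of (winner, loser) object ids
    objects     : list of all object ids
    Returns a dict-of-dicts W where W[i][j] = number of times i beat j.
    """
    W = {i: defaultdict(int) for i in objects}
    for winner, loser in comparisons:
        W[winner][loser] += 1
    return W

def borda_count(W, objects):
    """Score each object by its total number of pairwise wins
    (a common reading of BordaCount over crowdsourced comparisons)."""
    return {i: sum(W[i].values()) for i in objects}

# Example usage with toy data (hypothetical object ids o1..o3).
objects = ["o1", "o2", "o3"]
answers = [("o1", "o2"), ("o1", "o3"), ("o2", "o3"), ("o3", "o2")]
W = build_win_matrix(answers, objects)
scores = borda_count(W, objects)                     # {'o1': 2, 'o2': 1, 'o3': 1}
top2 = sorted(objects, key=scores.get, reverse=True)[:2]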

Inference Methods

Method           Input Model                     Output Format
BordaCount       graph model                     score
Copeland         graph model                     score
Local            graph model                     score
Indegree         graph model                     score
BRE              pairwise comparison             score
URE              pairwise comparison             score
SSCO             pairwise comparison             score
SSSE             pairwise comparison             score
Iterative        graph model                     subset
MPageRank        graph model                     score
RankCentrality   graph model                     score
ELO              pairwise comparison             score
TrueSkill        pairwise comparison             score
CrowdBT          pairwise comparison             score
CrowdGauss       pairwise comparison             score
HodgeRank        pairwise comparison             score
SSRW             graph model                     score
TwoStageHeap     pairwise comparison             subset
PathRank         graph model                     subset
AdaptiveReduce   pairwise comparison             subset
Combine          rating + pairwise comparison    score
Hybrid           rating + pairwise comparison    score
HybridMPR        rating + graph model            score
[1]. For the machine-learning methods, we utilized the scipy.optimize package.
[2]. For the ELO method, we utilized an existing open-source ELO rating package.
[3]. For the TrueSkill method, we utilized an existing open-source TrueSkill package.
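
To make footnote [1] concrete, here is a minimal sketch of how a Bradley-Terry-style maximum likelihood estimation, the kind of model underlying several of the learning-based methods above, can be fit with scipy.optimize. The interface and names are our own assumptions for illustration; this is not the implementation used in the experiments.

# Hedged sketch: Bradley-Terry maximum likelihood estimation via scipy.optimize.
import numpy as np
from scipy.optimize import minimize

def fit_bradley_terry(comparisons, n_objects):
    """comparisons: list of (winner, loser) index pairs in [0, n_objects)."""
    def neg_log_likelihood(theta):
        # P(i beats j) = exp(theta_i) / (exp(theta_i) + exp(theta_j)),
        # so -log P(w beats l) = log(1 + exp(theta_l - theta_w)).
        nll = 0.0
        for w, l in comparisons:
            nll += np.log1p(np.exp(theta[l] - theta[w]))
        return nll
    # Scores are identifiable only up to a constant shift; any optimum works
    # for ranking purposes.
    result = minimize(neg_log_likelihood, x0=np.zeros(n_objects), method="L-BFGS-B")
    return result.x  # latent scores; rank objects by descending score

# Example: infer a top-2 ranking of three objects from toy comparisons.
scores = fit_bradley_terry([(0, 1), (0, 2), (1, 2)], n_objects=3)
top_k = np.argsort(-scores)[:2]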

Selection Methods

Method       Input                            Output
CrowdGauss   pairwise comparisons             pairwise comparisons
CrowdBT      pairwise comparisons             pairwise comparisons
Combine      ratings + pairwise comparisons   ratings + pairwise comparisons
SSCO         pairwise comparisons             pairwise comparisons
SSSE         pairwise comparisons             pairwise comparisons
SSRW         pairwise comparisons             pairwise comparisons
Random       pairwise comparisons             pairwise comparisons
Group        pairwise comparisons             pairwise comparisons
Max          pairwise comparisons             pairwise comparisons
Greedy       pairwise comparisons             pairwise comparisons
Complete     pairwise comparisons             pairwise comparisons
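
To illustrate the input/output contract of the selection methods, the following is a minimal sketch of a Random-style strategy that chooses the next batch of object pairs to publish as crowdsourcing tasks. The batch interface and function names are assumptions made for illustration, not the API used in the study.

# Minimal sketch of a Random-style selection strategy: given the objects and
# the pairs already asked, pick the next batch of pairs to crowdsource.
import random
from itertools import combinations

def select_random_pairs(objects, asked_pairs, batch_size):
    """Return up to batch_size unasked object pairs, chosen uniformly at random."""
    asked = {frozenset(p) for p in asked_pairs}
    candidates = [p for p in combinations(objects, 2) if frozenset(p) not in asked]
    random.shuffle(candidates)
    return candidates[:batch_size]

# Example: decide which comparisons to publish to the crowd next.
objects = ["o1", "o2", "o3", "o4"]
asked = [("o1", "o2")]
next_tasks = select_random_pairs(objects, asked, batch_size=3)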


For any questions about this study, please feel free to contact Xiaohang Zhang.