Crowdsourced Top-k Algorithms: An Experimental Evaluation

Xiaohang Zhang

Guoliang Li

Jianhua Feng

Crowdsourced top-k computation has attracted significant attention recently, thanks to emerging crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower. Crowdsourced top-k algorithms ask the crowd to compare objects and infer the top-k objects from the crowdsourced comparison results. The crowd may return incorrect answers, but traditional top-k algorithms cannot tolerate such errors. To address this problem, the database and machine-learning communities have independently studied the crowdsourced top-k problem: the database community proposes heuristic-based solutions, while the machine-learning community proposes learning-based methods (e.g., maximum likelihood estimation). However, these two types of techniques have not been compared systematically under the same experimental framework, so it is rather difficult for a practitioner to decide which algorithm to adopt. Furthermore, the experimental evaluations in existing studies have several weaknesses: some methods assume the crowd returns high-quality results, and some algorithms are tested only in simulation. To alleviate these limitations, in this paper we present a comprehensive comparison of crowdsourced top-k algorithms. Using various synthetic and real datasets, we evaluate each algorithm in terms of result quality and efficiency on real crowdsourcing platforms. We reveal the characteristics of different techniques and provide guidelines on selecting appropriate algorithms for various scenarios.
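As a concrete illustration of the inference step, the sketch below ranks objects by the number of comparisons they win, in the spirit of BordaCount. The (winner, loser) pair encoding and the function name are our illustrative assumptions, not the exact interface of the scripts listed below.

    from collections import Counter

    def borda_top_k(comparisons, k):
        """Rank objects by how many pairwise comparisons they win."""
        wins = Counter(winner for winner, _ in comparisons)
        objects = {obj for pair in comparisons for obj in pair}
        # Sort alphabetically first so that ties are broken deterministically.
        ranked = sorted(sorted(objects), key=lambda obj: wins[obj], reverse=True)
        return ranked[:k]

    # Three correct answers plus one erroneous crowd vote (c beats a).
    comparisons = [("a", "b"), ("a", "c"), ("c", "a"), ("b", "c")]
    print(borda_top_k(comparisons, k=2))  # ['a', 'b']

Even with one erroneous vote, the aggregate still ranks a first; this error tolerance is exactly what motivates the methods evaluated here.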
Experimental setup:
  • The input datasets were aggregated ratings or pairwise comparisons collected from the CrowdFlower platform (see the sketch after this list for the assumed data layout).
  • All experiments were conducted on a MacBook Pro with a 2.3 GHz Intel Core i7 CPU and 16 GB of RAM.
  • All experiments were run in a Python 2.7.6 environment.
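Several methods in the tables below consume a "graph model" rather than raw comparisons. The sketch below shows one plausible reading, under our own assumptions about the data layout (the scripts' concrete formats may differ): ratings aggregate to per-object scores, and repeated crowd votes on each pair aggregate by majority into a directed preference graph.

    from collections import defaultdict

    # Raw crowd input comes in two forms: per-object star ratings and
    # (winner, loser) comparison votes.
    ratings = {"a": [4, 5, 4], "b": [3, 3], "c": [2, 4]}
    comparisons = [("a", "b"), ("a", "b"), ("b", "a"), ("a", "c")]

    # Ratings aggregate to a mean score per object.
    mean_rating = {o: sum(v) / float(len(v)) for o, v in ratings.items()}

    def build_preference_graph(comparisons):
        """Aggregate repeated votes into a directed graph by majority vote."""
        votes = defaultdict(int)
        for winner, loser in comparisons:
            votes[(winner, loser)] += 1
        # Keep edge u -> v only if u beats v in a strict majority of votes;
        # .get avoids inserting new keys while iterating the defaultdict.
        return {(u, v) for (u, v), n in votes.items() if n > votes.get((v, u), 0)}

    print(mean_rating)                          # {'a': 4.33, 'b': 3.0, 'c': 3.0}
    print(build_preference_graph(comparisons))  # edges ('a', 'b') and ('a', 'c')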

Inference Methods

Method          Source Code           Input Model                   Output Format
BordaCount      bordacope.py          graph model                   score
Copeland        bordacope.py          graph model                   score
Local           local.py              graph model                   score
Indegree        indegree.py           graph model                   score
BRE             bre.py                pairwise comparison           score
URE             ure.py                pairwise comparison           score
SSCO            ssco.py               pairwise comparison           score
SSSE            ssse.py               pairwise comparison           score
Iterative       iterative.py          graph model                   subset
MPageRank       mpagerank.py          graph model                   score
RankCentrality  rankcentrality.py     graph model                   score
ELO             elo.py                pairwise comparison           score
TrueSkill       trueskillranking.py   pairwise comparison           score
CrowdBT         crowdbt.py            pairwise comparison           score
CrowdGauss      crowdgauss.py         pairwise comparison           score
HodgeRank       hodgerank.py          pairwise comparison           score
SSRW            ssrw.py               graph model                   score
TwoStageHeap    twostage.py           pairwise comparison           subset
PathRank        pathrank.py           graph model                   subset
AdaptiveReduce  adaptivereduce.py     pairwise comparison           subset
Combine         combine.py            rating + pairwise comparison  score
Hybrid          hybrid.py             rating + pairwise comparison  score
HybridMPR       hybridmpr.py          rating + graph model          score
[1]. For the machine-learning methods, we utilized the scipy.optimize package (http://www.scipy.org/).
[2]. For the ELO method, we utilized the package from https://pypi.python.org/pypi/elo.
[3]. For the TrueSkill method, we utilized the package from https://pypi.python.org/pypi/trueskill/0.4.3.
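To illustrate how the trueskill package linked in [3] can be driven by pairwise comparisons, here is a minimal, hedged usage sketch; trueskillranking.py in this repository may wrap the package differently.

    from trueskill import Rating, rate_1vs1

    skills = {"a": Rating(), "b": Rating(), "c": Rating()}
    comparisons = [("a", "b"), ("a", "c"), ("b", "c")]  # (winner, loser) votes

    for winner, loser in comparisons:
        # rate_1vs1 returns the updated (winner, loser) ratings for one match.
        skills[winner], skills[loser] = rate_1vs1(skills[winner], skills[loser])

    # Rank by the conservative estimate mu - 3 * sigma, a common TrueSkill convention.
    ranked = sorted(skills, key=lambda o: skills[o].mu - 3 * skills[o].sigma, reverse=True)
    print(ranked)  # expected: ['a', 'b', 'c']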

Selection Methods

Method      Source Code             Input                           Output
CrowdGauss  apolling.py             pairwise comparisons            pairwise comparisons
CrowdBT     crowdbt_selection.py    pairwise comparisons            pairwise comparisons
Combine     combinerating.py        ratings + pairwise comparisons  ratings + pairwise comparisons
SSCO        ssco_selection.py       pairwise comparisons            pairwise comparisons
SSSE        ssse_selection.py       pairwise comparisons            pairwise comparisons
SSRW        ssrw_selection.py       pairwise comparisons            pairwise comparisons
Random      random_selection.py     pairwise comparisons            pairwise comparisons
Group       heuristic_selection.py  pairwise comparisons            pairwise comparisons
Max         heuristic_selection.py  pairwise comparisons            pairwise comparisons
Greedy      heuristic_selection.py  pairwise comparisons            pairwise comparisons
Complete    heuristic_selection.py  pairwise comparisons            pairwise comparisons
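A selection method decides which questions to ask the crowd next, given the comparisons collected so far. As a hedged illustration (not claimed to reproduce random_selection.py), the sketch below implements the simplest strategy, random pair selection:

    import random

    def random_selection(objects, asked, budget):
        """Randomly pick up to `budget` unasked pairs to crowdsource next."""
        candidates = [(u, v) for i, u in enumerate(objects)
                      for v in objects[i + 1:] if (u, v) not in asked]
        random.shuffle(candidates)
        return candidates[:budget]

    objects = ["a", "b", "c", "d"]
    asked = {("a", "b"), ("a", "c")}   # pairs already answered by the crowd
    print(random_selection(objects, asked, budget=2))  # e.g., [('b', 'd'), ('c', 'd')]

The other strategies in the table differ mainly in how they score candidate pairs (e.g., by expected information gain) rather than in this overall ask-then-update loop.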

Contact

For any questions about this study, please feel free to contact Xiaohang Zhang.