Crowdsourcing Project

We are the Database Group at Tsinghua University, and we are delighted to share our research results on crowdsourcing. You can find our research papers, talks, tutorials, books, source code, systems, and other useful resources on this page. Enjoy!
If you have any questions or concerns, please contact Prof. Guoliang Li.


Tongyu Liu, Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Xiaoyong Du. CrowdGame: A Game-Based Crowdsourcing System for Cost-Effective Data Labeling. SIGMOD 2019: 1957-1960. Pdf Link
Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Tongyu Liu, Xiaoyong Du. Cost-Effective Data Annotation using Game-Based Crowdsourcing. VLDB 2019:57-70. Pdf
Guoliang Li, Chengliang Chai, Ju Fan, Xueping Weng, et al. CDB: A Crowd-Powered Database System. VLDB 2018:1926-1929. Pdf Link
Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng. A Partial-Order-Based Framework for Cost-Effective Crowdsourced Entity Resolution. VLDB Journal 27(6): 745-770 (2018). Pdf Link
Kaiyu Li, Guoliang Li, Xiaohang Zhang, Jianhua Feng. A Rating-Ranking Based Framework for Crowdsourced Top-k Computation. SIGMOD 2018: 975-990. Pdf
Chengliang Chai, Ju Fan, Guoliang Li. Incentive-Based Entity Collection using Crowdsourcing. ICDE 2018: 341-352. Pdf
Xiang Yu, Guoliang Li, Yudian Zheng. CrowdOTA: An Online Task Assignment System in Crowdsourcing. ICDE 2018 Demo: 1629-1632. Pdf
Caihua Shan, Nikos Mamoulis, Guoliang Li, Reynold Cheng, Zhipeng Huang, Yudian Zheng. T-Crowd: Effective Crowdsourcing for Tabular Data. ICDE 2018 Poster: 1316-1319. Pdf
Yan Zhuang, Guoliang Li, Zhuojian Zhong, Jianhua Feng. Hike: A Hybrid Human-Machine Method for Entity Alignment in Large-Scale Knowledge Bases. CIKM 2017: 1917-1926. (Best Full Paper Award). Pdf
Dong Yuan, Guoliang Li, Qi Li, Yudian Zheng. Sybil Defense in Crowdsourcing Platforms. CIKM 2017: 1529-1538. Pdf
Xueping Weng, Guoliang Li, Huiqi Hu, Jianhua Feng. Crowdsourced Selection on Multi-Attribute Data. CIKM 2017: 307-316. Pdf
Guoliang Li, Chengliang Chai, Ju Fan, Jian Li, Yudian Zheng, et al. CDB: Optimizing Queries with Crowd-Based Selections and Joins. SIGMOD Conference 2017: 1463-1478. Pdf Link Slide
Guoliang Li. Human-in-the-loop Data Integration. PVLDB 10(12): 2006-2017 (2017). Pdf Link Slide
Yudian Zheng, Guoliang Li, Reynold Cheng. DOCS: A Domain-Aware Crowdsourcing System Using Knowledge Bases. PVLDB 10(4): 361-372 (2016). Pdf Link Slide
Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, Reynold Cheng. Truth Inference in Crowdsourcing: Is the Problem Solved?. PVLDB 10(5): 541-552 (2017). Pdf Link Slide
Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng. Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach. SIGMOD Conference 2016: 969-984. Pdf Link
Xiaohang Zhang, Guoliang Li, Jianhua Feng. Crowdsourced Top-k Algorithms: An Experimental Evaluation. PVLDB 9(8): 612-623 (2016). Pdf Link Dataset Code
Huiqi Hu, Guoliang Li, Zhifeng Bao and Jianhua Feng. Crowdsourcing-Based Real-Time Urban Traffic Speed Estimation: From Trends to Speeds. ICDE 2016: 883-894. Pdf Link
Huiqi Hu, Yudian Zheng, Zhifeng Bao, Guoliang Li, Jianhua Feng, and Reynold Cheng. Crowdsourced POI Labelling: Location-Aware Result Inference and Task Assignment. ICDE 2016: 61-72. Pdf Link Slide
Yudian Zheng, Jiannan Wang, Guoliang Li, Reynold Cheng, Jianhua Feng. QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications. SIGMOD Conference 2015: 1031-1046. Pdf Link Slide
Ju Fan, Guoliang Li, Beng Chin Ooi, Kian-lee Tan, Jianhua Feng. iCrowd: An Adaptive Crowdsourcing Framework. SIGMOD Conference 2015: 1015-1030. Pdf Link Slide
Henan Wang, Guoliang Li, Jianhua Feng. Incremental Quality Inference in Crowdsourcing. DASFAA (2) 2014: 453-467. (Best Paper Runner-up) Link
Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng. Leveraging Transitive Relations for Crowdsourced Joins. SIGMOD 2013:229-240. Pdf Link
1 Guoliang Li, Jiannan Wang, Yudian Zheng, Michael Franklin. Crowdsourced Data Management: A Survey. IEEE Transactions on Knowledge and Data Engineering (TKDE) 28(9): 2296-2319 (2016). Pdf Link Dataset Long
2 Chengliang Chai, Dong Deng, Guoliang Li, Jiannan Wang, Yudian Zheng. Crowdsourcing Database Systems: Overview and Challenges. ICDE 2019 Tutorial. Pdf
3 Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng. Crowd-Powered Data Mining. KDD 2018 Tutorial. Pdf Slides Website
4 Guoliang Li, Yudian Zheng, Ju Fan, Jiannan Wang, Reynold Cheng. Crowdsourced Data Management: Overview and Challenges. SIGMOD Conference 2017: 1711-1716. Slide
5 Guoliang Li, Jiannan Wang, Yudian Zheng and Michael Franklin. Crowdsourced Data Management: A Survey (Extended Abstract). ICDE 2017: 39-40. Pdf
1 Guoliang Li, Jiannan Wang, Yudian Zheng, Ju Fan, Michael Franklin. Crowdsourced Data Management © 2018. Springer. Link




Datasets in Crowdsourcing

Nov 2, 2019 (last update)

This is a collection of crowdsourcing datasets in two parts:

  • Part 1: Datasets with ground truth and workers' answers
  • Part 2: Datasets with only ground truth (no workers' answers)
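As a rough illustration of how a Part 1 dataset might be used, the sketch below infers a label per task by majority vote over workers' answers and scores the result against the ground truth. The record layout `(task_id, worker_id, answer)`, the sample values, and the function names are assumptions made for this example; they are not the actual format of the datasets listed here.

```python
from collections import Counter, defaultdict

# Hypothetical sample data; the real datasets may use a different layout.
answers = [
    ("q1", "w1", "yes"), ("q1", "w2", "yes"), ("q1", "w3", "no"),
    ("q2", "w1", "no"), ("q2", "w2", "no"), ("q2", "w3", "no"),
    ("q3", "w1", "yes"), ("q3", "w2", "no"), ("q3", "w3", "no"),
]
ground_truth = {"q1": "yes", "q2": "no", "q3": "yes"}


def majority_vote(records):
    """Return the most frequent answer for each task."""
    votes = defaultdict(Counter)
    for task_id, _worker_id, answer in records:
        votes[task_id][answer] += 1
    # Ties are broken by first-seen order; a real study would handle them explicitly.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


def accuracy(inferred, truth):
    """Fraction of tasks with known truth whose inferred answer matches it."""
    scored = [task for task in truth if task in inferred]
    if not scored:
        return 0.0
    return sum(inferred[task] == truth[task] for task in scored) / len(scored)


inferred = majority_vote(answers)
print(inferred)
print(accuracy(inferred, ground_truth))
```

Majority voting is only the simplest baseline; it is used here because it needs no assumptions about worker quality.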



ChinaCrowds: a Crowdsourcing Database Platform

ChinaCrowds aims to address machine-hard queries, e.g., labeling and translation. Requesters can crowdsource their requirements, e.g., needed services, ideas, or content, and get answers from a large group of people, especially an online community, rather than from traditional employees or suppliers. Workers can earn money by answering requesters' questions.


CDB: a Crowd-powered Database System

CDB is a crowd-powered database system that supports crowd-based query optimization with a focus on joins and selections. CDB differs fundamentally from existing systems in two ways. First, CDB employs a graph-based query model that provides more fine-grained query optimization. Second, CDB adopts a unified framework to perform multi-goal optimization based on the graph model. We have implemented our system and deployed it on Amazon Mechanical Turk, CrowdFlower, and ChinaCrowds.


CrowdOTA: an Online Task Assignment System in Crowdsourcing

We have developed an online task assignment system, CrowdOTA. When a worker requests tasks, CrowdOTA selects k tasks on the fly and assigns them to the worker. CrowdOTA implements multiple online task assignment algorithms, and requesters can choose which algorithm is used to assign their tasks.
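The request-time flow described above can be sketched as follows. This is a minimal illustration, not CrowdOTA's implementation: the class `OnlineAssigner`, the `Task` record, and both strategies are hypothetical names invented for the example, and the strategies are simple placeholders rather than the algorithms CrowdOTA ships with.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    answered_by: set = field(default_factory=set)  # ids of workers who answered


def first_come_first_served(tasks, worker_id, k):
    """Placeholder strategy: the first k tasks the worker has not answered."""
    return [t for t in tasks if worker_id not in t.answered_by][:k]


def least_answered_first(tasks, worker_id, k):
    """Placeholder strategy: prefer tasks with the fewest answers so far."""
    open_tasks = [t for t in tasks if worker_id not in t.answered_by]
    open_tasks.sort(key=lambda t: (len(t.answered_by), t.task_id))
    return open_tasks[:k]


class OnlineAssigner:
    """Hypothetical assigner; the requester plugs in one strategy."""

    def __init__(self, tasks, strategy):
        self.tasks = list(tasks)
        self.strategy = strategy

    def request_tasks(self, worker_id, k):
        # Selection happens at request time, using the answers seen so far.
        return self.strategy(self.tasks, worker_id, k)

    def submit_answer(self, worker_id, task_id):
        for task in self.tasks:
            if task.task_id == task_id:
                task.answered_by.add(worker_id)


tasks = [Task("t1", {"w1", "w2"}), Task("t2"), Task("t3", {"w1"})]
assigner = OnlineAssigner(tasks, least_answered_first)
picked = assigner.request_tasks("w3", k=2)
print([t.task_id for t in picked])  # ['t2', 't3']
```

Passing the strategy as a plain function mirrors the idea that the requester, not the system, decides which assignment algorithm applies to a batch of tasks.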