Crowdsourcing Project
 
              
We are the Database Group at Tsinghua University, and we are delighted to share our research results on crowdsourcing. On this page you can find our research papers, talks, tutorials, books, source code, systems, and other useful resources. Enjoy!
If you have any questions or concerns, please contact Prof. Guoliang Li.
Publications
- Tongyu Liu, Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Xiaoyong Du. CrowdGame: A Game-Based Crowdsourcing System for Cost-Effective Data Labeling. SIGMOD 2019: 1957-1960. Pdf Link
- Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Tongyu Liu, Xiaoyong Du. Cost-Effective Data Annotation using Game-Based Crowdsourcing. VLDB 2019: 57-70. Pdf
- Guoliang Li, Chengliang Chai, Ju Fan, Xueping Weng, et al. CDB: A Crowd-Powered Database System. VLDB 2018: 1926-1929. Pdf Link
- Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng. A Partial-Order-Based Framework for Cost-Effective Crowdsourced Entity Resolution. VLDB Journal 27(6): 745-770 (2018). Pdf Link
- Kaiyu Li, Guoliang Li, Xiaohang Zhang, Jianhua Feng. A Rating-Ranking Based Framework for Crowdsourced Top-k Computation. SIGMOD 2018: 975-990. Pdf
- Chengliang Chai, Ju Fan, Guoliang Li. Incentive-Based Entity Collection using Crowdsourcing. ICDE 2018: 341-352. Pdf
- Xiang Yu, Guoliang Li, Yudian Zheng. CrowdOTA: An Online Task Assignment System in Crowdsourcing. ICDE 2018 Demo: 1629-1632. Pdf
- Caihua Shan, Nikos Mamoulis, Guoliang Li, Reynold Cheng, Zhipeng Huang, Yudian Zheng. T-Crowd: Effective Crowdsourcing for Tabular Data. ICDE 2018 Poster: 1316-1319. Pdf
- Yan Zhuang, Guoliang Li, Zhuojian Zhong, Jianhua Feng. Hike: A Hybrid Human-Machine Method for Entity Alignment in Large-Scale Knowledge Bases. CIKM 2017: 1917-1926 (Best Full Paper Award). Pdf
- Dong Yuan, Guoliang Li, Qi Li, Yudian Zheng. Sybil Defense in Crowdsourcing Platforms. CIKM 2017: 1529-1538. Pdf
- Xueping Weng, Guoliang Li, Huiqi Hu, Jianhua Feng. Crowdsourced Selection on Multi-Attribute Data. CIKM 2017: 307-316. Pdf
- Guoliang Li, Chengliang Chai, Ju Fan, Jian Li, Yudian Zheng, et al. CDB: Optimizing Queries with Crowd-Based Selections and Joins. SIGMOD Conference 2017: 1463-1478. Pdf Link Slide
- Guoliang Li. Human-in-the-loop Data Integration. PVLDB 10(12): 2006-2017 (2017). Pdf Link Slide
- Yudian Zheng, Guoliang Li, Reynold Cheng. DOCS: A Domain-Aware Crowdsourcing System Using Knowledge Bases. PVLDB 10(4): 361-372 (2016). Pdf Link Slide
- Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, Reynold Cheng. Truth Inference in Crowdsourcing: Is the Problem Solved? PVLDB 10(5): 541-552 (2017). Pdf Link Slide
- Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng. Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach. SIGMOD Conference 2016: 969-984. Pdf Link
- Xiaohang Zhang, Guoliang Li, Jianhua Feng. Crowdsourced Top-k Algorithms: An Experimental Evaluation. PVLDB 9(8): 612-623 (2016). Pdf Link Dataset Code
- Huiqi Hu, Guoliang Li, Zhifeng Bao, Jianhua Feng. Crowdsourcing-Based Real-Time Urban Traffic Speed Estimation: From Trends to Speeds. ICDE 2016: 883-894. Pdf Link
- Huiqi Hu, Yudian Zheng, Zhifeng Bao, Guoliang Li, Jianhua Feng, Reynold Cheng. Crowdsourced POI Labelling: Location-Aware Result Inference and Task Assignment. ICDE 2016: 61-72. Pdf Link Slide
- Yudian Zheng, Jiannan Wang, Guoliang Li, Reynold Cheng, Jianhua Feng. QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications. SIGMOD Conference 2015: 1031-1046. Pdf Link Slide
- Ju Fan, Guoliang Li, Beng Chin Ooi, Kian-Lee Tan, Jianhua Feng. iCrowd: An Adaptive Crowdsourcing Framework. SIGMOD Conference 2015: 1015-1030. Pdf Link Slide
- Henan Wang, Guoliang Li, Jianhua Feng. Incremental Quality Inference in Crowdsourcing. DASFAA (2) 2014: 453-467 (Best Paper Runner-up). Link
- Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng. Leveraging Transitive Relations for Crowdsourced Joins. SIGMOD 2013: 229-240. Pdf Link

- Guoliang Li, Jiannan Wang, Yudian Zheng, Michael Franklin. Crowdsourced Data Management: A Survey. IEEE Transactions on Knowledge and Data Engineering (TKDE) 28(9): 2296-2319 (2016). Pdf Link Dataset Long
- Chengliang Chai, Dong Deng, Guoliang Li, Jiannan Wang, Yudian Zheng. Crowdsourcing Database Systems: Overview and Challenges. ICDE 2019 Tutorial. Pdf
- Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng. Crowd-Powered Data Mining. KDD 2018 Tutorial. Pdf Slides Website
- Guoliang Li, Yudian Zheng, Ju Fan, Jiannan Wang, Reynold Cheng. Crowdsourced Data Management: Overview and Challenges. SIGMOD Conference 2017: 1711-1716. Slide
- Guoliang Li, Jiannan Wang, Yudian Zheng, Michael Franklin. Crowdsourced Data Management: A Survey (Extended Abstract). ICDE 2017: 39-40. Pdf

- Guoliang Li, Jiannan Wang, Yudian Zheng, Ju Fan, Michael Franklin. Crowdsourced Data Management. Springer, 2018. Link
Tutorials
- ICDE 2019 Tutorial on Crowdsourcing Database Systems: Overview and Challenges Slide
- KDD 2018 Tutorial on Crowd-Powered Data Mining. Website
- SIGMOD 2017 Tutorial on Crowdsourced Data Management. Video (Part I) Video (Part II) Slide
- VLDB 2017 Talk on Human-in-the-loop Data Integration Paper Slide
Datasets
 
Datasets in Crowdsourcing
This is a collection of crowdsourcing datasets in two parts:
- Part 1: Datasets with ground truth and workers' answers
- Part 2: Datasets with ground truth only (no workers' answers)
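Part 1 datasets pair each task's ground truth with the raw workers' answers, so they can be used to evaluate a truth-inference method. As a minimal sketch, assuming a hypothetical in-memory layout (`answers` maps a task id to a list of `(worker_id, label)` pairs and `ground_truth` maps a task id to its true label; the released files may use a different format), the Python below scores the simplest baseline, majority voting, against the ground truth:

```python
from collections import Counter


def majority_vote(answers):
    """Infer one label per task by taking the most frequent worker answer.

    answers: hypothetical layout, dict task_id -> list of (worker_id, label)
    """
    inferred = {}
    for task_id, votes in answers.items():
        counts = Counter(label for _, label in votes)
        # most_common orders equal counts by first occurrence, so ties go
        # to the label that was answered first
        inferred[task_id] = counts.most_common(1)[0][0]
    return inferred


def accuracy(inferred, ground_truth):
    """Fraction of tasks with a known true label whose inferred label matches it."""
    scored = [t for t in inferred if t in ground_truth]
    if not scored:
        return 0.0
    correct = sum(inferred[t] == ground_truth[t] for t in scored)
    return correct / len(scored)
```

Majority voting treats every worker as equally reliable; truth-inference methods that weight workers by estimated quality can be compared against this baseline on the same data.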
Systems
 
ChinaCrowds: a Crowdsourcing Database Platform
ChinaCrowds aims to answer machine-hard queries, e.g., labeling and translation. Requesters can crowdsource their requirements, e.g., needed services, ideas, or content, and get answers from a large group of people, especially an online community, rather than from traditional employees or suppliers. Workers can earn money by answering requesters' questions.
 
CDB: a Crowd-powered Database System
CDB is a crowd-powered database system that supports crowd-based query optimization, with a focus on joins and selections. CDB differs fundamentally from existing systems in two ways. First, CDB employs a graph-based query model that enables more fine-grained query optimization. Second, CDB adopts a unified framework that performs multi-goal optimization (cost, latency, and quality) on top of this graph model. We have implemented CDB and deployed it on Amazon Mechanical Turk, CrowdFlower, and ChinaCrowds.
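To make the graph-based idea more concrete, here is a minimal, hypothetical sketch for a two-join query R ⋈ S ⋈ T; it is not CDB's actual algorithm. Each candidate tuple pair is an edge with a machine-estimated match probability, a join result is a path r-s-t whose two edges are both confirmed by the crowd, and an edge is only crowdsourced while it still lies on some path with no rejected edge. The least-likely-first ordering, the function name `crowd_join`, and the `ask_crowd` callback (standing in for posting a task to workers) are all assumptions made for illustration.

```python
from itertools import product


def crowd_join(rs_edges, st_edges, ask_crowd):
    """Hypothetical graph-based crowd join for R ⋈ S ⋈ T.

    rs_edges: dict mapping (r, s) -> machine-estimated match probability
    st_edges: dict mapping (s, t) -> machine-estimated match probability
    ask_crowd: callable(edge) -> bool, standing in for posting a task to workers
    Tuple ids in R, S, and T are assumed distinct (e.g. "r1", "s1", "t1").
    Returns (join_results, number_of_crowd_questions).
    """
    status = {}  # edge -> True (crowd said match) / False (crowd said no)

    def not_rejected(edge):
        # Unasked edges are still possible; only an explicit "no" rules them out
        return status.get(edge, True)

    def on_live_path(edge):
        # An edge is worth asking only if some r-s-t path through it
        # has no rejected edge on its other side
        if edge in rs_edges:
            s = edge[1]
            return any(not_rejected(e) for e in st_edges if e[0] == s)
        s = edge[0]
        return any(not_rejected(e) for e in rs_edges if e[1] == s)

    all_edges = {**rs_edges, **st_edges}
    asked = 0
    # Heuristic (assumption): ask the least likely edges first, since a "no" prunes paths
    for edge in sorted(all_edges, key=all_edges.get):
        if not on_live_path(edge):
            continue  # every path through this edge is already dead
        status[edge] = ask_crowd(edge)
        asked += 1

    # A join result is a path whose two edges were both confirmed
    results = [
        (r, s, t)
        for (r, s), (s2, t) in product(rs_edges, st_edges)
        if s == s2 and status.get((r, s)) and status.get((s2, t))
    ]
    return results, asked


if __name__ == "__main__":
    rs = {("r1", "s1"): 0.9, ("r2", "s1"): 0.2, ("r2", "s2"): 0.8}
    st = {("s1", "t1"): 0.1, ("s2", "t1"): 0.7, ("s2", "t2"): 0.6}
    true_matches = {("r1", "s1"), ("r2", "s2"), ("s2", "t1")}  # simulated crowd
    print(crowd_join(rs, st, lambda e: e in true_matches))
    # -> ([('r2', 's2', 't1')], 4)
```

In this toy run the crowd answers 4 of the 6 candidate edges: once (s1, t1) is rejected, neither (r1, s1) nor (r2, s1) lies on a live path, so both are skipped.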
 
CrowdOTA: an Online Task Assignment System in Crowdsourcing
We developed CrowdOTA, an online task assignment system. When a worker requests tasks, CrowdOTA selects k tasks for that worker on the fly. CrowdOTA implements multiple online task assignment algorithms, and requesters can choose which algorithm assigns their tasks.
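As a rough illustration of what a single online assignment step could look like, the sketch below picks up to k tasks for a worker who has just requested work. It skips tasks the worker has already answered or that have reached a hypothetical per-task cap (`max_answers`), then prefers tasks with the fewest answers and, among ties, the least agreement among existing answers. This heuristic and every name in it (`assign_tasks`, `agreement`, the `answers` layout) are assumptions; it is not one of CrowdOTA's actual algorithms.

```python
import heapq
from collections import Counter


def agreement(votes):
    """Share of votes held by the most common label (0.0 when unanswered)."""
    if not votes:
        return 0.0
    top_count = Counter(label for _, label in votes).most_common(1)[0][1]
    return top_count / len(votes)


def assign_tasks(worker_id, tasks, answers, k, max_answers=5):
    """Pick up to k tasks for a worker who has just requested work.

    tasks: iterable of task ids
    answers: dict mapping task id -> list of (worker_id, label) collected so far
    max_answers: hypothetical per-task cap on the number of answers
    """
    eligible = []
    for task in tasks:
        votes = answers.get(task, [])
        if len(votes) >= max_answers:
            continue  # task already has enough answers
        if any(w == worker_id for w, _ in votes):
            continue  # never give a worker the same task twice
        eligible.append(task)
    # Fewest answers first; among ties, the least consensus (most uncertain) first
    return heapq.nsmallest(
        k,
        eligible,
        key=lambda t: (len(answers.get(t, [])), agreement(answers.get(t, []))),
    )
```

Because the selection is recomputed from the current answers on every request, each worker sees tasks ranked by the state of the crowd at that moment, which is the "on the fly" behavior described above.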
