2026
-
Zirui Tang, Boyu Niu, Xuanhe Zhou, Boxiu Li, Wei Zhou, Jiannan Wang, Guoliang Li, Xinyi Zhang, Fan Wu.
ST-Raptor: LLM-Powered Semi-Structured Table Question Answering.
SIGMOD 2026, Bengaluru, India
2025
-
Xiaoying Wang, Jiannan Wang, Tianzheng Wang, Yong Zhang.
Accio: Bolt-on Query Federation.
VLDB 2025, London, United Kingdom
-
Chunyu Chen, Zhengjie Miao, Yong Zhang, Jiannan Wang.
ParSEval: Plan-aware Test Database Generation for SQL Equivalence Evaluation.
VLDB 2025, London, United Kingdom
-
Longxu Sun, Xin Huang, Jiannan Wang, Jianliang Xu.
A Flexible Framework for Query-oriented Interactive Community Search.
VLDB 2025, London, United Kingdom
-
Danrui Qi, Zhengjie Miao, Jiannan Wang.
CleanAgent: Automating Data Standardization with LLM-based Agents.
DataAI Workshop @ VLDB 2025, London, United Kingdom
-
Guoliang Li, Jiayi Wang, Chenyang Zhang, Jiannan Wang.
Data+AI: LLM4Data and Data4LLM.
SIGMOD 2025 (Tutorial), Berlin, Germany
-
Chenhao Xu, Chunyu Chen, Jinglin Peng, Jiannan Wang, Jun Gao.
BQSched: A Non-Intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning.
ICDE 2025, Hong Kong
-
Shi Heng Zhang, Zhengjie Miao, Jiannan Wang.
LineageX: A Column Lineage Extraction System for SQL.
ICDE 2025 (Demo), Hong Kong
2024
2023
2022
-
Xiaoying Wang*, Weiyuan Wu*, Jinze Wu, Yizhou Chen, Nick Zrymiak, Changbo Qu, Lampros Flokas, George Chow, Jiannan Wang, Tianzheng Wang, Eugene Wu, Qingqing Zhou.
ConnectorX: Accelerating Data Loading From Databases to Dataframes.
VLDB 2022, Sydney, Australia
(* Equally Contributed).
-
Jinglin Peng, Bolin Ding, Jiannan Wang, Kai Zeng, Jingren Zhou.
One Size Does Not Fit All: A Bandit-Based Sampler Combination Framework with Theoretical Guarantees.
SIGMOD 2022, Philadelphia, PA, USA
-
Lampros Flokas, Weiyuan Wu, Yejia Liu, Jiannan Wang, Nakul Verma, Eugene Wu.
Complaint-Driven Training Data Debugging at Interactive Speeds.
SIGMOD 2022, Philadelphia, PA, USA
-
Yejia Liu*, Weiyuan Wu*, Lampros Flokas, Jiannan Wang, Eugene Wu.
Enabling SQL-based Training Data Debugging for Federated Learning.
VLDB 2022, Sydney, Australia (code)
(* Equally Contributed).
-
Jinglin Peng, Weiyuan Wu, Jing Nathan Yan, Danrui Qi, Jeffrey M. Rzeszotarski, Jiannan Wang.
User Interfaces for Exploratory Data Analysis: A Survey of Open-Source and Commercial Tools.
IEEE Data Eng. Bull. 2022
2021
-
Brandon Lockhart, Jinglin Peng, Weiyuan Wu, Jiannan Wang, Eugene Wu.
Explaining Inference Queries with Bayesian Optimization.
VLDB 2021, Copenhagen, Denmark (code)
-
Xiaoying Wang*, Changbo Qu*, Weiyuan Wu*, Jiannan Wang, Qingqing Zhou.
Are We Ready For Learned Cardinality Estimation? (Best EA&B Paper Award)
VLDB 2021, Copenhagen, Denmark (slides, code, news)
(* Equally Contributed).
-
Jinglin Peng*, Weiyuan Wu*, Brandon Lockhart, Song Bian, Jing Nathan Yan, Linghao Xu, Zhixuan Chi, Jeffrey M. Rzeszotarski, Jiannan Wang.
DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python.
SIGMOD 2021, Xi'an, Shaanxi, China. [code, reddit, blog post 1, 2]
(* Equally Contributed).
2020
-
Liang Zhao, Qingcan Li, Pei Wang, Jiannan Wang, Eugene Wu.
ActiveDeeper: A Model-based Active Data Enrichment System.
VLDB 2020, Tokyo, Japan (Demo) [video].
-
Weiyuan Wu, Lampros Flokas, Eugene Wu, Jiannan Wang.
Complaint-driven Training Data Debugging for Query 2.0.
SIGMOD 2020, Amsterdam, The Netherlands.
-
Jing Nathan Yan, Oliver Schulte, MoHan Zhang, Jiannan Wang, Reynold Cheng.
SCODED: Statistical Constraint Oriented Data Error Detection.
SIGMOD 2020, Amsterdam, The Netherlands.
-
Weiyuan Wu, Lampros Flokas, Eugene Wu, Jiannan Wang.
Towards Complaint-driven ML Workflow Debugging.
MLOps 2020, Austin, TX, USA.
-
Ruochen Jiang*, Changbo Qu*, Jiannan Wang, Chi Wang, Yudian Zheng.
Towards Extracting Highlights From Recorded Live Videos: An Implicit Crowdsourcing Approach.
ICDE 2020, Dallas, TX, USA. (Short Paper) [Project Page]
(* Equally Contributed)
2019
-
Ruochen Jiang*, Changbo Qu*, Jiannan Wang, Chi Wang, Yudian Zheng.
Towards Extracting Highlights From Recorded Live Videos: An Implicit Crowdsourcing Approach.
Technical Report 2019. [Project Page]
(* Equally Contributed)
-
Jing Nathan Yan, Oliver Schulte, Jiannan Wang, Reynold Cheng.
CODED: Column-Oriented Data Error Detection with Statistical Constraints.
Technical Report 2019.
-
Pei Wang, Ryan Shea, Jiannan Wang, Eugene Wu.
Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment.
SIGMOD 2019, Amsterdam, The Netherlands.
-
Mohamad Dolatshah*, Mathew Teoh*, Jiannan Wang, Jian Pei.
Cleaning Crowdsourced Labels Using Oracles For Statistical Classification.
VLDB 2019, Los Angeles, California [Technical Report]
(* Equally Contributed)
2018
-
Jinglin Peng, Dongxiang Zhang, Jiannan Wang, Jian Pei.
AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics.
SIGMOD 2018, Houston, TX, USA.
-
Pei Wang, Yongjun He, Ryan Shea, Jiannan Wang, Eugene Wu.
Deeper: A Data Enrichment System Powered by Deep Web.
SIGMOD 2018, Houston, TX, USA. (Demo) (system, video).
-
Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng.
Crowd-Powered Data Mining.
KDD 2018, London, UK (Tutorial).
-
Guoliang Li, Jiannan Wang, Yudian Zheng, Ju Fan, Michael J. Franklin.
Crowdsourced Data Management: Hybrid Human-Machine Data Management.
Springer 2018 (Book).
2017
-
Liwen Sun, Michael J. Franklin, Jiannan Wang, Eugene Wu.
Skipping-oriented Partitioning for Columnar Layouts.
VLDB 2017, Munich, Germany.
-
Guoliang Li, Yudian Zheng, Ju Fan, Jiannan Wang, Reynold Cheng.
Crowdsourced Data Management: Overview and Challenges.
SIGMOD 2017, Chicago, IL, USA. (Tutorial) [slides]
-
Jiannan Wang, Nan Tang.
Dependable Data Repairing with Fixing Rules.
ACM Journal of Data and Information Quality (JDIQ) 8(4) (2017).
-
Chuancong Gao, Jiannan Wang, Jian Pei, Rui Li, Yi Chang.
Preference-driven Similarity Join.
IEEE/WIC/ACM International Conference on Web Intelligence (WI 2017), Leipzig, Germany [Best Student Paper Award]
2016
-
Ruochen Jiang, Jiannan Wang.
Reprowd: Crowdsourced Data Processing Made Reproducible.
HCOMP 2016, Austin, TX, USA. (Demo) [Website]
-
Sanjay Krishnan, Jiannan Wang, Eugene Wu, Michael J. Franklin, Ken Goldberg.
ActiveClean: Interactive Data Cleaning For Statistical Modeling.
VLDB 2016, New Delhi, India.
-
Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin.
CLAMShell: Speeding up Crowds for Low-latency Data Labeling.
VLDB 2016, New Delhi, India.
-
Lingyang Chu, Zhefeng Wang, Jian Pei, Jiannan Wang, Zijin Zhao, Enhong Chen.
Finding Gangs in War from Signed Networks.
KDD 2016, San Francisco, CA, USA.
-
Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska.
PrivateClean: Data Cleaning and Differential Privacy.
SIGMOD 2016, San Francisco, CA, USA.
-
Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu.
ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning.
SIGMOD 2016, San Francisco, CA, USA. (Demo) [Best Demonstration Award]
-
Xu Chu, Ihab F. Ilyas, Sanjay Krishnan, Jiannan Wang*.
Data Cleaning: Overview and Emerging Challenges.
SIGMOD 2016, San Francisco, CA, USA. (Tutorial) [slides]
(* Alphabetical Order)
-
Guoliang Li, Jiannan Wang, Yudian Zheng, Michael J. Franklin.
Crowdsourced Data Management: A Survey.
TKDE 2016.
2015
-
Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo, Eugene Wu.
SampleClean: Fast and Reliable Analytics on Dirty Data.
Data Engineering Bulletin 38(3):59-75 (2015).
-
Daniel Haas, Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Eugene Wu.
Wisteria: Nurturing Scalable Data Cleaning Infrastructure.
VLDB 2015, Kohala Coast, Hawai'i (Demo).
-
Sanjay Krishnan, Jiannan Wang, Michael Franklin, Ken Goldberg, Tim Kraska.
Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views.
VLDB 2015, Kohala Coast, Hawai'i.
-
Yudian Zheng, Jiannan Wang, Guoliang Li, Reynold Cheng, Jianhua Feng.
QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications.
SIGMOD 2015, Melbourne, VIC, Australia. [More]
2014
-
Jiannan Wang, Sanjay Krishnan, Michael Franklin, Ken Goldberg, Tim Kraska, Tova Milo.
A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data.
SIGMOD 2014, Snowbird, Utah, USA. [Project Website]
-
Jiannan Wang, Nan Tang.
Towards Dependable Data Repairing with Fixing Rules.
SIGMOD 2014, Snowbird, Utah, USA.
-
Dong Deng, Guoliang Li, Shuang Hao, Jiannan Wang, Jianhua Feng.
MassJoin: A MapReduce-based Algorithm for String Similarity Joins.
ICDE 2014, Chicago, IL, USA.
-
Jiannan Wang, Guoliang Li, Jianhua Feng.
Extending String Similarity Join to Tolerant Fuzzy Token Matching.
ACM Trans. Database Syst. (TODS) 39(1):7 (2014).
-
Jeffrey Mahler, Sanjay Krishnan, Michael Laskey, Siddarth Sen, Adithyavairavan Murali, Ben Kehoe, Sachin Patil, Jiannan Wang, Mike Franklin, Pieter Abbeel, Ken Goldberg.
Learning Accurate Kinematic Control of Cable-Driven Surgical Robots Using Data Cleaning and Gaussian Process Regression.
IEEE CASE 2014, Taipei, Taiwan. [Project Website]
2013
-
Yu Jiang, Dong Deng, Jiannan Wang, Guoliang Li, Jianhua Feng.
Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints.
EDBT/ICDT Workshops, Genoa, Italy. [Competition Results] [News]
-
Jiannan Wang, Guoliang Li, Tim Kraska, Michael Franklin, Jianhua Feng.
Leveraging Transitive Relations for Crowdsourced Joins.
SIGMOD 2013, New York, New York, USA. [Note]
2012
-
Jiannan Wang, Tim Kraska, Michael Franklin, Jianhua Feng.
CrowdER: Crowdsourcing Entity Resolution.
VLDB 2012, Istanbul, Turkey.
-
Guoliang Li, Dong Deng, Jiannan Wang, Jianhua Feng.
Pass-Join: A Partition-based Method for Similarity Joins.
VLDB 2012, Istanbul, Turkey. [More]
-
Jiannan Wang, Guoliang Li, Jianhua Feng.
Can We Beat The Prefix Filtering? An Adaptive Framework for Similarity Join and Search.
SIGMOD 2012, Scottsdale, Arizona, USA. [More]
-
Jianhua Feng, Jiannan Wang, Guoliang Li.
Trie-join: a trie-based method for efficient string similarity joins.
VLDB Journal 21(4):437-461 (2012). [More]
-
Guoliang Li, Jiannan Wang, Chen Li, Jianhua Feng.
Supporting Efficient Top-k Queries in Type-Ahead Search.
SIGIR 2012, Portland, Oregon, USA.
2011
-
Jiannan Wang, Guoliang Li, Jeffrey Xu Yu, Jianhua Feng.
Entity Matching: How Similar Is Similar.
VLDB 2011, Seattle, WA, USA.
-
Jiannan Wang, Guoliang Li, Jianhua Feng.
Fast-Join: An Efficient Method for Fuzzy Token Matching based String Similarity Join.
ICDE 2011, Hannover, Germany. [More]
-
Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng.
DBease: Making Databases User-Friendly and Easily Accessible.
CIDR 2011, Asilomar, California. [More]
2010
-
Jiannan Wang, Inci Cetindil, Shengyue Ji, Chen Li*, Xiaohui Xie*, Guoliang Li, Jianhua Feng.
Interactive and Fuzzy Search: A Dynamic Way to Explore MEDLINE.
Bioinformatics 26(18):2313-2320 (2010). [iPubMed Search]
(Note: Inci Cetindil is also the first author of this paper.)
-
Jiannan Wang, Jianhua Feng, Guoliang Li.
Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraint.
VLDB 2010, Singapore. [More]
-
Guoliang Li, Shengyue Ji, Chen Li, Jiannan Wang, Jianhua Feng.
Efficient Fuzzy Type-Ahead Search in TASTIER.
ICDE 2010, Long Beach, California, USA (Demo)
2009