Data Mining Group of the Database Systems Laboratory at Tsinghua University

 

Group members  

§  Principal investigator

§  Jianyong Wang (Professor)

§  Current members

§  Jiacheng Xu (Ph.D. student)

§  Cong Han (Ph.D. student)

§  Zhenyu Li (Ph.D. student)

§  Yunfei Yang (Ph.D. student)

§  Bowen Dong (Ph.D. student)

§  Yutao Sun (Ph.D. student)

§  Tingting Li (Ph.D. student)

§  Tengyu Pan (Master's student)

§  Fangzhou Liu (Ph.D. student)

§  Junyan Zhang (Ph.D. student)

§  Yike Zhang (Ph.D. student)

§  Yu Mu (Ph.D. student)

§  Yanzhen Li (Visiting Scholar)

§  Dongbai Li (undergraduate student)

§  Yang Yu (undergraduate student)

§  Xinle Zheng (undergraduate student)

§  Former members

§  Zhichao Duan (Ph.D., 2025, Research Associate, Imperial College London)

§  Yike Zhang (B.E., 2025, Tsinghua University Outstanding Undergraduate Thesis Award, Tsinghua University Outstanding Graduate)

§  Jiachao Xiong (B.E., 2025)

§  Fangzhou Liu (B.E., 2024, Tsinghua University Outstanding Graduate)

§  Minjia Wang (B.E., 2024, Tsinghua University Outstanding Graduate)

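§  Zhuo Wang (Ph.D., 2023, Special Associate Research Fellow, Beijing Institute of Technology)

§  Yutao Sun (B.E., 2023, Tsinghua University Outstanding Undergraduate Thesis Award)

§  Tengyu Pan (B.E., 2023, Tsinghua University Outstanding Graduate)

§  Xiuxing Li (Ph.D., 2022, Special Associate Research Fellow, Beijing Institute of Technology)

§  Rui Zhang (M.E., 2022)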

§  Bowen Dong (B.E., 2022)

§  Zhongkai He (B.E., 2022)

§  Ning Liu (Ph.D., Assistant Professor, Shandong University)

§  Zhenyu Li (B.E., 2021)

§  Jianyuan Lu (Postdoctoral Researcher)

§  Chenwei Ran (Ph.D., 2020)

§  Zujiang Pan (M.E., 2020)

§  Jiacheng Xu (B.E., 2020)

§  Yuanquan Lu (M.E., 2019)

§  Long Guo (B.E., 2019)

§  Yifan Li (B.E., 2019)

§  Gang Chen (M.E., 2018)

§  Pan Lu (M.E., 2018, Tsinghua University Outstanding Master's Thesis Award, Postdoc, Stanford University)

§  Xingzhi Niu (B.E., 2018)

§  Junyi Fu (B.E., 2018)

§  Jianhua Yin (Ph.D., 2017, Tenured Associate Professor, Shandong University)

§  Xinding Wei (B.E., 2017)

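§  Wei Zhang (Ph.D., 2016, Professor, Beijing Outstanding Doctoral Graduate, Tsinghua University Outstanding Doctoral Dissertation Award, Zijiang Scholar of East China Normal University)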

§  Yuda Zang (M.E., 2016)

§  Chao Wang (M.E., 2016)

§  Wei Feng (Ph.D., 2015, Beijing Outstanding Doctoral Graduate, Tsinghua University Outstanding Doctoral Dissertation Award, Meta)

§  Zhaoxu Tu (M.E., 2015)

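§  Wei Shen (Ph.D., 2014, Professor, Chinese Association for Artificial Intelligence (CAAI) Outstanding Doctoral Dissertation Award, Tsinghua University Outstanding Doctoral Dissertation Award, Nankai University Revitalization Program)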

§  Zhenhua Song (M.E., 2014)

§  Xianjun Zhang (M.E., 2014)

§  Chenwei Ran (B.E., 2014)

§  Hongda Ren (M.E., 2013)

§  Haijun Xia (B.E., 2013)

§  Junlin Lin (Visiting Master student from NTHU, 2013)

§  Lili Jiang (Ph.D., visiting from Lanzhou Univ., 2012, Associate Professor, Umeå University, Sweden)

§  Xu Pu (M.E., 2012, Tsinghua University Outstanding Master's Thesis Award)

§  Shuyong Chen (M.E., 2012)

§  Qingyan Yang (M.E., 2011)

§  Zhijie He (B.E., 2011)

§  Jun Zhang (M.E., 2010)

§  Chuancong Gao (M.E., 2010, Tsinghua University Outstanding Master's Thesis Award)

§  Yuzhou Zhang (Ph.D., 2010, co-advised with Prof. Lizhu Zhou)

§  Yiting Bian (B.E., 2009)

§  Yan Li (M.E., 2009)

§  Xiaoming Fan (M.E., 2009)

§  Chun Li (M.E., 2009, Tsinghua University Outstanding Master's Thesis Award)

§  Zhiping Zeng (Ph.D., 2009, co-advised with Prof. Lizhu Zhou)

§  Jing Wang (B.E., 2008, Outstanding Undergraduate Graduate of Tsinghua University and Beijing, Associate Professor, HKUST Business School)

§  Qingyan Yang (B.E., 2008)

§  Bing Lv (M.E., 2008)

§  Wei Fu (B.E., 2007)

Current research topics

§  Medical data mining: we mainly investigate problems such as AI for medical science, interpretable learning models, and their application to risk prediction and diagnosis of underlying diseases (e.g., Alzheimer's dementia, diabetes). Representative publications include:

§  Yuyue Qiu, Wei Jin, Li Shang, Shanshan Chu, Tianyi Wang, Yuhan Jiang, Jialu Bao, Wenjun Wang, Bo Li, Yixuan Huang, Li Dong, Chenhui Mao, Jianyong Wang, Jing Gao. Longitudinal stability of commonly used clinical neuropsychological scales: a cross-sectional study. Chinese Journal of Neurology, 2025, 58(1): 17-25.

§  Tianyi Wang, Li Shang, Chenhui Mao, Longze Sha, Liling Dong, Caiyan Liu, Dan Lei, Jie Li, Jie Wang, Xinying Huang, Shanshan Chu, Wei Jin, Zhaohui Zhu, Huimin Sui, Bo Hou, Feng Feng, Bin Peng, Liying Cui, Jianyong Wang, Qi Xu, Jing Gao. Alzheimer's disease diagnosis among dementia patients via blood biomarker measurement based on the AT(N) system. Chinese Medical Journal. 2025 Jun 20;138(12):1505-1507.

§  Ning Liu, Yunsen Tang, Haitao Yuan, Hongtao Lv, Lili Jiang, Zhen Li, Wei Zhang, Jianyong Wang. Incomplete Multi-View Drug Recommendation via Multi-Level Representation Learning and Curriculum Learning. Proceedings of the 31st SIGKDD Conference on Knowledge Discovery and Data Mining, Toronto, ON, Canada, Aug. 3-7, 2025. (ACM SIGKDD 2025, ADS track)

§  Yixuan Huang, Zhenyu Li, Fangzhou Liu, Bo Li, Chenhui Mao, Liling Dong, Shanshan Chu, Wei Jin, Jianyong Wang*, Jing Gao*. Application of Machine Learning in EEG-based Dementia Diagnosis Classification and Differential Diagnosis. Journal of Alzheimer's Disease. (https://doi.org/10.1177/13872877251360331, first published online July 28, 2025)

§  Ziyu Wang, Tengyu Pan, Zhenyu Li, Ji Wu, Xiuxing Li, Jianyong Wang. TROI: Cross-Subject Pretraining with Sparse Voxel Selection for Enhanced fMRI Visual Decoding. Proc. of the 50th IEEE International Conference on Acoustics, Speech, and Signal Processing, Hyderabad, India, April 6-11, 2025. PP: 1-5. (ICASSP 2025)

§  Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang. Learning Interpretable Rules for Scalable Data Representation and Classification. IEEE TPAMI, 2024. (This paper proposes an enhanced version of RRL, a classification model that combines high interpretability with high classification performance; work by Ph.D. student Zhuo Wang. Source code is available at https://github.com/12wang3/rrl, the PDF can be found at https://arxiv.org/abs/2310.14336, DOI: 10.1109/TPAMI.2023.3328881)

§  Bowen Dong, Zhuo Wang, Zhenyu Li, Zhichao Duan, Jiacheng Xu, Tengyu Pan, Rui Zhang, Ning Liu, Xiuxing Li, Jie Wang, Caiyan Liu, Liling Dong, Chenhui Mao, Jing Gao, Jianyong Wang*. Toward a stable and low-resource PLM-based medical diagnostic system via prompt tuning and MoE structure. Scientific Reports, August 3, 2023. (This paper proposes an input-template construction method based on a medical knowledge graph and a feature-based mixture-of-experts mechanism; work by Ph.D. student Bowen Dong.)

§  Zhuo Wang, Jie Wang, Ning Liu, Caiyan Liu, Xiuxing Li, Liling Dong, Rui Zhang, Chenhui Mao, Zhichao Duan, Wei Zhang, Jing Gao*, Jianyong Wang*. Learning Cognitive-Test-Based Interpretable Rules for Prediction and Early Diagnosis of Dementia using Neural Networks. Journal of Alzheimer's Disease, Vol. 90, No. 2, PP: 609-624, Nov. 2022.

§  Jie Wang, Zhuo Wang, Ning Liu, Caiyan Liu, Chenhui Mao, Liling Dong, Jie Li, Xinying Huang, Dan Lei, Shanshan Chu, Jianyong Wang*, Jing Gao*. Random forest model in the diagnosis of dementia patients with normal Mini-Mental State Examination scores. Journal of Personalized Medicine, 2022, 12(1), 37 (https://doi.org/10.3390/jpm12010037). (4 January 2022)

§  Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang. Scalable Rule-Based Representation Learning for Interpretable Classification. Proceedings of the Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS’21)

§  Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang. Transparent Classification with Multilayer Logical Perceptrons and Random Binarization. Proc. of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, USA, Feb. 7-12, 2020. (AAAI'20)

§  Ning Liu, Pan Lu, Wei Zhang, Jianyong Wang. Knowledge-Aware Deep Dual Networks for Text-Based Mortality Prediction. IEEE ICDE’19.

§  Generative models: we mainly study problems related to generative models, such as network architectures for LLMs, linear/sparse attention mechanisms, mixture of experts, diffusion models, quantization for efficient decoding, reinforcement learning for enhancing the reasoning capabilities of LLMs, and so on. Representative publications include:

§  Yutao Sun, Tianzhu Ye, Li Dong, Yuqing Xia, Jian Chen, Yizhao Gao, Shijie Cao, Jianyong Wang, Furu Wei. Rectified Sparse Attention for Efficient Long-Sequence Generation. Findings of the 64th Annual Meeting of the Association for Computational Linguistics, San Diego, California, July 2-7, 2026. (ACL 2026)

§  Yutao Sun, Zhenyu Li, Yike Zhang, Tengyu Pan, Bowen Dong, Yuyi Guo, Jianyong Wang. Efficient Attention Mechanisms for Large Language Models: A Survey. arXiv:2507.19595v1. July 25, 2025.

§  Yutao Sun, Hangbo Bao, Wenhui Wang, Zhiliang Peng, Li Dong, Shaohan Huang, Jianyong Wang, Furu Wei. Multimodal Latent Language Modeling with Next-Token Diffusion. arXiv:2412.08635. Dec. 11, 2024.

§  Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei. Retentive Network: A Successor to Transformer for Large Language Models. arXiv:2307.08621. July 17, 2023.

§  Yike Zhang, Zhiyuan He, Huiqiang Jiang, Chengruidong Zhang, Yuqing Yang, Jianyong Wang, Lili Qiu. LeanK: Learnable K Cache Channel Pruning for Efficient Decoding. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China, Nov. 5-9, 2025. PP:31111–31126. (EMNLP 2025)

§  Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang. FocusLLM: Precise Understanding of Long Context by Dynamic Condensing. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, July 27-Aug. 1, 2025. PP: 31087-31101. (ACL 2025)

§  Bowen Dong, Yilong Fan, Yutao Sun, Zhenyu Li, Tengyu Pan, Xun Zhou, Jianyong Wang. Maximum Score Routing For Mixture-of-Experts. Findings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, July 27-Aug. 1, 2025. PP: 12619-12632. (ACL 2025)

§  Tengyu Pan, Zhichao Duan, Zhenyu Li, Bowen Dong, Ning Liu, Xiuxing Li, Jianyong Wang. Negative Matters: Multi-Granularity Hard-Negative Synthesis and Anchor-Token-Aware Pooling for Enhanced Text Embeddings. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, July 27-Aug. 1, 2025. PP: 31102-31118. (ACL 2025)

§  Weilin Zhao, Tengyu Pan, Xu Han, Yudi Zhang, Sun Ao, Yuxiang Huang, Kaihuo Zhang, Weilun Zhao, Yuxuan Li, Jie Zhou, Hao Zhou, Jianyong Wang, Maosong Sun, Zhiyuan Liu. FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, July 27-Aug. 1, 2025. PP: 3909-3921. (ACL 2025)

§  Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei. You Only Cache Once: Decoder-Decoder Architectures for Language Models. Proceedings of the Thirty-eighth Annual Conference on Neural Information Processing Systems, Vancouver, Canada, Dec. 9-15, 2024. PP: 1-23. (NeurIPS 2024, Oral, acceptance rate 0.46%, 72/15671)

§  Zhenyu Li, Sunqi Fan, Yu Gu, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, Jianyong Wang. FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering. Proc. of the Thirty-Eighth AAAI Conference on Artificial Intelligence, Vancouver, Canada, Feb. 20-27, 2024. PP: 18608-18616. (AAAI'24, Oral presentation, acceptance rate 2.2%. Source code is available at https://github.com/leezythu/FlexKBQA, the PDF can be found at https://arxiv.org/abs/2308.12060)

 

Past research topics

§  Knowledge graph/recommender systems/text clustering: we mainly focused on problems in these areas, such as entity disambiguation, relation extraction, entity linking, personalized recommender systems, short text clustering, and so on. Representative publications include:

§  Zhichao Duan, Tengyu Pan, Zhenyu Li, Xiuxing Li, Jianyong Wang. COMM: Concentrated Margin Maximization for Robust Document-Level Relation Extraction. Proc. of the 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, Feb. 25-Mar. 4, 2025. PP: 23841-23849. (AAAI 2025)

§  Minjia Wang, Fangzhou Liu, Xiuxing Li, Bowen Dong, Zhenyu Li, Tengyu Pan, Jianyong Wang. Bio-RFX: Refining Biomedical Extraction via Advanced Relation Classification and Structural Constraints. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, Nov. 12-16, 2024. PP: 10524-10539. (EMNLP 2024)

§  Zhenyu Li, Sunqi Fan, Yu Gu, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, Jianyong Wang. FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering. Proc. of the Thirty-Eighth AAAI Conference on Artificial Intelligence, Vancouver, Canada, Feb. 20-27, 2024. PP: 18608-18616. (AAAI'24)

§  Zhenyu Li, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, Jianyong Wang. Toward a Unified Framework for Unsupervised Complex Tabular Reasoning. Proceedings of the 39th IEEE International Conference on Data Engineering, Anaheim, California, USA, April 3-7, 2023. PP: 1691-1704 (IEEE ICDE’23).

§  Chenwei Ran, Wei Shen, Jianbo Gao, Yuhan Li, Jianyong Wang, Yantao Jia. Learning Entity Linking Features for Emerging Entities. IEEE Transactions on Knowledge and Data Engineering, Volume 35, Issue 7, July 2023. PP: 7088-7102. (IEEE TKDE)

§  Zhichao Duan, Xiuxing Li, Zhenyu Li, Zhuo Wang, Jianyong Wang. Not Just Plain Text! Fuel Document-Level Relation Extraction with Explicit Syntax Refinement and Subsentence Modeling. Findings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, Dec. 7-11, 2022. PP: 1941-1951. (Findings of EMNLP 2022)

§  Xiuxing Li, Zhenyu Li, Zhengyan Zhang, Ning Liu, Haitao Yuan, Wei Zhang, Zhiyuan Liu, Jianyong Wang. Effective Few-Shot Named Entity Linking by Meta-Learning. Proceedings of the 38th IEEE International Conference on Data Engineering, Kuala Lumpur, Malaysia, May 9-12, 2022. (IEEE ICDE’22)

§  Pan Lu, Lei Ji, Wei Zhang, Nan Duan, Ming Zhou, Jianyong Wang. R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering. ACM SIGKDD'18.

§  Jianhua Yin, Daren Chao, Zhongkun Liu, Wei Zhang, Xiaohui Yu, Jianyong Wang. Model-based Clustering of Short Text Streams. ACM SIGKDD'18.

§  Wei Shen, Yinan Liu, Jianyong Wang. Predicting Named Entity Location Using Twitter. IEEE ICDE’18.

§  Chenwei Ran, Wei Shen, Jianyong Wang. An Attention Factor Graph Model for Tweet Entity Linking. WWW'18.

§  Pan Lu, Hongsheng Li, Wei Zhang, Jianyong Wang, Xiaogang Wang. Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering.  AAAI'18.

§  Wei Shen, Jiawei Han, Jianyong Wang, Xiaojie Yuan, Zhenglu Yang. SHINE+: A General Framework for Domain-Specific Entity Linking with Heterogeneous Information Networks. IEEE TKDE, February 2018.

§  Wei Zhang, Jianyong Wang. Integrating Topic and Latent Factors for Scalable Personalized Review-based Rating Prediction. IEEE TKDE, November 2016.

§  Jianhua Yin, Jianyong Wang. A Model-based Approach for Text Clustering with Outlier Detection. IEEE ICDE'16.

§  Wei Zhang, Quan Yuan, Jiawei Han, Jianyong Wang. Collaborative Multi-Level Embedding Learning from Reviews for Rating Prediction. IJCAI'16.

§  Jianhua Yin, Jianyong Wang. A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization. ACM SIGKDD'16.

§  Wei Shen, Jianyong Wang, Jiawei Han. Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions. IEEE TKDE (Vol. 27, No. 2, Feb. 2015, PP: 443-460).

§  Wei Feng, Chao Zhang, Wei Zhang, Jiawei Han, Jianyong Wang, Charu Aggarwal, Jianbin Huang. StreamCube: Hierarchical Spatio-temporal Hashtag Clustering for Event Exploration over the Twitter Stream. IEEE ICDE'15 (PP: 1561-1572).

§  Wei Zhang, Jianyong Wang. Prior-based Dual Additive Latent Dirichlet Allocation for User-item Connected Documents. IJCAI'15 (PP: 1405-1411).

§  Wei Zhang, Jianyong Wang. A Collective Bayesian Poisson Factorization Model for Cold-start Local Event Recommendation. ACM SIGKDD'15 (PP: 1455-1456).

§  Wei Feng, Jianyong Wang, Wei Zhang. We Can Learn Your #Hashtags: Connecting Tweets to Explicit Topics. Proc. IEEE ICDE'14. (PP:856-867)

§  Wei Shen, Jiawei Han, Jianyong Wang.  A Probabilistic Model for Linking Named Entities in Web Text with Heterogeneous Information Networks.  Proc. ACM SIGMOD'14.  (PP:1199-1210)

§  Jianhua Yin, Jianyong Wang. A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering. ACM SIGKDD'14. (This paper proposes GSDMM, a short text clustering algorithm; work by Ph.D. student Jianhua Yin. A Python implementation by Ryan Walker is available at https://github.com/rwalk/gsdmm, an implementation by Jianhua Yin is available at https://github.com/jackyin12/GSDMM, and the PDF can be found at https://dbgroup.cs.tsinghua.edu.cn/wangjy/papers/KDD14-GSDMM.pdf)

§  Wei Zhang, Wei Feng, Jianyong Wang. Integrating Semantic Relatedness and Words’ Intrinsic Features for Keyword Extraction. Proc. IJCAI'13. (PP:2225-2231)

§  Wei Shen, Jianyong Wang, Ping Luo, Min Wang. Linking Named Entities in Tweets with Knowledge Base via User Interest Modeling. Proc. ACM SIGKDD'13. (PP: 68-76)

§  Wei Zhang, Jianyong Wang, Wei Feng. Combining Latent Factor Model with Location Features for Event-based Group Recommendation. Proc. ACM SIGKDD'13. (PP: 910-918)

§  Wei Shen, Jianyong Wang, Ping Luo, Min Wang. LINDEN: Linking Named Entities with Knowledge Base via Semantic Knowledge. Proc. WWW'12. (PP: 449-458)

§  Wei Feng, Jianyong Wang. Incorporating Heterogeneous Information for Personalized Tag Recommendation in Social Tagging Systems. ACM SIGKDD'12. (PP: 1276-1284)

§  Wei Shen, Jianyong Wang, Ping Luo, Min Wang. LIEGE: Link Entities in Web Lists with Knowledge Base. ACM SIGKDD'12. (PP: 1424-1432)

§  Jun Zhang, Xiaoming Fan, Jianyong Wang, Lizhu Zhou. Keyword-Propagation-Based Information Enriching and Noise Removal for Web News Videos. ACM SIGKDD'12. (Industry track, PP: 561-569)

§  Graph data mining: we investigated problems in this area such as coherent subgraph mining, community detection in large networks, graph generator mining for classification, structural anonymization of graph data (joint work with IBM), and so on. Representative publications include:

§  Jianyong Wang, Zhiping Zeng, Lizhu Zhou. CLAN: An Algorithm for Mining Closed Cliques from Large Dense Graph Databases.  IEEE ICDE'06 (Full research paper, Article No. 73).

§  Zhiping Zeng, Jianyong Wang, Lizhu Zhou, George Karypis. Out-of-Core Coherent Closed Quasi-Clique Mining from Large Dense Graph Databases. ACM TODS, June 2007 (Volume 32, Issue 2, Article No. 13. This paper proposes Cocain*, a coherent subgraph mining algorithm).

§  Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou. Parallel Community Detection on Large Networks with Propinquity Dynamics. ACM SIGKDD'09 (PP:997-1005).

§  Zhiping Zeng, Jianyong Wang, Jun Zhang, Lizhu Zhou. FOGGER: An Algorithm for Graph Generator Discovery. EDBT'09 (PP: 517-528).

§  Sequence data mining: we mainly studied problems in this area such as closed sequential pattern mining, gap-constrained sequential pattern mining, sequence generator pattern mining, summarization subsequence mining for clustering, sequential pattern based XML document clustering (joint work with IBM), and so on. Representative publications include:

§  Jianyong Wang, Jiawei Han. BIDE: Efficient Mining of Frequent Closed Sequences. IEEE ICDE'04. (Most cited paper in ICDE 2004. This paper proposes BIDE, a closed sequence mining algorithm. An implementation by Chuancong Gao can be found at https://github.com/chuanconggao/PrefixSpan-py, an implementation by Cheng-Yuan Yu can be found at https://github.com/RonaldYu/bide-algorithm, and the PDF can be found at https://ieeexplore.ieee.org/abstract/document/1319986)

§  Jianyong Wang, Jiawei Han, Chun Li. Frequent Closed Sequence Mining without Candidate Maintenance. IEEE TKDE, August 2007 (PP: 1042-1056).

§  Chuancong Gao, Jianyong Wang, Yukai He, Lizhu Zhou. Efficient Mining of Frequent Sequence Generators. WWW'08 (Posters track, PP: 1051-1052, Best poster award).

§  Chun Li, Qingyan Yang, Jianyong Wang, Ming Li. Efficient Mining of Gap-Constrained Subsequences and its Various Applications. ACM Transactions on Knowledge Discovery from Data, Vol. 6, No.1, Article No. 2, March 2012. (ACM TKDD)

§  Uncertain data mining: we mainly worked on problems in this area such as frequent pattern discovery from uncertain data (joint work with IBM) and mining patterns for classifying uncertain data. Representative publications include:

§  Charu C. Aggarwal, Yan Li, Jianyong Wang, Jing Wang. Frequent Pattern Mining with Uncertain Data. ACM SIGKDD'09 (PP: 29-37).

§  Chuancong Gao, Jianyong Wang. Direct Mining of Discriminative Patterns for Classifying Uncertain Data. ACM SIGKDD'10 (PP: 861-870).

§  Stream data mining: we also worked on problems in stream data mining. Representative publications include:

§  Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. A Framework for Clustering Evolving Data Streams. VLDB'03. (Most cited paper in VLDB 2003. This paper proposes CluStream, a clustering framework for stream data. An implementation by Huawei Noah's Ark Lab can be found at https://github.com/huawei-noah/streamDM/blob/master/website/docs/CluStream.md, and the PDF can be found at http://hanj.cs.illinois.edu/pdf/vldb03_clstm.pdf)

§  Chuancong Gao, Jianyong Wang. Efficient Itemset Generator Discovery Over a Stream Sliding Window. ACM CIKM'09 (PP: 355-364).

§  Chuancong Gao, Jianyong Wang, Qingyan Yang. Efficient Mining of Closed Sequential Patterns on Stream Sliding Window. IEEE ICDM'11 (PP: 1044-1049).