Data Mining Group of the Database Systems Laboratory at
Group members § Principal investigator § Jianyong Wang (Professor) § Current
members § Jiacheng Xu (Ph.D. student) § Zhichao Duan (Ph.D. student) § Cong Han (Ph.D. student) § Zhenyu Li (Ph.D. student) § Yunfei Yang (Ph.D. student) § Bowen Dong (Ph.D. student) § Yutao Sun (Ph.D. student) § Tingting Li (Ph.D. student) § Tengyu Pan (Master's student) § Fangzhou Liu (Ph.D. student) § Junyan Zhang (Ph.D. student) § Yike Zhang (Bachelor's student) § Yuyi Guo (Bachelor's student)
§ Former student members § Fangzhou Liu (B.E., 2024, Tsinghua University Excellent Graduate) § MinJia Wang (B.E., 2024, Tsinghua University Excellent Graduate) § Zhuo Wang (Ph.D., 2023) § Yutao Sun (B.E., 2023, Tsinghua University Outstanding Undergraduate Thesis Award) § Tengyu Pan (B.E., 2023, Tsinghua University Excellent Graduate) § Xiuxing Li (Ph.D., 2022, Associate Professor, Beijing Institute of Technology) § Rui Zhang (M.E., 2022) § Bowen Dong (B.E., 2022) § Zhongkai He (B.E., 2022) § Ning Liu (Ph.D., Assistant Professor, Shandong University) § Zhenyu Li (B.E., 2021) § Jianyuan Lu (Postdoc Researcher) § Chenwei Ran (Ph.D., 2020) § Zujiang Pan (M.E., 2020) § Jiacheng Xu (B.E., 2020) § Yuanquan Lu (M.E., 2019) § Long Guo (B.E., 2019) § Yifan Li (B.E., 2019) § Gang Chen (M.E., 2018) § Pan Lu (M.E., 2018, Tsinghua University Outstanding Master's Thesis Award) § Xingzhi Niu
(B.E., 2018) § Junyi
Fu (B.E., 2018) § Jianhua
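Yin (Ph.D., 2017, Tenured Associate Professor, Shandong University) § Xinding Wei
(B.E., 2017) § Wei Zhang (Ph.D., 2016, Professor, Beijing Outstanding Ph.D. Graduate, Tsinghua University Outstanding Doctoral Dissertation Award, Zijiang Scholar at East China Normal University) § Yuda
Zang (M.E., 2016) § Chao
Wang (M.E., 2016) § Wei
Feng (Ph.D., 2015, Beijing Outstanding Ph.D. Graduate, Tsinghua University Outstanding Doctoral Dissertation Award) § Zhaoxu Tu
(M.E., 2015) § Wei
Shen (Ph.D., 2014, Professor, Chinese Association for Artificial Intelligence (CAAI) Outstanding Doctoral Dissertation Award, Tsinghua University Outstanding Doctoral Dissertation Award, Nankai University Zhenxing Program) § Zhenhua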
Song (M.E., 2014) § Xianjun
Zhang (M.E., 2014) § Chenwei Ran
(B.E., 2014) § Hongda Ren
(M.E., 2013) § Haijun Xia
(B.E., 2013) § Junlin Lin
(Visiting Master student from NTHU, 2013) § Lili
Jiang (Ph.D., visiting from Lanzhou Univ., 2012, Associate Professor, Umeå
University, Sweden) § Xu Pu
(M.E., 2012, Tsinghua University Outstanding Master's Thesis Award) § Shuyong Chen
(M.E., 2012) § Qingyan Yang
(M.E., 2011) § ZhiJie He
(B.E., 2011) § Jun
Zhang (M.E., 2010) § Chuancong Gao
(M.E., 2010, Tsinghua University Outstanding Master's Thesis Award) § Yuzhou
Zhang (Ph.D., 2010) § Yiting
Bian (B.E., 2009) § Yan
Li (M.E., 2009) § Xiaoming
Fan (M.E., 2009) § Chun
Li (M.E., 2009, Tsinghua University Outstanding Master's Thesis Award) § Zhiping Zeng
(Ph.D., 2009) § Jing
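Wang (B.E., 2008, Associate Professor, Outstanding Undergraduate Graduate of Tsinghua University and of Beijing, Business School, Hong Kong University of Science and Technology) § Qingyan Yang
(B.E., 2008) § Bing Lv (M.E., 2008) § Wei Fu (B.E., 2007)
Current research topics § Knowledge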
graph and medical data mining: we focus on problems in this area
such as medical data mining, interpretable learning models,
entity disambiguation, relation extraction, entity linking, personalized
recommender systems, and short text clustering. Representative publications
include: § Zhenyu Li, Sunqi Fan, Yu Gu, Xiuxing Li, Zhichao Duan,
Bowen Dong, Ning Liu, Jianyong Wang. FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge
Base Question Answering. Accepted to appear in
Proc. the Thirty-Eighth AAAI Conference on Artificial Intelligence,
Vancouver, Canada, Feb. 20-27, 2024.
PP: ***-***. (AAAI'24). §
Zhuo
Wang, Wei Zhang, Ning Liu, Jianyong Wang. Learning Interpretable Rules for Scalable Data
Representation and Classification. IEEE
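TPAMI 2024 (The paper proposes an enhanced version of RRL, a classification model that combines high interpretability with high classification performance; the first author is Ph.D. student Zhuo Wang. Source code is
available at https://github.com/12wang3/rrl, the PDF can be found at https://arxiv.org/abs/2310.14336, Digital Object Identifier 10.1109/TPAMI.2023.3328881) (IEEE TPAMI) §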
Bowen
Dong, Zhuo Wang, Zhenyu Li, Zhichao Duan, Jiacheng
Xu, Tengyu Pan, Rui Zhang, Ning Liu, Xiuxing Li, Jie Wang, Caiyan Liu, Liling Dong, Chenhui Mao, Jing Gao, Jianyong Wang*. Toward a stable and low-resource PLM-based medical
diagnostic system via prompt tuning and MoE
structure. Scientific Reports, August 3,
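2023. (The paper proposes a method for constructing input templates based on a medical knowledge graph and a feature-based mixture-of-experts mechanism; the first author is Ph.D. student Bowen Dong.) § Zhenyu Li, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, Jianyong Wang. Toward a Unified Framework for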
Unsupervised Complex Tabular Reasoning.
Proceedings of the 39th IEEE International Conference on Data Engineering, Anaheim,
California, USA, April 3-7, 2023. PP: 1691-1704 (IEEE ICDE’23). § Chenwei Ran, Wei Shen, Jianbo Gao, Yuhan Li, Jianyong Wang, Yantao Jia. Learning Entity Linking Features for Emerging Entities. IEEE Transactions on Knowledge and Data Engineering, Volume 35, Issue 7, July 2023. PP: 7088-7102 (IEEE TKDE) § Zhuo
Wang, Jie Wang, Ning Liu, Caiyan Liu, Xiuxing Li,
Liling Dong, Rui Zhang, Chenhui Mao, Zhichao Duan, Wei Zhang, Jing Gao*, Jianyong
Wang*. Learning Cognitive-Test-Based Interpretable Rules for
Prediction and Early Diagnosis of Dementia using Neural Networks. Journal of Alzheimer's Disease, vol. 90, no. 2,
PP: 609-624, Nov. 2022. § Jie Wang, Zhuo Wang, Ning Liu, Caiyan Liu, Chenhui
Mao, Liling Dong, Jie Li, Xinying Huang, Dan Lei,
Shanshan Chu, Jianyong Wang*, Jing Gao*. Random forest model in the
diagnosis of dementia patients with normal Mini-Mental State Examination
scores. Journal of Personalized Medicine, 2022, 12(1), 37 (https://doi.org/10.3390/jpm12010037). (4 January 2022) § Zhichao Duan, Xiuxing
Li, Zhenyu Li, Zhuo Wang, Jianyong Wang. Not
Just Plain Text! Fuel Document-Level Relation Extraction with Explicit Syntax
Refinement and Subsentence Modeling. Findings of the 2022 Conference
on Empirical Methods in Natural Language Processing, Abu Dhabi, Dec. 7-11,
2022. PP: 1941-1951 (Findings of EMNLP 2022). § Xiuxing Li, Zhenyu Li, Zhengyan Zhang, Ning Liu, Haitao
Yuan, Wei Zhang, Zhiyuan Liu, Jianyong Wang. Effective Few-Shot Named Entity
Linking by Meta-Learning. Accepted to appear in Proceedings of the 38th IEEE
International Conference on Data Engineering, Kuala
Lumpur, Malaysia, May 9-12, 2022.
(IEEE ICDE’22). § Zhuo Wang, Wei Zhang, Ning Liu, Jianyong
Wang. Scalable Rule-Based
Representation Learning for Interpretable Classification. Accepted to appear in Proceedings of
the Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS’21) § Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang. Transparent
Classification with Multilayer Logical Perceptrons
and Random Binarization. Accepted
to appear in Proc. the Thirty-Fourth AAAI Conference on Artificial
Intelligence, Feb. 7-12, New York, USA. AAAI'20. § Ning
Liu, Pan Lu, Wei Zhang, Jianyong Wang. Knowledge-Aware
Deep Dual Networks for Text-Based Mortality Prediction. IEEE
ICDE’19. § Pan Lu, Lei Ji, Wei Zhang, Nan Duan, Ming Zhou,
Jianyong Wang. R-VQA: Learning Visual Relation Facts with Semantic
Attention for Visual Question Answering. ACM
SIGKDD'18. § Jianhua
Yin, Daren Chao, Zhongkun Liu, Wei Zhang, Xiaohui
Yu, Jianyong Wang. Model-based Clustering of Short Text Streams. ACM SIGKDD'18. § Wei Shen, Yinan Liu, Jianyong Wang. Predicting
Named Entity Location Using Twitter. IEEE ICDE’18. § Chenwei Ran, Wei Shen, Jianyong Wang. An
Attention Factor Graph Model for Tweet Entity Linking. WWW'18. § Pan Lu, Hongsheng Li, Wei Zhang, Jianyong Wang,
Xiaogang Wang. Co-attending Free-form Regions and Detections with
Multi-modal Multiplicative Feature Embedding for Visual Question Answering. AAAI'18. § Wei
Shen, Jiawei Han, Jianyong Wang, Xiaojie Yuan, Zhenglu
Yang. SHINE+: A General Framework for Domain-Specific Entity Linking with
Heterogeneous Information Networks. IEEE TKDE, February
2018. § Wei Zhang,
Jianyong Wang. Integrating Topic and Latent Factors for Scalable Personalized
Review-based Rating Prediction. IEEE TKDE, November
2016. § Jianhua
Yin, Jianyong Wang. A Model-based Approach for Text Clustering with Outlier
Detection. IEEE ICDE'16. § Wei Zhang, Quan Yuan, Jiawei Han, Jianyong Wang. Collaborative
Multi-Level Embedding Learning from Reviews for Rating Prediction. IJCAI'16. § Jianhua Yin, Jianyong Wang. A
Text Clustering Algorithm Using an Online Clustering Scheme for
Initialization. ACM SIGKDD'16. § Wei Shen, Jianyong Wang, Jiawei Han. Entity
Linking with a Knowledge Base: Issues, Techniques, and Solutions. IEEE TKDE (Vol. 27, No. 2, Feb. 2015, PP:
443-460). § Wei Feng, Chao Zhang, Wei Zhang, Jiawei Han, Jianyong
Wang, Charu Aggarwal, Jianbin Huang. StreamCube:
Hierarchical Spatio-temporal Hashtag Clustering for
Event Exploration over the Twitter Stream. IEEE
ICDE'15 (PP: 1561-1572).
§ Wei
Zhang, Jianyong Wang. A Collective Bayesian Poisson Factorization Model for
Cold-start Local Event Recommendation. ACM
SIGKDD'15 (PP: 1455-1456). § Wei Feng, Jianyong Wang, Wei Zhang. We
Can Learn Your #Hashtags: Connecting Tweets to Explicit Topics. Proc. IEEE ICDE'14. (PP:856-867) § Wei Shen, Jiawei Han, Jianyong Wang. A
Probabilistic Model for Linking Named Entities in Web Text with Heterogeneous
Information Networks. Proc. ACM
SIGMOD'14. (PP:1199-1210) § Jianhua Yin, Jianyong Wang. A Dirichlet Multinomial Mixture
Model-based Approach for Short Text Clustering. ACM SIGKDD’14. (The paper proposes the short text clustering algorithm GSDMM; the first author is Ph.D. student Jianhua Yin. A Python implementation
by Ryan Walker is available at https://github.com/rwalk/gsdmm,
an implementation by Jianhua is available at https://github.com/jackyin12/GSDMM, and the PDF can be found at https://dbgroup.cs.tsinghua.edu.cn/wangjy/papers/KDD14-GSDMM.pdf) § Wei Zhang, Wei Feng, Jianyong Wang. Integrating
Semantic Relatedness and Words’ Intrinsic Features for Keyword Extraction.
Proc. IJCAI'13. (PP:2225-2231) § Wei Shen, Jianyong Wang, Ping Luo, Min Wang. Linking
Named Entities in Tweets with Knowledge Base via User Interest Modeling.
Proc. ACM SIGKDD'13. (PP: 68-76) § Wei Zhang, Jianyong Wang, Wei Feng. Combining
Latent Factor Model with Location Features for Event-based Group
Recommendation. Proc. ACM SIGKDD'13.
(PP:910-918) § Wei
Shen, Jianyong Wang, Ping Luo, and Min Wang. LINDEN: Linking Named Entities
with Knowledge Base via Semantic Knowledge. Proc. WWW'12. (PP: 449-458) § Wei
Feng, Jianyong Wang. Incorporating Heterogeneous Information for Personalized
Tag Recommendation in Social Tagging Systems. ACM
SIGKDD'12. (PP: 1276-1284) § Wei Shen, Jianyong Wang, Ping Luo, Min Wang. LIEGE:
Link Entities in Web Lists with Knowledge Base.
ACM SIGKDD'12. (PP: 1424-1432) § Jun
Zhang, Xiaoming Fan, Jianyong Wang, Lizhu Zhou.
Keyword-Propagation-Based Information Enriching and Noise Removal for Web
News Videos. ACM SIGKDD'12. (Industry
track, PP: 561-569)
Past research topics § Graph data mining: we investigate problems in this area
such as coherent subgraph mining, community detection in large networks,
graph generator mining for classification, and structural anonymization of graph
data (joint work with IBM). Representative publications include: § Jianyong Wang, Zhiping Zeng, Lizhu Zhou. CLAN:
An Algorithm for Mining Closed Cliques from Large Dense Graph
Databases. IEEE ICDE'06 (Full
research paper, Article No. 73). § Zhiping Zeng, Jianyong Wang, Lizhu Zhou, George
Karypis. Out-of-Core
Coherent Closed Quasi-Clique Mining from Large Dense Graph Databases. ACM TODS, June 2007 (Volume 32, Issue 2, Article No. 13. The paper proposes the coherent subgraph mining algorithm Cocain*). § Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu
Zhou. Parallel Community Detection on Large Networks with Propinquity
Dynamics. ACM SIGKDD'09 (PP:997-1005). § Zhiping Zeng, Jianyong Wang, Jun Zhang, Lizhu Zhou. FOGGER:
An Algorithm for Graph Generator Discovery. EDBT'09
(PP: 517-528). § Sequence data mining: we mainly study problems in this topic
such as closed sequential pattern mining, gap-constrained sequential pattern
mining, sequence generator pattern mining, summarization subsequence mining
for clustering, and sequential-pattern-based XML document clustering (joint work
with IBM). Representative publications include: § Jianyong Wang, Jiawei Han. BIDE:
Efficient Mining of Frequent Closed Sequences. IEEE ICDE’04. (Most cited paper in ICDE 2004. The paper proposes the closed sequence mining algorithm BIDE. An implementation by Chuancong
Gao can be found at https://github.com/chuanconggao/PrefixSpan-py, an implementation by Cheng-Yuan Yu can be
found at https://github.com/RonaldYu/bide-algorithm, and the PDF can be found at https://ieeexplore.ieee.org/abstract/document/1319986) § Jianyong
Wang, Jiawei Han, Chun Li. Frequent Closed Sequence Mining without Candidate
Maintenance. IEEE TKDE, August 2007
(PP: 1042-1056). § Chuancong Gao, Jianyong Wang, Yukai
He, Lizhu Zhou. Efficient Mining of Frequent
Sequence Generators. WWW'08 (Posters track, PP: 1051-1052, Best poster award). § Chun Li, Qingyan
Yang, Jianyong Wang, Ming Li. Efficient Mining of Gap-Constrained
Subsequences and its Various Applications. ACM Transactions on Knowledge
Discovery from Data, Vol. 6, No.1, Article No. 2, March 2012. (ACM TKDD) § Uncertain data mining: we mainly work on problems in this topic
such as frequent pattern discovery from uncertain data (joint work with IBM),
and mining patterns for classifying uncertain data. Representative
publications include: § Charu C. Aggarwal, Yan
Li, Jianyong Wang, Jing Wang. Frequent
Pattern Mining with Uncertain Data. ACM
SIGKDD'09 (PP: 29-37). § Chuancong Gao, Jianyong Wang. Direct Mining of Discriminative
Patterns for Classifying Uncertain Data. ACM
SIGKDD'10 (PP: 861-870). § Stream data mining: we also work on problems in stream data
mining. Representative publications include: § Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. A Framework for Clustering Evolving Data
Streams. VLDB’03. (Most cited paper in VLDB 2003. The paper proposes the stream data clustering framework CluStream. An implementation by Huawei Noah’s Ark Lab can be found at https://github.com/huawei-noah/streamDM/blob/master/website/docs/CluStream.md, and the PDF can be found at http://hanj.cs.illinois.edu/pdf/vldb03_clstm.pdf) § Chuancong Gao, Jianyong Wang. Efficient Itemset
Generator Discovery Over a Stream Sliding Window. ACM
CIKM'09 (PP: 355-364). § Chuancong Gao, Jianyong Wang, Qingyan
Yang. Efficient Mining of Closed Sequential Patterns on Stream Sliding
Window. IEEE ICDM'11 (PP:
1044-1049).