Data Mining Group of the Database Systems Laboratory at Tsinghua University


Group members  

§         Principal investigator

§         Jianyong Wang (Professor)

§         Current student members

§         Chenwei Ran (Ph.D. student)

§         Ning Liu (Ph.D. student)

§         Pan Lu (Master's student)

§         Gang Chen (Master's student)

§         Yuanquan Lu (Master's student)

§         Zujiang Pan (Master's student)

§         Former student members

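§         Jianhua Yin (Ph.D., 2017; tenure-track faculty member, Shandong University)

§         Wei Zhang (Ph.D., 2016; Beijing Outstanding Ph.D. Graduate; Tsinghua University Outstanding Doctoral Dissertation Award; Zijiang Scholar, East China Normal University)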

§         Yuda Zang (M.E., 2016)

§         Chao Wang (M.E., 2016)

§         Wei Feng (Ph.D., 2015; Beijing Outstanding Ph.D. Graduate; Tsinghua University Outstanding Doctoral Dissertation Award)

§         Zhaoxu Tu (M.E., 2015)

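§         Wei Shen (Ph.D., 2014; Chinese Association for Artificial Intelligence Outstanding Doctoral Dissertation Award; Tsinghua University Outstanding Doctoral Dissertation Award; Nankai University Revitalization Program)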

§         Zhenhua Song (M.E., 2014)

§         Xianjun Zhang (M.E., 2014)

§         Chenwei Ran (B.E., 2014)

§         Hongda Ren (M.E., 2013)

§         Haijun Xia (B.E., 2013)

§         Junlin Lin (Visiting Master's student from NTHU, 2013)

§         Lili Jiang (Ph.D., visiting from Lanzhou Univ., 2012)

§         Xu Pu (M.E., 2012; Tsinghua University Outstanding Master's Thesis Award)

§         Shuyong Chen (M.E., 2012)

§         Qingyan Yang (M.E., 2011)

§         ZhiJie He (B.E., 2011)

§         Jun Zhang (M.E., 2010)

§         Chuancong Gao (M.E., 2010; Tsinghua University Outstanding Master's Thesis Award)

§         Yuzhou Zhang (Ph.D., 2010)

§         Yiting Bian (B.E., 2009)

§         Yan Li (M.E., 2009)

§         Xiaoming Fan (M.E., 2009)

§         Chun Li (M.E., 2009; Tsinghua University Outstanding Master's Thesis Award)

§         Zhiping Zeng (Ph.D., 2009)

§         Jing Wang (B.E., 2008; Tsinghua University Outstanding Undergraduate Graduate)

§         Bing Lv (M.E., 2008)

§         Wei Fu (B.E., 2007)

Current research topics

§         Deep search and personalized recommendation: we focus on problems such as name disambiguation in digital libraries and Web people search, (semantic) alias discovery, entity and relationship extraction, entity linking, entity knowledge bases, incremental RDF storage, indexing, and querying, keyword extraction, and personalized recommender systems. Representative publications include:

§         Lili Jiang, Jianyong Wang, Ning An, Shengyuan Wang, Jian Zhan, Lian Li. GRAPE: A Graph-Based Framework for Disambiguating People Appearances in Web Search. IEEE ICDM'09 (PP:199-208)

§         Qingyan Yang, Ju Fan, Jianyong Wang, Lizhu Zhou. Personalizing Web Page Recommendation via Collaborative Filtering and Topic-Aware Markov Model. IEEE ICDM'10  (PP: 1145-1150)

§         Xiaoming Fan, Jianyong Wang, Xu Pu, Lizhu Zhou, Bing Lv. On Graph-based Name Disambiguation. ACM Journal of Data and Information Quality, February 2011. (ACM JDIQ, Vol. 2, No. 2, Article 10.)

§         Wei Shen, Jianyong Wang, Ping Luo, Min Wang, Conglei Yao. REACTOR: A Framework for Semantic Relation Extraction and Tagging over Enterprise Data.  WWW'11 (PP: 121-122)

§         Xu Pu, Jianyong Wang, Ping Luo, Min Wang. AWETO: Efficient Incremental Update and Querying for RDF Storage System. ACM CIKM'11 (PP: 2445-2448)

§         Wei Shen, Jianyong Wang, Ping Luo, and Min Wang. LINDEN: Linking Named Entities with Knowledge Base via Semantic Knowledge.  Proc.  WWW'12. (PP: 449-458)

§         Lili Jiang, Jianyong Wang, Ping Luo, Ning An, Min Wang. Towards Alias Detection Without String Similarity: an Active Learning based Approach. ACM SIGIR'12. (poster paper, PP: 1155-1156)

§         Wei Feng, Jianyong Wang. Incorporating Heterogeneous Information for Personalized Tag Recommendation in Social Tagging Systems. ACM SIGKDD'12. (PP: 1276-1284)

§         Wei Shen, Jianyong Wang, Ping Luo, Min Wang. LIEGE: Link Entities in Web Lists with Knowledge Base. ACM SIGKDD'12. (PP: 1424-1432)

§         Jun Zhang, Xiaoming Fan, Jianyong Wang, Lizhu Zhou. Keyword-Propagation-Based Information Enriching and Noise Removal for Web News Videos. ACM SIGKDD'12. (Industry track, PP: 561-569)

§         Wei Shen, Jianyong Wang, Ping Luo, Min Wang. A Graph-Based Approach for Ontology Population with Named Entities. Proc. ACM CIKM'12. (PP: 345-354.)

§         Wei Feng, Jianyong Wang. Retweet or not? Personalized Tweet Re-ranking. ACM WSDM'13. (PP: 577-586)

§         Wei Zhang, Wei Feng, Jianyong Wang. Integrating Semantic Relatedness and Words Intrinsic Features for Keyword Extraction. Proc. IJCAI'13. (PP:2225-2231)

§         Wei Shen, Jianyong Wang, Ping Luo, Min Wang. Linking Named Entities in Tweets with Knowledge Base via User Interest Modeling. Proc. ACM SIGKDD'13. (PP: 68-76)

§         Wei Zhang, Jianyong Wang, Wei Feng. Combining Latent Factor Model with Location Features for Event-based Group Recommendation. Proc. ACM SIGKDD'13. (PP: 910-918)

§         Lili Jiang, Ping Luo, Jianyong Wang, Yuhong Xiong, Bingduan Lin, Min Wang, Ning An. GRIAS: an Entity-Relation Graph based Framework for Discovering Entity Aliases. Proc. IEEE ICDM'13. (PP:310-319)

§         Wei Feng, Jianyong Wang, Wei Zhang. We Can Learn Your #Hashtags: Connecting Tweets to Explicit Topics. Proc. IEEE ICDE'14. (PP:856-867)

§         Wei Shen, Jiawei Han, Jianyong Wang.  A Probabilistic Model for Linking Named Entities in Web Text with Heterogeneous Information Networks.  Proc. ACM SIGMOD'14.  (PP:1199-1210)

§         Jianhua Yin, Jianyong Wang. A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering. Proc. ACM SIGKDD'14. (PP: 233-242)

§         Wei Shen, Jianyong Wang, Jiawei Han. Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions. IEEE TKDE (Vol. 27, No. 2, Feb. 2015, PP: 443-460).

§         Wei Feng, Chao Zhang, Wei Zhang, Jiawei Han, Jianyong Wang, Charu Aggarwal, Jianbin Huang. StreamCube: Hierarchical Spatio-temporal Hashtag Clustering for Event Exploration over the Twitter Stream. IEEE ICDE'15 (PP: 1561-1572).

§         Wei Zhang, Jianyong Wang. Prior-based Dual Additive Latent Dirichlet Allocation for User-item Connected Documents. IJCAI'15 (PP: 1405-1411).

§         Wei Zhang, Jianyong Wang. A Collective Bayesian Poisson Factorization Model for Cold-start Local Event Recommendation. ACM SIGKDD'15 (PP: 1455-1456).

§         Wei Zhang, Jianyong Wang. A Location and Time Aware Social Collaborative Retrieval Approach for New Successive Point-of-Interest Recommendation. ACM CIKM'15 (PP: 1221-1230).

§         Chenwei Ran, Wei Shen, Jianyong Wang, Xuan Zhu. Domain-specific knowledge base enrichment using Wikipedia tables.  IEEE ICDM'15.

§         Jianhua Yin, Jianyong Wang. A Model-based Approach for Text Clustering with Outlier Detection. IEEE ICDE'16 (PP: 625-636).

§         Wei Zhang, Quan Yuan, Jiawei Han, Jianyong Wang. Collaborative Multi-Level Embedding Learning from Reviews for Rating Prediction. IJCAI'16 (PP: 2986-2992).

§         Jianhua Yin, Jianyong Wang. A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization. Accepted to appear in ACM SIGKDD'16.

§         Wei Zhang, Jianyong Wang. Integrating Topic and Latent Factors for Scalable Personalized Review-based Rating Prediction. Accepted to appear in IEEE TKDE.


Past research topics

§         Graph data mining: we investigate problems such as coherent subgraph mining, community detection in large networks, graph generator mining for classification, and structural anonymization of graph data (joint work with IBM). Representative publications include:

§         Jianyong Wang, Zhiping Zeng, Lizhu Zhou. CLAN: An Algorithm for Mining Closed Cliques from Large Dense Graph Databases.  IEEE ICDE'06 (Full research paper, Article No. 73).

§         Zhiping Zeng, Jianyong Wang, Lizhu Zhou, George Karypis. Out-of-Core Coherent Closed Quasi-Clique Mining from Large Dense Graph Databases. ACM TODS, June 2007 (Volume 32, Issue 2, Article No. 13).

§         Yuzhou Zhang, Jianyong Wang, Zhiping Zeng, Lizhu Zhou. Parallel Mining of Closed Quasi-Cliques. IEEE IPDPS'08 (Full research paper,  Article No. 2).

§         Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou. Parallel Community Detection on Large Networks with Propinquity Dynamics. ACM SIGKDD'09 (PP:997-1005).

§         Zhiping Zeng, Jianyong Wang, Jun Zhang, Lizhu Zhou. FOGGER: An Algorithm for Graph Generator Discovery. EDBT'09 (PP: 517-528).

§         Chun Li, Charu Aggarwal, Jianyong Wang. On Anonymization of Multi-graphs. SIAM SDM'11 (PP: 711-722).

§         Sequence data mining: we mainly study problems such as closed sequential pattern mining, gap-constrained sequential pattern mining, sequence generator pattern mining, summarization subsequence mining for clustering, and sequential pattern based XML document clustering (joint work with IBM). Representative publications include:

§         Jianyong Wang, Jiawei Han, Chun Li. Frequent Closed Sequence Mining without Candidate Maintenance. IEEE TKDE, August 2007 (PP: 1042-1056).

§         Charu C. Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, and Mohammed J. Zaki. XProj: A Framework for Projected Structural Clustering of XML Documents. ACM SIGKDD'07 (PP: 46-55).

§         Chuancong Gao, Jianyong Wang, Yukai He, Lizhu Zhou. Efficient Mining of Frequent Sequence Generators. WWW'08 (Posters track, PP: 1051-1052).

§         Jianyong Wang, Yuzhou Zhang, Lizhu Zhou, George Karypis, Charu C. Aggarwal. CONTOUR: An Efficient Algorithm for Discovering Discriminating Subsequences. Int. J. Data Mining and Knowledge Discovery, Feb. 2009 (Vol. 18, No. 1, PP: 1-29).

§         Chun Li, Qingyan Yang, Jianyong Wang, Ming Li. Efficient Mining of Gap-Constrained Subsequences and its Various Applications. ACM Transactions on Knowledge Discovery from Data, Vol. 6, No.1, Article No. 2, March 2012. (ACM TKDD)

§         Uncertain data mining: we mainly work on problems such as frequent pattern discovery from uncertain data (joint work with IBM) and mining patterns for classifying uncertain data. Representative publications include:

§         Charu C. Aggarwal, Yan Li, Jianyong Wang, Jing Wang. Frequent Pattern Mining with Uncertain Data. ACM SIGKDD'09 (PP: 29-37).

§         Chuancong Gao, Jianyong Wang. Direct Mining of Discriminative Patterns for Classifying Uncertain Data. ACM SIGKDD'10 (PP: 861-870).

§         Other data mining topics: we also work on problems such as stream data mining. Representative publications include:

§         Chuancong Gao, Jianyong Wang. Efficient Itemset Generator Discovery Over a Stream Sliding Window. ACM CIKM'09 (PP: 355-364).

§         Chuancong Gao, Jianyong Wang, Qingyan Yang. Efficient Mining of Closed Sequential Patterns on Stream Sliding Window. IEEE ICDM'11 (PP: 1044-1049).