Transferring topical knowledge from auxiliary long texts for short text clustering   
Abstract: With the rapid growth of social Web applications such as Twitter and online advertisements, the task of understanding short texts is becoming increasingly important. Most traditional text mining techniques are designed to handle long text documents. For short text messages, many existing techniques are not effective because of the sparseness of the text representations. To understand short messages, we observe that it is often possible to find topically related long texts, which can be utilized as auxiliary data when mining the target short text data. In this article, we present a novel approach to clustering short text messages via transfer learning from auxiliary long text data. We show that, although some previous work enhances short text clustering with related long texts, most of it ignores the semantic and topical inconsistencies between the target and auxiliary data, which hurts clustering performance. To accommodate the possible inconsistency between source and target data, we propose a novel topic model, the Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistency between the data sets. We demonstrate through large-scale clustering experiments on both advertisement and Twitter data that we can obtain superior performance over several state-of-the-art techniques for clustering short text documents.
Published in: International Conference on Information and Knowledge Management (CIKM), 2011
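To make the idea of "two sets of topics learned jointly on short and long texts" more concrete, the sketch below shows one simplified way such a model could be fitted and then used for clustering. It is NOT the paper's DLDA model: the asymmetric-prior coupling, the function name `dual_topic_gibbs`, and all parameter names and default values are assumptions made for illustration. Topics are split into a "target" set and an "auxiliary" set, all documents share the topic-word counts, short texts place a large Dirichlet prior on target topics, and long texts place a large prior on auxiliary topics. Fitting uses standard collapsed Gibbs sampling for LDA with a document-specific prior, and each short text is then clustered by its most probable target topic.

```python
import random


def dual_topic_gibbs(short_docs, long_docs, k_tar=2, k_aux=2,
                     alpha_hi=1.0, alpha_lo=0.01, beta=0.1,
                     iters=200, seed=0):
    """Collapsed Gibbs sampling for a simplified two-topic-set LDA (hypothetical).

    ASSUMPTION: illustrative simplification, not the exact DLDA of the paper.
    Topics 0..k_tar-1 are 'target' topics, k_tar..k_tar+k_aux-1 are 'auxiliary'
    topics. Topic-word counts are shared by all documents; only the
    document-topic Dirichlet prior depends on the domain.
    Returns (cluster label per short doc, token count per topic).
    """
    rng = random.Random(seed)
    K = k_tar + k_aux
    # Domain flag: 0 = short (target) text, 1 = long (auxiliary) text.
    docs = [(doc, 0) for doc in short_docs] + [(doc, 1) for doc in long_docs]
    vocab = sorted({w for doc, _ in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Domain-specific asymmetric priors over all K topics.
    alpha = [
        [alpha_hi] * k_tar + [alpha_lo] * k_aux,  # short texts favour target topics
        [alpha_lo] * k_tar + [alpha_hi] * k_aux,  # long texts favour auxiliary topics
    ]

    n_dk = [[0] * K for _ in docs]      # document-topic counts
    n_kw = [[0] * V for _ in range(K)]  # topic-word counts (shared)
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initialisation of token topics.
    for d, (doc, _) in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][wid[w]] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, (doc, g) in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # p(z=t | rest) is proportional to
                # (n_dk + alpha_gt) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (n_dk[d][t] + alpha[g][t]) * (n_kw[t][v] + beta)
                    / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Cluster each short text by its most probable target topic.
    labels = []
    for d in range(len(short_docs)):
        theta = [n_dk[d][t] + alpha[0][t] for t in range(k_tar)]
        labels.append(max(range(k_tar), key=theta.__getitem__))
    return labels, n_k


if __name__ == "__main__":
    short = [["nba", "game", "score"], ["stock", "market"],
             ["game", "team"], ["market", "price", "stock"]]
    long_texts = [["nba", "game", "team", "score", "player", "coach"],
                  ["stock", "market", "price", "trade", "bank", "fund"]]
    labels, _ = dual_topic_gibbs(short, long_texts, iters=100, seed=1)
    print(labels)  # one target-topic cluster id per short text
```

The domain-dependent prior is only one hypothetical way to let short and long texts share word statistics while keeping separate topic sets; on a toy corpus like the one above, results vary with the seed and iteration count.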
