JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOM, 2016, Vol. 23, Issue (5): 40-46. doi: 10.1016/S1005-8885(16)60056-0

• Artificial intelligence

Mining microblog user interests based on TextRank with TF-IDF factor

Tu Shouzhong, Huang Minlie   

  • Received: 2016-04-29 Revised: 2016-09-29 Online: 2016-10-30 Published: 2016-10-26
  • Contact: Tu Shouzhong, E-mail: aquares@163.com
  • Supported by:
    the National Natural Science Foundation of China (61272227).

Abstract: Modeling the interests of microblog users is of great value to both business and sociology. This paper presents a framework for mining and analyzing personal interests from microblog text, using a new algorithm that integrates term frequency-inverse document frequency (TF-IDF) with TextRank. First, we build a three-tier category system of user interests based on Wikipedia. To obtain the interest keywords, we preprocess the posts, comments and reposts in different categories and select the keywords that appear both in the category system and in the microblogs. We then assign a weight to each category and calculate the weight of each keyword to obtain the TF-IDF factors. Finally, we score and rank each keyword with the TextRank algorithm using the TF-IDF factors. Experiments on real Sina microblog data demonstrate that our approach achieves significantly higher precision than existing methods.
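
The abstract does not state how the TF-IDF factors enter the TextRank iteration, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes the normalized TF-IDF weights act as each keyword's prior (teleport) probability in a co-occurrence graph; it omits the Wikipedia category system and the category weights. The function names, the smoothed IDF formula, the window size, and the damping value are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(docs):
    """Sum a TF-IDF score per term over all documents (docs: lists of tokens)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = defaultdict(float)
    for doc in docs:
        for term, count in Counter(doc).items():
            # Assumed smoothed IDF, so terms found in every document stay positive.
            idf = math.log((1 + n) / (1 + df[term])) + 1.0
            weights[term] += (count / len(doc)) * idf
    return dict(weights)


def cooccurrence_graph(docs, window=2):
    """Undirected graph: edge weight counts co-occurrences within `window` tokens."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        for i, a in enumerate(doc):
            for b in doc[i + 1:i + 1 + window]:
                if a != b:
                    graph[a][b] += 1.0
                    graph[b][a] += 1.0
    return graph


def textrank_tfidf(docs, damping=0.85, iterations=50, window=2):
    """Rank terms with TextRank, using normalized TF-IDF as the teleport prior.

    Assumption: this is one plausible way to combine the two signals; the
    paper's exact formula may differ.
    """
    weights = tfidf_weights(docs)
    graph = cooccurrence_graph(docs, window)
    total = sum(weights.values())
    prior = {term: w / total for term, w in weights.items()}
    # Total outgoing edge weight per node, used to normalize transitions.
    out_weight = {term: sum(graph[term].values()) for term in prior}
    scores = dict(prior)
    for _ in range(iterations):
        updated = {}
        for term in prior:
            rank = sum(
                w / out_weight[nb] * scores[nb] for nb, w in graph[term].items()
            )
            updated[term] = (1 - damping) * prior[term] + damping * rank
        scores = updated
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy input: each list stands for one preprocessed microblog post.
posts = [
    ["music", "concert", "guitar"],
    ["music", "guitar", "band"],
    ["football", "match"],
]
ranking = textrank_tfidf(posts)
```

Here `ranking` is a list of `(keyword, score)` pairs sorted from highest to lowest score. In a fuller pipeline, the tokens would first be filtered to keywords that also appear in the category system, as the abstract describes.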

Key words: microblog, interest feature, TF-IDF, interest mining, TextRank
