Mining microblog user interests based on TextRank with TF-IDF factor

doi:10.1016/S1005-8885(16)60056-0

The Journal of China Universities of Posts and Telecommunications (English edition), 2016, Vol. 23, Issue (5): 40-46.

• Artificial Intelligence •

Tu Shouzhong, Huang Minlie

Tsinghua University

Received: 2016-04-29 Revised: 2016-09-29 Online: 2016-10-30 Published: 2016-10-26
Contact: Tu Shouzhong E-mail: aquares@163.com
Supported by:
the National Natural Science Foundation of China (61272227).

Abstract

It is of great value, both commercially and sociologically, to model the interests of microblog users. This paper presents a framework for mining and analyzing personal interests from microblog text with a new algorithm that integrates term frequency-inverse document frequency (TF-IDF) with TextRank. First, we build a three-tier category system of user interests based on Wikipedia. To obtain the keywords of interest, we preprocess the posts, comments and reposts in different categories and select the keywords that appear both in the category system and in the microblogs. We then assign a weight to each category and calculate the weight of each keyword to obtain the TF-IDF factors. Finally, we score and rank each keyword using the TextRank algorithm with the TF-IDF factors. Experiments on real Sina microblog data demonstrate that the precision of our approach significantly outperforms that of other existing methods.

Key words: microblog, interest feature, TF-IDF interest mining, TextRank
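The abstract does not give the exact formula used to combine the two methods, so the following is only a minimal sketch of one plausible reading: TF-IDF weights act as the bias (teleport) term of a TextRank-style iteration over a keyword co-occurrence graph. The function names, the smoothed IDF, the window size, the damping value, and the toy posts are all assumptions made for illustration, and the category weighting step described in the abstract is not modeled.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: IDF is smoothed as log(1 + N / df) so every weight is positive.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(1 + n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def cooccurrence_graph(docs, window=2):
    """Undirected graph whose edge weights count co-occurrences within `window` tokens."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, left in enumerate(tokens):
            for right in tokens[i + 1:i + window + 1]:
                if left != right:
                    graph[left][right] += 1.0
                    graph[right][left] += 1.0
    return graph


def biased_textrank(graph, bias, damping=0.85, iterations=50):
    """TextRank-style iteration whose teleport term follows the normalized bias."""
    nodes = list(graph)
    total = sum(bias.get(n, 0.0) for n in nodes) or 1.0
    prior = {n: bias.get(n, 0.0) / total for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Each node receives a share of every neighbor's score, proportional
        # to the edge weight, plus a TF-IDF-weighted teleport contribution.
        scores = {
            n: (1 - damping) * prior[n]
            + damping * sum(graph[m][n] / out_weight[m] * scores[m] for m in graph[n])
            for n in nodes
        }
    return scores


# Toy data standing in for one user's preprocessed posts (hypothetical).
posts = [
    ["basketball", "game", "tonight", "basketball", "team"],
    ["new", "phone", "camera", "review"],
    ["basketball", "team", "wins", "game"],
]

# Assumption: a keyword's TF-IDF factor is its weight summed over all posts.
bias = defaultdict(float)
for post_weights in tf_idf(posts):
    for term, weight in post_weights.items():
        bias[term] += weight

scores = biased_textrank(cooccurrence_graph(posts), bias)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With a uniform bias this reduces to plain weighted TextRank, so the TF-IDF factor only shifts the ranking toward keywords that are frequent for the user yet rare across posts.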

CLC number: TP391

Tu Shouzhong, Huang Minlie. Mining microblog user interests based on TextRank with TF-IDF factor[J]. JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOM, 2016, 23(5): 40-46.
