Multi-label text classification model based on semantic embedding

doi:10.19682/j.cnki.1005-8885.2019.0001

The Journal of China Universities of Posts and Telecommunications, 2019, Vol. 26, Issue (1): 95-104. doi: 10.19682/j.cnki.1005-8885.2019.0001

• Others •

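YAN Dan-Feng¹, KE Nan², GU Chao³

1. Beijing University of Posts and Telecommunications
2. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
3. Shandong Electric Power Company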

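Received: 2018-02-05 Revised: 2018-12-25 Online: 2019-02-26 Published: 2019-02-27
Corresponding author: YAN Dan-Feng E-mail: yandf@bupt.edu.cn
Supported by:
National 863 project; State Grid science and technology project; National Natural Science Foundation of China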

Abstract: Text classification assigns a document to one or more classes or categories according to its content, which makes it easier for users to obtain data. Because text data is polysemous, multi-label classification can handle it more comprehensively than single-label classification, and multi-label text classification has become a key problem in data mining. To improve the performance of multi-label text classification, semantic analysis is embedded into the classification model to carry out label correlation analysis, and the structure, objective function, and optimization strategy of this model are designed. A convolutional neural network (CNN) model based on semantic embedding is then introduced. Finally, the Zhihu dataset is used for evaluation. The results show that this model outperforms related work in terms of recall and area under the curve (AUC).

Key words: multi-label, text classification, convolution neural network, semantic analysis
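
The multi-label setting and the two reported metrics can be illustrated with a small sketch. It is not the paper's model or evaluation code: the function names, the 0.5 threshold, the toy scores, and the choice of micro-averaging (pooling all document-label pairs) are assumptions made for illustration, since the abstract does not state how recall and AUC were averaged.

```python
from typing import List, Sequence


def predict_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-label scores into a 0/1 vector; several labels may be 1 at once."""
    return [1 if s >= threshold else 0 for s in scores]


def micro_recall(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of all true (document, label) pairs that were predicted."""
    true_positives = sum(
        t * p for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p)
    )
    positives = sum(sum(row) for row in y_true)
    return true_positives / positives if positives else 0.0


def micro_auc(y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]) -> float:
    """Probability that a relevant (document, label) pair is scored above an
    irrelevant one, pooling all pairs; ties count as one half."""
    pairs = [(t, s) for row_t, row_s in zip(y_true, y_score) for t, s in zip(row_t, row_s)]
    pos = [s for t, s in pairs if t == 1]
    neg = [s for t, s in pairs if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one relevant and one irrelevant pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 documents, 3 candidate labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_score = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3], [0.7, 0.6, 0.55]]
y_pred = [predict_labels(row) for row in y_score]

print("micro recall:", micro_recall(y_true, y_pred))
print("micro AUC:", micro_auc(y_true, y_score))
```

Recall depends on the chosen threshold, while AUC is computed from the raw scores and is threshold-free, which is one reason the two are often reported together.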
