The Journal of China Universities of Posts and Telecommunications (English edition), 2019, Vol. 26, Issue (1): 95-104. doi: 10.19682/j.cnki.1005-8885.2019.0001

• Others •

Multi-label text classification model based on semantic embedding

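YAN Dan-Feng1, KE Nan2, GU Chao3

  1. Beijing University of Posts and Telecommunications
  2. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
  3. Shandong Electric Power Company
  • Received: 2018-02-05 Revised: 2018-12-25 Online: 2019-02-26 Published: 2019-02-27
  • Corresponding author: YAN Dan-Feng E-mail: yandf@bupt.edu.cn
  • Supported by:
    National 863 project; State Grid science and technology project; National Natural Science Foundation of China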

Abstract: Text classification assigns a document to one or more classes or categories according to its content, which makes it easier for users to obtain data. Because text data is polysemous, multi-label classification can handle it more comprehensively, and multi-label text classification has become a key problem in data mining. To improve the performance of multi-label text classification, semantic analysis is embedded into the classification model to perform label correlation analysis, and the structure, objective function, and optimization strategy of the model are designed. A convolutional neural network (CNN) model based on semantic embedding is then introduced. Finally, the Zhihu dataset is used for evaluation. The results show that the model outperforms related work in terms of recall and area under the curve (AUC).

Key words: multi-label, text classification, convolution neural network, semantic analysis
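
The abstract reports results in terms of recall and AUC but does not say how these are aggregated over labels. As an illustration only, the sketch below assumes micro-averaging: every (document, label) pair is pooled, recall is computed at a score threshold of 0.5, and AUC is computed as the probability that a positive pair is scored above a negative pair. The function names, the threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def micro_recall(y_true: Sequence[Sequence[int]],
                 y_score: Sequence[Sequence[float]],
                 threshold: float = 0.5) -> float:
    """Micro-averaged recall: TP / (TP + FN) pooled over all labels.

    The 0.5 threshold is an assumption made for this sketch.
    """
    tp = fn = 0
    for true_row, score_row in zip(y_true, y_score):
        for t, s in zip(true_row, score_row):
            if t == 1:
                if s >= threshold:
                    tp += 1
                else:
                    fn += 1
    return tp / (tp + fn) if (tp + fn) else 0.0


def micro_auc(y_true: Sequence[Sequence[int]],
              y_score: Sequence[Sequence[float]]) -> float:
    """Micro-averaged AUC over all (document, label) pairs.

    Computed as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win.
    """
    pos: List[float] = []
    neg: List[float] = []
    for true_row, score_row in zip(y_true, y_score):
        for t, s in zip(true_row, score_row):
            (pos if t == 1 else neg).append(s)
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 documents, 3 candidate labels each (made-up values).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_score = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3], [0.7, 0.6, 0.5]]
print("micro recall:", micro_recall(y_true, y_score))
print("micro AUC:", micro_auc(y_true, y_score))
```

The pairwise AUC loop is quadratic in the number of pairs, which is fine for a toy example; for a dataset of realistic size a rank-based computation or a library routine would be the more practical choice.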
