The Journal of China Universities of Posts and Telecommunications ›› 2020, Vol. 27 ›› Issue (2): 56-64. doi: 10.19682/j.cnki.1005-8885.2020.1007

• Artificial intelligence •

Research on calculation method of text similarity based on smooth inverse frequency

Yuan Ye, Yu Minmin, Liu Jiming   

  1. Key Laboratory of E-commerce and Modern Logistics, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2019-12-30 Revised: 2020-05-09 Online: 2020-04-30 Published: 2020-07-07
  • Contact: Yuan Ye, E-mail: yuanye@cqupt.edu.cn
  • About author: Yuan Ye, E-mail: yuanye@cqupt.edu.cn
  • Supported by:
    This work was supported by Chongqing Education Committee (20SKGH059).

Abstract: To improve the accuracy of text similarity calculation, this paper presents a sentence-vector-based text similarity function, part of speech and word order-smooth inverse frequency (PO-SIF), which optimizes the classical SIF calculation method in two respects: part of speech and word order. The classical SIF algorithm calculates sentence similarity from sentence vectors obtained by weighting word vectors and reducing noise. However, the choice of weighting and noise-reduction methods affects both the efficiency and the accuracy of the similarity calculation. In the proposed PO-SIF, the weight parameters of the SIF sentence vector are first updated by a part-of-speech subtraction factor to identify the most crucial words. PO-SIF then takes word order into account when calculating sentence vector similarity, which overcomes the drawback of similarity analysis based mostly on word frequency. Experimental results show that the proposed PO-SIF improves the accuracy of text similarity calculation.
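
For orientation, the classical SIF baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of SIF as it is commonly described (frequency-based weighting of word vectors, averaging, then removal of the first principal component as the noise-reduction step); it is not the authors' PO-SIF. The part-of-speech subtraction factor and the word order term are not specified in the abstract and are therefore not implemented here. The function names `sif_embeddings` and `cosine_similarity`, the dictionary inputs, and the smoothing value `a=1e-3` are assumptions made for this sketch.

```python
import numpy as np


def sif_embeddings(sentences, word_vectors, word_prob, a=1e-3):
    """Classical SIF sentence vectors (illustrative sketch, not PO-SIF).

    sentences:    list of token lists
    word_vectors: dict mapping token -> 1-D numpy array (e.g. from word2vec)
    word_prob:    dict mapping token -> estimated unigram probability
    a:            smoothing parameter (assumed default for this sketch)
    """
    dim = len(next(iter(word_vectors.values())))
    rows = []
    for tokens in sentences:
        known = [t for t in tokens if t in word_vectors]
        if not known:
            # No known words: fall back to a zero vector.
            rows.append(np.zeros(dim))
            continue
        # Frequent words receive small weights: a / (a + p(w)).
        weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in known])
        vectors = np.stack([word_vectors[t] for t in known])
        rows.append(weights @ vectors / len(known))
    X = np.vstack(rows)
    # Noise reduction: remove the projection onto the first right
    # singular vector of the sentence-vector matrix.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)


def cosine_similarity(x, y):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0
```

Under these assumptions, the similarity of two sentences is the cosine similarity of their rows in the returned matrix. A PO-SIF-style variant would additionally adjust the weights by part of speech and combine the result with a word order similarity, as the abstract describes.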

Key words: word2vec, SIF, part-of-speech, word order similarity