JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOM ›› 2018, Vol. 25 ›› Issue (3): 80-91. doi: 10.19682/j.cnki.1005-8885.2018.0012

• Others •

Application of a soft competition learning method in document clustering

Yehang Zhu

  • Received: 2017-08-08 Revised: 2018-01-16 Online: 2018-06-29 Published: 2018-06-30
  • Contact: Yehang Zhu, E-mail: zhuyehang@126.com
  • Supported by:
    The Natural Science Foundation research project of Shaanxi Province of China; the Humanities and Social Sciences Research Youth Fund Project of the Ministry of Education of China

Abstract: In hard competition learning, each point modifies only the winning cluster centroid. In soft competition learning, by contrast, each point modifies not only the winning centroid but also many other cluster centroids near the point. This paper proposes a soft competition learning method and three clustering algorithms that adopt it: centroid all rank distance (CARD), CARDx, and centroid all rank distance batch K-means (CARDBK). In these algorithms, the extent to which a point affects a cluster centroid depends on the distances from the point to the other, nearer cluster centroids, rather than only on the rank of its distance to that centroid among its distances to all cluster centroids. Validation experiments compare CARD, CARDx, and CARDBK with several hard competition learning algorithms and with the neural gas (NG) algorithm on five data sets from different sources. Judged by five performance indexes of the clustering results, the proposed soft competition learning method achieves better clustering quality and efficiency and scales linearly.
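The contrast between hard and soft competition can be illustrated with a minimal sketch. This is not the CARD, CARDx, or CARDBK update, whose exact weighting the abstract does not specify. The soft variant shown is the standard rank-based rule used in neural gas, where the step size decays exponentially with the distance rank. The function names, the learning rate `lr`, and the decay constant `lam` are assumptions chosen for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def hard_update(point, centroids, lr=0.5):
    """Hard competition: only the nearest (winning) centroid moves."""
    winner = min(range(len(centroids)),
                 key=lambda i: sq_dist(point, centroids[i]))
    updated = [list(c) for c in centroids]
    updated[winner] = [c + lr * (p - c)
                       for p, c in zip(point, centroids[winner])]
    return updated


def soft_update_rank(point, centroids, lr=0.5, lam=1.0):
    """Soft competition (neural-gas style, illustrative only):
    every centroid moves, with a step that shrinks as exp(-rank / lam),
    where rank 0 is the nearest centroid."""
    order = sorted(range(len(centroids)),
                   key=lambda i: sq_dist(point, centroids[i]))
    updated = [list(c) for c in centroids]
    for rank, i in enumerate(order):
        weight = lr * math.exp(-rank / lam)
        updated[i] = [c + weight * (p - c)
                      for p, c in zip(point, centroids[i])]
    return updated


# Hypothetical toy data: three centroids on a line and one data point.
centroids = [[0.0, 0.0], [4.0, 0.0], [10.0, 0.0]]
point = [1.0, 0.0]

hard = hard_update(point, centroids)
soft = soft_update_rank(point, centroids)
```

In this toy setup the hard update moves only the first centroid, while the rank-based soft update also pulls the second and third centroids toward the point by progressively smaller amounts. According to the abstract, the proposed method differs from this rank-only weighting by making the influence depend on the distances to the nearer centroids.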

Key words: clustering methods, text processing, document handling, competition learning method
