The Journal of China Universities of Posts and Telecommunications (English Edition) ›› 2018, Vol. 25 ›› Issue (3): 80-91. doi: 10.19682/j.cnki.1005-8885.2018.0012

• Others •

Application of a soft competition learning method in document clustering

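Yehang Zhu, Mingjie Zhang

  1. Xi'an University of Posts and Telecommunications
  • Received: 2017-08-08 Revised: 2018-01-16 Online: 2018-06-29 Published: 2018-06-30
  • Corresponding author: Yehang Zhu E-mail: zhuyehang@126.com
  • Supported by:
    Natural Science Basic Research Program of Shaanxi Province; Youth Fund Project for Humanities and Social Sciences Research of the Ministry of Education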

Abstract: In hard competition learning, each point modifies only the winning cluster centroid. In soft competition learning, by contrast, each point modifies not only the winning centroid but also many other cluster centroids near the point. This paper proposes a soft competition learning method and three clustering algorithms that adopt it: centroid all rank distance (CARD), CARDx, and centroid all rank distance batch K-means (CARDBK). In these algorithms, the extent to which a point affects a cluster centroid depends on the distances from the point to the other, nearer cluster centroids, rather than only on the rank of its distance to that centroid among its distances to all cluster centroids. Validation experiments compare CARD, CARDx, and CARDBK with several hard competition learning algorithms and with the neural gas (NG) algorithm on five data sets from different sources. Judging by the values of five performance indexes for the clustering results, this kind of soft competition learning method achieves better clustering quality and efficiency, and it scales linearly.

Key words: clustering methods, text processing, document handling, competition learning method
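
To make the hard/soft distinction in the abstract concrete, the following is a minimal sketch of one online update step under each scheme. It is not the CARD, CARDx, or CARDBK algorithm, whose weighting rule is not specified on this page; the soft variant uses the standard rank-based neural gas weighting exp(-rank / lam) only as a stand-in. The function names, the learning rate `lr`, and the decay parameter `lam` are illustrative assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def hard_update(centroids, x, lr=0.1):
    """Hard competition: only the nearest (winning) centroid moves toward x."""
    new = [list(c) for c in centroids]  # copy so the input is not mutated
    winner = min(range(len(new)), key=lambda i: sq_dist(new[i], x))
    new[winner] = [c + lr * (xi - c) for c, xi in zip(new[winner], x)]
    return new


def soft_update_rank(centroids, x, lr=0.1, lam=1.0):
    """Soft competition (neural-gas style, an assumed stand-in):
    every centroid moves toward x, scaled by exp(-rank / lam),
    where rank 0 is the nearest centroid."""
    new = [list(c) for c in centroids]
    order = sorted(range(len(new)), key=lambda i: sq_dist(new[i], x))
    for rank, i in enumerate(order):
        h = math.exp(-rank / lam)
        new[i] = [c + lr * h * (xi - c) for c, xi in zip(new[i], x)]
    return new


# Hypothetical toy data: three centroids and one incoming point.
centroids = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
point = [0.8, 0.2]
print(hard_update(centroids, point))
print(soft_update_rank(centroids, point))
```

In the hard update only the centroid at (1, 0) changes, whereas in the soft update all three centroids shift toward the point by amounts that shrink with rank. According to the abstract, the proposed method replaces this purely rank-based weight with one that depends on the distances to the nearer centroids.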
