The Journal of China Universities of Posts and Telecommunications (English edition) ›› 2009, Vol. 16 ›› Issue (6): 113-120. doi: 10.1016/S1005-8885(08)60296-4

• Others •

New heuristic method for data discretization
based on rough set theory

ZHAO Jun, ZHOU Ying-hua

  1. Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2008-10-22 Revised: 1900-01-01 Online: 2009-12-30
  • Contact: ZHAO Jun

Abstract:

Data discretization contributes much to the induction of classification rules or trees by machine learning methods, and rough set theory is a valid tool for discretizing continuous information systems. This paper proposes a new method that improves on typical rough-set-based heuristic algorithms for data discretization in two ways: it uses decision information to reduce the number of candidate cuts, and it measures cut significance more reasonably through a new concept, the cut selection probability. Simulations show that, compared with other typical discretization algorithms based on rough set theory, the proposed method discretizes continuous information systems more effectively. It improves the predictive accuracy of information systems while conceptually preserving their consistency.

Key words:

data discretization; rough set theory; cut; cut significance; selection probability
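
As a rough illustration of how decision information can shrink the set of candidate cuts, the sketch below generates midpoints between adjacent distinct values of one continuous attribute and drops those whose two neighbouring values carry the same single decision class. This is a generic boundary-cut filter written under our own assumptions, not the algorithm of the paper: the function name `candidate_cuts` and its parameters are hypothetical, and the paper's cut selection probability is not modelled here.

```python
from typing import Dict, Hashable, List, Sequence, Set


def candidate_cuts(
    values: Sequence[float],
    decisions: Sequence[Hashable],
    use_decisions: bool = True,
) -> List[float]:
    """Return candidate cuts for one continuous attribute.

    A candidate cut is the midpoint between two adjacent distinct values.
    With use_decisions=True (an assumed, generic filter), a midpoint is
    dropped when both neighbouring values carry the same single decision
    class, since no cut is needed between them.
    """
    # Collect the decision labels observed at each distinct attribute value.
    labels_at: Dict[float, Set[Hashable]] = {}
    for v, d in zip(values, decisions):
        labels_at.setdefault(v, set()).add(d)

    ordered = sorted(labels_at)
    cuts: List[float] = []
    for lo, hi in zip(ordered, ordered[1:]):
        same_single_class = (
            len(labels_at[lo]) == 1 and labels_at[lo] == labels_at[hi]
        )
        if use_decisions and same_single_class:
            continue  # both neighbours are pure and agree: skip this cut
        cuts.append((lo + hi) / 2)
    return cuts
```

For example, with attribute values `[1.0, 2.0, 3.0, 4.0, 5.0]` and decisions `a, a, b, b, a`, the unfiltered candidates are the four midpoints, while the filtered call keeps only the two cuts where the decision class changes. Every pair of objects with different decisions is still separated by at least one kept cut, which is why a filter of this kind does not by itself break consistency.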
