Acta Metallurgica Sinica (English Letters), 2014, Vol. 21, Issue (1): 79-85. doi: 10.1016/S1005-8885(14)60272-7

• Wireless

Data streams classification with ensemble model based on decision-feedback

  

  1. Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2013-04-22 Revised: 2014-01-03 Online: 2014-02-28 Published: 2014-02-28
  • Supported by:

    This work was supported by the National Natural Science Foundation of China (61202082), the Fundamental Research Funds for the Central Universities (BUPT2012RC0218, BUPT2012RC0219).

Abstract:

The main challenges of data stream classification are infinite length, concept drift, the arrival of novel classes, and the lack of labeled instances. Most existing techniques address only some of these challenges and ignore the others. This paper therefore presents an ensemble classification model based on decision-feedback (ECM-BDF) that addresses all of them. First, a data stream is divided into sequential chunks, and a classification model is trained on each labeled chunk. To cope with infinite length and concept drift, a fixed number of such models constitutes an ensemble model E, which is updated with subsequent labeled chunks. To deal with the emergence of novel classes and the scarcity of labeled instances, the model incorporates a novel class detection mechanism that detects the arrival of a novel class without training E on labeled instances of that class. Meanwhile, unsupervised models are trained on unlabeled instances to provide useful constraints for E. Using these constraints as feedback information, an extended ensemble model Ex is obtained, and unlabeled instances can then be classified more accurately by maximizing the consensus of Ex. Experimental results demonstrate that ECM-BDF outperforms traditional techniques in classifying data streams with limited labeled data.
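The abstract does not specify the base learners, the decision boundaries, or how the feedback constraints are computed, so the following is only a minimal sketch of the chunk-based part of such a scheme, under stated assumptions. It illustrates three steps: splitting a stream into sequential chunks, maintaining a fixed-size ensemble E that is updated with each new labeled chunk, and flagging instances that fall outside every model's class regions as candidates for a novel class (no labels of that class are needed). The nearest-centroid base learner, the "keep the models most accurate on the newest chunk" replacement rule, the per-class radius used as a decision boundary, and all names (`split_into_chunks`, `CentroidModel`, `ChunkEnsemble`) are hypothetical choices made for illustration. The unsupervised models, the extended ensemble Ex, and the consensus-maximization step are not modeled.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
LabeledChunk = List[Tuple[Point, str]]


def split_into_chunks(stream: Sequence, chunk_size: int) -> List[list]:
    """Divide a (finite prefix of a) stream into sequential chunks of chunk_size items."""
    return [list(stream[i:i + chunk_size]) for i in range(0, len(stream), chunk_size)]


class CentroidModel:
    """Hypothetical base learner: one centroid and one radius per class seen in its chunk."""

    def __init__(self, chunk: LabeledChunk):
        groups: Dict[str, List[Point]] = {}
        for x, y in chunk:
            groups.setdefault(y, []).append(x)
        self.centroids: Dict[str, Point] = {}
        self.radii: Dict[str, float] = {}
        for y, pts in groups.items():
            c = tuple(sum(coords) / len(pts) for coords in zip(*pts))
            self.centroids[y] = c
            # Assumed decision boundary: the farthest training point of the class.
            self.radii[y] = max(math.dist(p, c) for p in pts)

    def _nearest(self, x: Point) -> Tuple[str, float]:
        return min(((y, math.dist(x, c)) for y, c in self.centroids.items()),
                   key=lambda t: t[1])

    def predict(self, x: Point) -> str:
        return self._nearest(x)[0]

    def is_outlier(self, x: Point) -> bool:
        # x lies outside the region of the class whose centroid is nearest.
        y, d = self._nearest(x)
        return d > self.radii[y]

    def accuracy(self, chunk: LabeledChunk) -> float:
        return sum(self.predict(x) == y for x, y in chunk) / len(chunk)


class ChunkEnsemble:
    """Fixed-size ensemble E, updated with each new labeled chunk."""

    def __init__(self, max_models: int):
        self.max_models = max_models
        self.models: List[CentroidModel] = []

    def update(self, chunk: LabeledChunk) -> None:
        candidates = self.models + [CentroidModel(chunk)]
        # Assumed replacement rule: keep the max_models candidates that are most
        # accurate on the newest chunk, so models of outdated concepts drop out first.
        candidates.sort(key=lambda m: m.accuracy(chunk), reverse=True)
        self.models = candidates[: self.max_models]

    def classify(self, x: Point) -> str:
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]

    def is_novel_candidate(self, x: Point) -> bool:
        # Flag x only if every model in E considers it an outlier.
        return all(m.is_outlier(x) for m in self.models)


if __name__ == "__main__":
    chunk1 = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]
    chunk2 = [((0.1, 0.0), "a"), ((0.0, 0.2), "a"), ((4.9, 5.0), "b"), ((5.0, 5.2), "b")]
    E = ChunkEnsemble(max_models=2)
    for chunk in (chunk1, chunk2):
        E.update(chunk)
    print(E.classify((0.1, 0.1)))                # prints: a
    print(E.is_novel_candidate((20.0, -20.0)))   # prints: True
```

Bounding the ensemble size keeps memory constant however long the stream runs, and re-ranking models on the newest chunk is one simple way to let E follow concept drift. A single flagged outlier is not yet a novel class: a practical detector would typically require a sufficiently large and cohesive group of such outliers before declaring one.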

Key words:

ensemble classification, novel class, concept drifting, decision-feedback
