The Journal of China Universities of Posts and Telecommunications, 2010, Vol. 17, Issue 4: 116-124. doi: 10.1016/S1005-8885(09)60497-0


Autonomic failure prediction based on manifold learning for large-scale distributed systems

LU Xu, WANG Hui-qiang, ZHOU Ren-jie, GE Bao-yu

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  • Received: 2009-10-20 Revised: 2010-01-23 Online: 2010-08-30 Published: 2010-08-31
  • Corresponding author: LU Xu, E-mail: luxu_hrbeu@yahoo.cn; luxu@hrbeu.edu.cn
  • Supported by:

    This work was supported by the Hi-Tech Research and Development Program of China (2007AA01Z401) and the National Natural Science Foundation of China (90718003, 60973027).

Abstract:

This article investigates autonomic failure prediction in large-scale distributed systems, using nonlinear dimensionality reduction to extract failure features automatically. Most existing failure prediction methods focus on building prediction models or heuristic rules by discovering failure patterns, but the feature extraction step that precedes failure pattern recognition is rarely considered, owing to the increasing complexity of modern distributed systems. In this work, a novel performance-centric approach to automated failure prediction is proposed based on manifold learning (ML). The ML algorithm supervised locally linear embedding (SLLE) is applied to perform feature extraction. To generalize the dimensionality reduction mapping, a nonlinear mapping approximation and optimization solution is also proposed. For the experiments, a file transfer test bed with fault injection was developed that gathers multilevel performance metrics transparently. Based on runtime monitoring of these metrics, the SLLE method automatically predicts more than 50% of the central processing unit (CPU) and memory failures and around 70% of the network failures.

Key words:

failure prediction, manifold learning, locally linear embedding, autonomic computing
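
The abstract names supervised locally linear embedding (SLLE) as the feature extraction step but gives no algorithmic detail. The following is a minimal sketch, assuming one common SLLE formulation in which distances between samples of different classes are inflated by `alpha * max(distance)` before the neighbor search, followed by standard LLE. The function name `supervised_lle`, its parameters, and the synthetic "performance metric" data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def supervised_lle(X, labels, n_neighbors=5, n_components=2, alpha=0.5, reg=1e-3):
    """Hypothetical SLLE sketch: class-aware neighbor search, then standard LLE."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Supervised step (assumed formulation): push samples of different
    # classes apart so neighbors are drawn mostly from the same class.
    different = (labels[:, None] != labels[None, :]).astype(float)
    dist_sup = dist + alpha * dist.max() * different
    np.fill_diagonal(dist_sup, np.inf)  # a point is never its own neighbor

    # Reconstruction weights: each point as an affine combination of its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist_sup[i])[:n_neighbors]
        Z = X[nbrs] - X[i]                      # neighbors centered on x_i
        G = Z @ Z.T                             # local Gram matrix
        G += reg * (np.trace(G) + 1e-12) * np.eye(n_neighbors)  # regularize
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()                # weights sum to one

    # Embedding: bottom eigenvectors of M = (I - W)^T (I - W). The first
    # eigenvector (eigenvalue ~0, constant when the neighbor graph is
    # connected) is skipped.
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, eigvecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    return eigvecs[:, 1:n_components + 1]


# Synthetic stand-in for monitored metrics: 30 "normal" and 30 "pre-failure" samples.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(30, 6))
pre_failure = rng.normal(1.5, 1.0, size=(30, 6))
X = np.vstack([normal, pre_failure])
y = np.array([0] * 30 + [1] * 30)

Y = supervised_lle(X, y, n_neighbors=5, n_components=2, alpha=0.5)
print(Y.shape)  # (60, 2)
```

The sketch only embeds labeled training samples. Mapping unseen runtime samples into the embedding, which the abstract addresses with a nonlinear mapping approximation, is not covered here.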
