Acta Metallurgica Sinica(English letters) ›› 2013, Vol. 20 ›› Issue (6): 77-87.doi: 10.1016/S1005-8885(13)60112-0

• Networks

MapReduce optimization algorithm based on machine learning in heterogeneous cloud environment

  

  1. Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  3. Technology Research Institute, Aisino Corporation, Beijing 100195, China
  4. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2013-04-15 Revised: 2013-10-10 Online: 2013-12-31 Published: 2013-12-27
  • Contact: Wenhui Lin E-mail: linwh16@gmail.com
  • Supported by:
    This work was supported by the Important National Science & Technology Specific Projects (2012ZX03002008), the 111 Project of China (B08004), and the Fundamental Research Funds for the Central Universities (2012RC0121).

Abstract: We present an approach to optimizing the MapReduce architecture that can make heterogeneous cloud environments more stable and efficient. Unlike previous methods, our approach introduces machine learning into the MapReduce framework and dynamically improves the MapReduce algorithm according to the statistical results of machine learning. The approach has three main aspects: learning machine performance, a reduce task assignment algorithm based on the learning results, and a speculative execution optimization mechanism. It also has two important features. First, the MapReduce framework obtains the performance values of the nodes in the cluster through a machine learning module, which recalibrates these values daily to keep the assessment of cluster performance accurate. Second, by optimizing the task assignment algorithm, we can maximize the performance of heterogeneous clusters. According to our evaluation results, cluster performance can improve by 19% in the current heterogeneous cloud environment, and cluster stability is greatly enhanced.
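
As a rough illustration of the second aspect, the sketch below splits reduce tasks across nodes in proportion to learned performance values. It is not the algorithm from the paper, which the abstract does not specify. The function names `calibrate` and `assign_reduce_tasks`, the exponential-moving-average update, the smoothing factor `alpha`, and the largest-remainder rounding are all assumptions made for this example, and scores are assumed to be non-negative.

```python
from typing import Dict


def calibrate(old_score: float, observed: float, alpha: float = 0.3) -> float:
    """Blend a new observation into a node's score.

    Assumption: an exponential moving average stands in for the
    periodic recalibration mentioned in the abstract.
    """
    return (1 - alpha) * old_score + alpha * observed


def assign_reduce_tasks(scores: Dict[str, float], num_tasks: int) -> Dict[str, int]:
    """Split num_tasks across nodes in proportion to their scores.

    Assumption: proportional allocation with largest-remainder rounding;
    scores are non-negative and at least one is positive.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("performance scores must sum to a positive value")

    # Ideal (fractional) share of tasks for each node.
    exact = {node: num_tasks * s / total for node, s in scores.items()}
    # Start from the integer part of each share.
    counts = {node: int(share) for node, share in exact.items()}

    # Hand the leftover tasks to the nodes with the largest fractional parts.
    leftover = num_tasks - sum(counts.values())
    by_remainder = sorted(scores, key=lambda n: exact[n] - counts[n], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical scores: node "fast" is twice as capable as the others.
    print(assign_reduce_tasks({"fast": 2.0, "slow1": 1.0, "slow2": 1.0}, 5))
```

Under these assumptions a faster node receives proportionally more reduce tasks, which is one simple way to avoid slow nodes becoming stragglers in a heterogeneous cluster.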

Key words: cloud computing, MapReduce, machine learning, heterogeneity
