Proceedings of the International Conference on Advances in Electronics and Electrical Engineering (AEEE 2013)
"PREDICTING HADOOP PARAMETERS"
Abstract: “The interest in analyzing growing amounts of data has encouraged the deployment of large-scale parallel computing frameworks such as Hadoop. In other words, data analytics is the main reason behind the success of distributed systems: the data might not fit on a single disk, and processing can be very time-consuming, so analyzing the input in parallel is very useful. Hadoop relies on the MapReduce programming paradigm to distribute work among the machines, so a good load balance influences the execution time of such applications. This paper introduces a technique to predict some configuration parameters from the application's CPU utilization in order to optimize Hadoop.”
Keywords: Hadoop, Cloud Computing, Distributed Systems