Author(s): S. Vigneshwari; B. Sunitha Devi
Currently, most cloud applications process a great deal of data to produce the desired results. The data volumes to be processed by cloud applications are growing much faster than computing power, and this growth demands new approaches for processing and analyzing data. This project explores the use of the Hadoop MapReduce framework to execute scientific workflows in the cloud. Cloud computing provides enormous clusters for efficient large-scale data partitioning and analysis. In such file systems, a file is divided into a number of chunks allocated to distinct nodes, so that MapReduce tasks can run in parallel over the nodes, making resource utilization effective and improving the response time of the task. In large, failure-prone cloud environments, files and nodes are dynamically created, replaced, and added to the system, as a result of which some nodes become overloaded while others are underloaded, leading to load imbalance in the distributed file system. To overcome this load-imbalance problem, a fully distributed load-rebalancing algorithm has been applied. The algorithm is dynamic in nature: it does not consider the previous state or behaviour of the system but depends only on its present behaviour, taking into account estimation of load, comparison of loads, system stability, system performance, interaction between the nodes, the nature of the load to be transferred, selection of nodes, and network traffic. The present Hadoop implementation assumes that the computing nodes in a cluster are homogeneous in nature; the performance of Hadoop in heterogeneous clusters, where nodes have different computing capabilities, is also evaluated.
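To make the rebalancing idea concrete, the sketch below illustrates, under stated assumptions, how a single chunkserver might estimate its own load, compare it against the average load of a sample of peers, and decide whether to shed or acquire chunks. The `NodeLoad` class, the `decide` method, and the 10% tolerance band are illustrative assumptions for this sketch only; they are not part of the Hadoop API or the authors' implementation.

```java
import java.util.List;

/**
 * Minimal sketch of a per-node, fully distributed rebalancing decision:
 * each node uses only the current load it observes (its own chunk count
 * and that of a few sampled peers), not any historical state.
 */
public class RebalanceSketch {

    /** A node's identifier and its current number of stored file chunks. */
    static final class NodeLoad {
        final String nodeId;
        final int chunkCount;
        NodeLoad(String nodeId, int chunkCount) {
            this.nodeId = nodeId;
            this.chunkCount = chunkCount;
        }
    }

    /**
     * Compare this node's load with the average over the sampled nodes.
     * A tolerance band keeps small deviations from triggering migration.
     */
    static String decide(NodeLoad self, List<NodeLoad> sampledPeers, double tolerance) {
        double total = self.chunkCount;
        for (NodeLoad peer : sampledPeers) {
            total += peer.chunkCount;
        }
        double average = total / (sampledPeers.size() + 1);

        if (self.chunkCount > average * (1 + tolerance)) {
            return "OVERLOADED: migrate chunks to the lightest sampled peer";
        } else if (self.chunkCount < average * (1 - tolerance)) {
            return "UNDERLOADED: request chunks from the heaviest sampled peer";
        }
        return "BALANCED: no migration needed";
    }

    public static void main(String[] args) {
        // Hypothetical sample: this node holds 120 chunks, peers hold 60 and 70.
        NodeLoad self = new NodeLoad("node-1", 120);
        List<NodeLoad> peers = List.of(new NodeLoad("node-2", 60),
                                       new NodeLoad("node-3", 70));
        System.out.println(decide(self, peers, 0.10)); // prints the OVERLOADED branch
    }
}
```

Because each node decides from its own sampled view of current loads, the scheme stays decentralized and reactive, which is what allows it to cope with nodes and files being added or replaced at run time.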