
Introduction to the MapReduce Programming Model


“MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster” [1]. Before the introduction of MapReduce, the biggest challenge facing Google was processing very large input data sets with distributed computations. There was no general solution for parallelizing the computations, distributing the data, and handling failures. To solve these problems, Google introduced MapReduce, a model for parallel computation on a distributed system that processes and synthesizes large data sets. The MapReduce programming paradigm is modeled on the map and reduce primitives found in Lisp and many other functional languages.

The input to a MapReduce program is a set of (key, value) pairs, and the output is also a set of (key, value) pairs. There are two main functions: a map function and a reduce function. The map function takes an input (key, value) pair and produces a set of intermediate (key, value) pairs. The MapReduce library then collects all the intermediate values associated with the same key and groups them together. Each group is passed to the reduce function, which receives an intermediate key and all the values associated with it and merges those values to form a possibly smaller set of values.

The original implementation of MapReduce was done by Google, but other companies now offer their own implementations. For example, Amazon’s “Amazon Elastic MapReduce” provides MapReduce as a hosted service.
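To make the map and reduce steps concrete, the following is a minimal sketch of the classic word-count example written in plain Python. The function names (map_fn, reduce_fn, run_mapreduce) and the single-process grouping loop are illustrative assumptions, not the API of Google's library or any particular framework; a real MapReduce system would run many map and reduce tasks in parallel across a cluster and handle failures automatically.

from collections import defaultdict

# Illustrative sketch of the MapReduce data flow (word count).
# All names here are assumptions for demonstration only; a real framework
# distributes this work across many machines.

def map_fn(key, value):
    """Map: (document_name, document_text) -> stream of (word, 1) pairs."""
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    """Reduce: (word, [1, 1, ...]) -> (word, total_count)."""
    return (key, sum(values))

def run_mapreduce(inputs):
    # Map phase: apply map_fn to every input (key, value) pair and
    # group the intermediate values by their intermediate key.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Reduce phase: merge all values that share the same intermediate key.
    return [reduce_fn(ikey, ivalues) for ikey, ivalues in intermediate.items()]

if __name__ == "__main__":
    docs = [("doc1", "the quick brown fox"),
            ("doc2", "the lazy dog and the fox")]
    print(run_mapreduce(docs))
    # e.g. [('the', 3), ('quick', 1), ('brown', 1), ('fox', 2), ('lazy', 1), ('dog', 1), ('and', 1)]

In this sketch the grouping step stands in for the shuffle that the MapReduce library performs between the map and reduce phases, which is the part of the model that lets the reduce function see every value for a given key in one call.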
