Case Study: MapReduce Programming Model

1699 Words Nov 14th, 2015 7 Pages
"MapReduce is a programming model and an associated implementation for processing and generating large data sets." Prior to the development of MapReduce, Google faced problems processing large amounts of raw data, such as crawled documents, PageRank data, and web request logs. Even though the computations were conceptually straightforward, the input data was very large, and the computations had to be distributed across many machines to keep running times reasonable. A better solution was needed.

Jeffrey Dean and Sanjay Ghemawat introduced the concept of MapReduce at Google in 2004. The main idea behind MapReduce is to map the data set into a collection of (key, value) pairs and then reduce over all pairs that share the same key. A (key, value) pair is given as input to the map function, which produces a set of intermediate (key, value) pairs. The MapReduce library groups together all intermediate values associated with the same key and passes them to the reduce function, which merges those values to form a possibly smaller set of values.
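The flow above can be sketched with the classic word-count example. This is a single-machine sketch, not Google's implementation: `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative names, and the grouping step stands in for what the MapReduce library does across machines.

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name (unused here), value: document contents.
    # Emits an intermediate (word, 1) pair for every word seen.
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word; values: every count emitted for that word.
    # Merges the values into a (possibly) smaller set -- here, one sum.
    yield (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Simulates the library: run map over all inputs, group the
    # intermediate pairs by key, then run reduce on each group.
    groups = defaultdict(list)
    for k, v in inputs:
        for ik, iv in map_fn(k, v):
            groups[ik].append(iv)
    results = {}
    for ik, ivs in groups.items():
        for rk, rv in reduce_fn(ik, ivs):
            results[rk] = rv
    return results

counts = run_mapreduce([("doc1", "the quick the")], map_fn, reduce_fn)
# counts == {"the": 2, "quick": 1}
```

The same skeleton handles any problem that fits the model; only `map_fn` and `reduce_fn` change.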

MapReduce is implemented in a master/worker configuration, where one master serves as the coordinator for many workers. A worker can be assigned either map tasks or reduce tasks. The execution workflow has seven main stages.
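The master/worker division of labor can be sketched as follows. This is an assumed single-process simulation using a thread pool, not the distributed system itself: the `master` function plays the coordinator, handing map tasks to workers, grouping the intermediate pairs, and then handing one reduce task per key group back to the pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_task(doc):
    # A map worker turns one input split into intermediate pairs.
    return [(word, 1) for word in doc.split()]

def reduce_task(item):
    # A reduce worker merges all values for one key.
    word, counts = item
    return (word, sum(counts))

def master(docs, n_workers=4):
    # The "master" coordinates: assign map tasks, shuffle, assign reduce tasks.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Phase 1: one map task per input split.
        intermediate = pool.map(map_task, docs)
        # Shuffle: group intermediate values by key.
        groups = defaultdict(list)
        for pairs in intermediate:
            for k, v in pairs:
                groups[k].append(v)
        # Phase 2: one reduce task per key group.
        return dict(pool.map(reduce_task, groups.items()))

print(master(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

In the real system the master also tracks worker liveness and reassigns tasks on failure; this sketch omits fault tolerance entirely.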

STEP 1: SPLIT INPUT. The MapReduce library splits the input into multiple pieces. Each…