Database Systems: Design, Implemen...

12th Edition
Carlos Coronel + 1 other
Publisher: Cengage Learning
ISBN: 9781305627482
Chapter 14, Problem 12RQ
Textbook Problem

Briefly explain how HDFS and MapReduce are complementary to each other.

Program Plan Intro

Hadoop Distributed File System (HDFS):

  • Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications.
  • It was developed as infrastructure for the Apache Nutch web search engine project and is now an Apache Hadoop subproject.
  • It uses a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across Hadoop clusters.
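
The NameNode and DataNode split described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Hadoop API: the class names, the tiny block size, the replication factor, and the round-robin placement rule are all hypothetical and chosen for readability. The point it shows is that the NameNode keeps only metadata (which blocks make up a file and where the replicas live), while the DataNodes hold the actual bytes.

```python
# Toy model of the NameNode/DataNode split. All names and constants here are
# hypothetical illustrations, not part of any real Hadoop interface.

BLOCK_SIZE = 16    # bytes; assumed tiny for the demo, real HDFS blocks are far larger
REPLICATION = 2    # assumed number of copies kept of each block


class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Stores only metadata: file -> blocks, and block -> replica locations."""

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.file_blocks = {}      # filename -> [block_id, ...]
        self.block_locations = {}  # block_id -> [DataNode, ...]

    def write(self, filename, data):
        """Split data into blocks and place each block on REPLICATION nodes."""
        block_ids = []
        n = len(self.datanodes)
        for i, start in enumerate(range(0, len(data), BLOCK_SIZE)):
            block_id = f"{filename}#{i}"
            chunk = data[start:start + BLOCK_SIZE]
            # Simple round-robin placement (an assumption made for this sketch).
            targets = [self.datanodes[(i + r) % n] for r in range(REPLICATION)]
            for node in targets:
                node.blocks[block_id] = chunk
            self.block_locations[block_id] = targets
            block_ids.append(block_id)
        self.file_blocks[filename] = block_ids

    def read(self, filename):
        """Reassemble a file by fetching each block from its first replica."""
        return b"".join(
            self.block_locations[block_id][0].blocks[block_id]
            for block_id in self.file_blocks[filename]
        )
```

For example, writing a 40-byte file to a NameNode that manages three DataNodes yields three blocks, each stored on two different nodes, and reading the file back returns the original bytes.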


MapReduce:

  • MapReduce is the processing layer of Hadoop.
  • It is an open-source application programming interface that is designed to process large volumes of data in parallel.
  • The MapReduce framework has two main functions, Map and Reduce.
  • The Map function takes a job and divides it into smaller units of work; the Reduce function collects the output generated by the different nodes and integrates it into a single result set.
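
The Map and Reduce roles described above can be sketched with a word count, the usual introductory example. This is a minimal single-process illustration, not Hadoop's actual API: the function names are hypothetical, and the intermediate grouping step (`shuffle`) is an assumption added here to connect the two phases. In a real cluster, each chunk would be mapped on a different node in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key so each key is reduced exactly once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map over every chunk, then reduce the grouped output."""
    intermediate = []
    for chunk in chunks:  # sequential here; a cluster would map chunks in parallel
        intermediate.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

Calling `word_count(["big data big", "data node"])` returns `{"big": 2, "data": 2, "node": 1}`: each chunk is mapped independently, and the per-chunk outputs are merged into a single result set.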

Explanation of Solution

Reasons why HDFS and MapReduce are complementary to each other:

  • Both HDFS and MapReduce are built on the same concepts: massive scale, relative independence, and distribution.
  • MapReduce decomposes data into independent tasks and HDFS decomp...

