
Database System Concepts
7th Edition
ISBN: 9780078022159
Author: Abraham Silberschatz, Henry F. Korth, S. Sudarshan
Publisher: McGraw-Hill Education
Question
For the word count example, the input to the Map function consists of key/value
pairs whose keys are document IDs and whose values are document contents. The
output of the Map function consists of pairs whose keys are words and whose values
are counts of those words (e.g., (a, 1)). The output is then shuffled by applying a
hash function to its keys, and the values that share a key are combined into a list,
for example (a, {1, 5}), which serves as the input to the Reduce function. The Reduce
function sums the numbers in the value list of a key and returns the resulting
key/value pair (e.g., (a, 6)).

How would you implement the WordCount example on Hadoop?
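
The map, shuffle, and reduce steps described in the question can be sketched in Python in the style that Hadoop Streaming expects, where records are text lines of the form key, tab, value, and the reducer receives its records sorted by key. This is only a minimal sketch under stated assumptions, not a definitive Hadoop implementation: the function names `mapper`, `reducer`, and `run_locally` are hypothetical, words are split on whitespace only, and the document ID is ignored because each input line is treated as a piece of document content. The `run_locally` helper stands in for the shuffle by sorting the map output in memory; on a real cluster Hadoop performs that step itself.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts of each word.

    Assumes the records arrive sorted by key, so that all records for the
    same word are adjacent (this is what the shuffle/sort phase provides).
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce in one process, without Hadoop."""
    shuffled = sorted(mapper(lines))  # in-memory stand-in for the shuffle
    return list(reducer(shuffled))


if __name__ == "__main__":
    for record in run_locally(["a b a", "b c"]):
        print(record)
```

To run something like this on a cluster, each function would typically be wrapped in a small script that reads `sys.stdin` and prints its records, and the two scripts would be passed to Hadoop Streaming as the mapper and the reducer. The same logic is more commonly written in Java as subclasses of Hadoop's `Mapper` and `Reducer` classes; the structure of the two functions stays the same.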