Review Test Submission: Pair RDD and Dataframes Quiz

School

University of Texas, Dallas

Course

6350

Subject

Computer Science

Date

Feb 20, 2024

6/26/22, 10:11 PM Review Test Submission: Pair RDD and Dataframes Quiz – ... file:///C:/Users/axm210187/Downloads/Review Test Submission_ Pair RDD and Dataframes Quiz – .... html 1/6

Review Test Submission: Pair RDD and Dataframes Quiz

User: Aneena Manoj
Course: CS 6350.0U1 - Big Data Management and Analytics - Su22
Test: Pair RDD and Dataframes Quiz
Started: 6/26/22 11:47 AM
Submitted: 6/26/22 10:09 PM
Due Date: 6/26/22 11:59 PM
Status: Completed
Attempt Score: 100 out of 100 points
Time Elapsed: 10 hours, 21 minutes
Results Displayed: All Answers, Submitted Answers, Correct Answers

Question 1
10 out of 10 points

Which of the following are valid operations on key-value pair RDDs?

Selected Answers:
    sortByKey
    groupByKey
Answers:
    groupbyValue
    sortByKey
    groupByKey
    sortByValue

Question 2
10 out of 10 points

Consider the RDD defined below:

    data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18], ["sam", 32]]
    rdd = sc.parallelize(data)

and a function as defined:

    def myFun(rdd):
        mean = rdd.values().mean()
        rdd1 = rdd.map(lambda x: x[1] > mean)
        return rdd1

What will be the output of myFun(rdd).sum()?

Selected Answer:
    2
Answers:
    2
    0
    [["sarah", 30], ["sam", 32]]
    [False, False, True, False, True]

Question 3
10 out of 10 points

Consider the RDD defined below:

    data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18],
            ["sam", 32], ['jill', 27], ['mike', 60], ['bella', 22]]
    rdd = sc.parallelize(data)

What will be the output of:

    rdd.map(lambda x: (x[0][0], x[1])).mapValues(lambda x: (x, 1)).reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1])).mapValues(lambda x: float(x[0])/float(x[1])).collect()

Selected Answer:
    It will first convert given RDD so that the new key becomes the first character of original key. It will then obtain average value for each key.
Answers:
    It will return average value for each key.
    It will first convert given RDD so that the new key becomes the first character of original key. It will then obtain average value for each key.
    It will generate count of values for each key.
    It will remove duplicates from the list.

Question 4
10 out of 10 points

Consider the RDD defined below:

    data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18],
            ["sam", 32], ['jill', 27], ['mike', 60], ['bella', 22]]
    rdd = sc.parallelize(data)

Each element of the RDD contains name as the key and age as the value. We would like to write a function that will sort this rdd by value in descending order. Which of the following is the correct way to write such a function and apply it to the given RDD?

Selected Answer:

    def sortByValue(rdd):
        return rdd.map(lambda x: (x[1], x[0])).sortByKey(ascending = False).map(lambda x: (x[1], x[0]))
    sortByValue(rdd)

Answers:

    def sortByValue(rdd):
        return rdd.map(lambda x: (x[1], x[0])).sortByKey(ascending = False).map(lambda x: (x[1], x[0]))
    sortByValue(rdd)


    def sortByValue(rdd):
        return rdd.map(lambda x: (x[1], x[0])).sortByKey(ascending = False)
    sortByValue(rdd)

    def sortByValue(rdd):
        return rdd.map(lambda x: (x[1], x[0])).sortByKey(ascending = False).map(lambda x: (x[1], x[0]))
    rdd.sortByValue()

    def sortByValue(rdd):
        return rdd.sortByKey(ascending = False).map(lambda x: (x[1], x[0]))
    sortByValue(rdd)

Question 5
10 out of 10 points

Suppose I create a pair RDD as follows:

    majors_data = [["john", "cs"], ["bill", "cs"], ["sarah", "math"], ["mary", "stats"],
                   ["sam", "physics"], ['jill', "math"], ['mike', "cs"], ['bella', "cs"]]
    majors = sc.parallelize(majors_data)

In the above pair RDD, name of the person is the key and their major is the value. I would like to count how many students are enrolled in each major. Which of the following accomplishes this?

Selected Answers:
    majors.values().countByValue()
    majors.values().map(lambda x: (x, 1)).countByKey()
Answers:
    majors.values().countByValue()
    majors.values().map(lambda x: (x, 1)).countByKey()
    majors.values().count()
    majors.countByValue()

Question 6
10 out of 10 points

Suppose I create two pair-RDDs as follows:

    data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18],
            ["sam", 32], ['jill', 27], ['mike', 60], ['bella', 22]]
    rdd = sc.parallelize(data)
    majors_data = [["john", "cs"], ["bill", "cs"], ["sarah", "math"], ["mary", "stats"],
                   ["sam", "physics"], ['jill', "math"], ['mike', "cs"], ['bella', "cs"]]
    majors = sc.parallelize(majors_data)

In the first pair-RDD, name of the person is the key and age is the value, and in the second pair-RDD, name of the person is the key and their major is the value. I would like to output a pair-RDD with major as the key and value as the oldest person enrolled in that major. Which of the following accomplishes this?

Selected Answers:
    rdd.join(majors).values().map(lambda x: (x[1], x[0])).reduceByKey(lambda x, y: max(x, y))
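
The pipelines in Questions 2 through 6 can be checked by hand without a Spark cluster. The block below is a minimal sketch under that assumption: the function names are hypothetical plain-Python stand-ins, not Spark APIs, and each one only mirrors what the corresponding selected answer computes on the quiz data.

```python
from collections import Counter, defaultdict

# Data copied from the quiz questions.
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18],
        ["sam", 32], ["jill", 27], ["mike", 60], ["bella", 22]]
majors_data = [["john", "cs"], ["bill", "cs"], ["sarah", "math"],
               ["mary", "stats"], ["sam", "physics"], ["jill", "math"],
               ["mike", "cs"], ["bella", "cs"]]


def count_above_mean(pairs):
    """Question 2: map each pair to (value > mean), then sum the booleans."""
    mean = sum(v for _, v in pairs) / len(pairs)
    return sum(v > mean for _, v in pairs)


def average_by_first_letter(pairs):
    """Question 3: key by first character, reduce to (sum, count), then divide."""
    totals = defaultdict(lambda: (0, 0))
    for name, age in pairs:
        s, c = totals[name[0]]
        totals[name[0]] = (s + age, c + 1)
    return {k: float(s) / float(c) for k, (s, c) in totals.items()}


def sort_by_value_desc(pairs):
    """Question 4: swap to (value, key), sort descending on it, swap back."""
    swapped = [(v, k) for k, v in pairs]
    swapped.sort(key=lambda vk: vk[0], reverse=True)
    return [(k, v) for v, k in swapped]


def count_per_major(major_pairs):
    """Question 5: count how often each value (major) occurs."""
    return dict(Counter(major for _, major in major_pairs))


def max_age_per_major(age_pairs, major_pairs):
    """Question 6: inner join on name, re-key by major, keep the largest age."""
    major_of = {name: major for name, major in major_pairs}
    best = {}
    for name, age in age_pairs:
        if name in major_of:
            major = major_of[name]
            best[major] = max(best.get(major, age), age)
    return best
```

For example, `count_above_mean(data[:5])` uses the five records from Question 2. Their mean is 25, so only the values 30 and 32 are strictly greater and the sum of the booleans is 2. Note that `max_age_per_major` reduces over ages, so, like the selected pipeline in Question 6, it yields the largest age per major rather than a name.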
