Review Test Submission_ Pair RDD and Dataframes Quiz –
.pdf
School
University of Texas, Dallas
Course
6350
Subject
Computer Science
Date
Feb 20, 2024
Type
Pages
6
Uploaded by BarristerThunder12757
6/26/22, 10:11 PM
Review Test Submission: Pair RDD and Dataframes Quiz – ...
file:///C:/Users/axm210187/Downloads/Review Test Submission_ Pair RDD and Dataframes Quiz – ....
html
1/6
Review Test Submission: Pair RDD and Dataframes Quiz
User
Aneena Manoj
Course
CS 6350.0U1 - Big Data Management and Analytics - Su22
Test
Pair RDD and Dataframes Quiz
Started
6/26/22 11:47 AM
Submitted
6/26/22 10:09 PM
Due Date
6/26/22 11:59 PM
Status
Completed
Attempt Score
100 out of 100 points
Time Elapsed
10 hours, 21 minutes
Results Displayed
All Answers, Submitted Answers, Correct Answers
Question 1
Which of the following are valid operations on key-value pair RDDs?
Selected Answers:
sortByKey
groupByKey
Answers:
groupbyValue
sortByKey
groupByKey
sortByValue
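For reference, `sortByKey` and `groupByKey` are methods on PySpark pair RDDs, while `groupbyValue` and `sortByValue` are not part of the RDD API. The sketch below only mimics what the two valid operations do, using plain Python lists so that it runs without Spark. The helper names `sort_by_key` and `group_by_key` and the sample `pairs` are hypothetical and are not taken from the quiz.

```python
from collections import defaultdict

# Hypothetical sample of (key, value) pairs; not from the quiz.
pairs = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

def sort_by_key(pairs, ascending=True):
    """Plain-Python stand-in for what rdd.sortByKey() does: order pairs by key."""
    return sorted(pairs, key=lambda kv: kv[0], reverse=not ascending)

def group_by_key(pairs):
    """Plain-Python stand-in for what rdd.groupByKey() does: collect values per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

sorted_pairs = sort_by_key(pairs)
grouped = group_by_key(pairs)
print(sorted_pairs)
print(grouped)
```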
Question 2
Consider the RDD defined below:
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18], ["sam", 32]]
rdd = sc.parallelize(data)
and a function defined as:
def myFun(rdd):
    mean = rdd.values().mean()
    rdd1 = rdd.map(lambda x: x[1] > mean)
    return rdd1
What will be the output of myFun(rdd).sum()?
10 out of 10 points
10 out of 10 points
Selected Answer:
2
Answers:
2
0
[["sarah", 30], ["sam", 32]]
[False, False, True, False, True]
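The selected answer can be checked by hand. The mean age is (20 + 25 + 30 + 18 + 32) / 5 = 25, only 30 and 32 are strictly greater than it, and summing booleans counts the `True` entries. The sketch below reproduces that arithmetic with plain Python lists instead of an RDD, so no Spark context is assumed. The name `my_fun` is a hypothetical stand-in for the quiz's `myFun`.

```python
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18], ["sam", 32]]

def my_fun(records):
    """List-based stand-in for myFun: flag each value that exceeds the mean."""
    values = [age for _, age in records]
    mean = sum(values) / len(values)          # 125 / 5 = 25.0
    return [age > mean for _, age in records]

flags = my_fun(data)
total = sum(flags)   # True counts as 1, False as 0
print(flags)
print(total)
```

Note that bill's age equals the mean, so the strict comparison `>` leaves him out.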
Question 3
Consider the RDD defined below:
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18], ["sam", 32], ['jill', 27], ['mike', 60], ['bella', 22]]
rdd = sc.parallelize(data)
What will be the output of:
rdd.map(lambda x: (x[0][0], x[1])).mapValues(lambda x: (x, 1)).reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1])).mapValues(lambda x: float(x[0])/float(x[1])).collect()
Selected Answer:
It will first convert given RDD so that the new key becomes the first character of original key. It will then obtain average value for each key.
Answers:
It will return average value for each key.
It will first convert given RDD so that the new key becomes the first character of original key. It will then obtain average value for each key.
It will generate count of values for each key.
It will remove duplicates from the list.
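The pipeline in this question has three stages: re-key each record by the first letter of the name, accumulate a (sum, count) pair per key, then divide. The sketch below walks through those stages with a plain Python dictionary rather than an RDD, so it is an illustration of the logic and not the Spark execution itself. The variable names are hypothetical.

```python
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18],
        ["sam", 32], ["jill", 27], ["mike", 60], ["bella", 22]]

# Stage 1 (map): the new key is the first character of the original key.
keyed = [(name[0], age) for name, age in data]

# Stages 2 and 3 (mapValues + reduceByKey): keep a running (sum, count) per key.
totals = {}
for key, age in keyed:
    running_sum, running_count = totals.get(key, (0, 0))
    totals[key] = (running_sum + age, running_count + 1)

# Stage 4 (mapValues): average = sum / count.
averages = {key: float(s) / float(c) for key, (s, c) in totals.items()}
print(averages)
```

In Spark, `collect()` would return these as a list of (key, average) tuples, and the order of the keys is not guaranteed.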
Question 4
Consider the RDD defined below:
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18], ["sam", 32], ['jill', 27], ['mike', 60], ['bella', 22]]
rdd = sc.parallelize(data)
Each element of the RDD contains name as the key and age as the value. We would like to write a function that will sort this rdd by value in descending order. Which of the following is the correct way to write such a function and apply it to the given RDD?
Selected Answer:
def sortByValue(rdd): return rdd.map(lambda x: (x[1], x[0])).sortByKey(ascending = False).map(lambda x: (x[1], x[0]))
sortByValue(rdd)
Answers:
def sortByValue(rdd): return rdd.map(lambda x: (x[1], x[0])).sortByKey(ascending = False).map(lambda x: (x[1], x[0]))
sortByValue(rdd)
10 out of 10 points
10 out of 10 points
def sortByValue(rdd): return rdd.map(lambda x: (x[1], x[0])).sortByKey(ascending = False)
sortByValue(rdd)
def sortByValue(rdd): return rdd.map(lambda x: (x[1], x[0])).sortByKey(ascending = False).map(lambda x: (x[1], x[0]))
rdd.sortByValue()
def sortByValue(rdd): return rdd.sortByKey(ascending = False).map(lambda x: (x[1], x[0]))
sortByValue(rdd)
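The selected answer works by swapping each pair so the age becomes the key, sorting on that key in descending order, and swapping back. The other options skip the final swap, call a method that RDDs do not have, or sort on the name before swapping. A list-based sketch of the swap, sort, swap pattern is shown below. `sort_by_value` is a hypothetical plain-Python analogue and does not use Spark.

```python
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18],
        ["sam", 32], ["jill", 27], ["mike", 60], ["bella", 22]]

def sort_by_value(records):
    """Swap to (age, name), sort by the new key descending, then swap back."""
    swapped = [(age, name) for name, age in records]
    swapped.sort(key=lambda kv: kv[0], reverse=True)
    return [(name, age) for age, name in swapped]

ordered = sort_by_value(data)
print(ordered)
```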
Question 5
Suppose I create a pair RDD as follows:
majors_data = [["john", "cs"], ["bill", "cs"], ["sarah", "math"], ["mary", "stats"], ["sam", "physics"], ['jill', "math"], ['mike', "cs"], ['bella', "cs"]]
majors = sc.parallelize(majors_data)
In the above pair RDD, name of the person is the key and their major is the value. I would like to count how many students are enrolled in each major. Which of the following accomplishes this?
Selected Answers:
majors.values().countByValue()
majors.values().map(lambda x: (x, 1)).countByKey()
Answers:
majors.values().countByValue()
majors.values().map(lambda x: (x, 1)).countByKey()
majors.values().count()
majors.countByValue()
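Both selected answers reduce to the same tally: drop the names, then count occurrences of each major. The sketch below shows that equivalence with `collections.Counter` on plain lists. It is a stand-in for the two RDD expressions, not the Spark calls themselves, and the variable names are hypothetical.

```python
from collections import Counter

majors_data = [["john", "cs"], ["bill", "cs"], ["sarah", "math"], ["mary", "stats"],
               ["sam", "physics"], ["jill", "math"], ["mike", "cs"], ["bella", "cs"]]

# Equivalent of majors.values(): keep only the major from each pair.
majors_only = [major for _, major in majors_data]

# Analogue of countByValue(): count each distinct major directly.
by_value = Counter(majors_only)

# Analogue of map(lambda x: (x, 1)).countByKey(): re-key by major, count per key.
rekeyed = [(major, 1) for major in majors_only]
by_key = Counter(key for key, _ in rekeyed)

print(dict(by_value))
print(by_value == by_key)
```

By contrast, `majors.values().count()` returns a single total, and `majors.countByValue()` counts whole (name, major) pairs rather than majors.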
Question 6
Suppose I create two pair-RDDs as follows:
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18], ["sam", 32], ['jill', 27], ['mike', 60], ['bella', 22]]
rdd = sc.parallelize(data)
majors_data = [["john", "cs"], ["bill", "cs"], ["sarah", "math"], ["mary", "stats"], ["sam", "physics"], ['jill', "math"], ['mike', "cs"], ['bella', "cs"]]
majors = sc.parallelize(majors_data)
In the first pair-RDD, name of the person is the key and age is the value, and in the second pair-RDD, name of the person is the key and their major is the value. I would like to output a pair-RDD with major as the key and value as the oldest person enrolled in that major. Which of the following accomplishes this?
Selected Answers:
rdd.join(majors).values().map(lambda x: (x[1], x[0])).reduceByKey(lambda x, y: max(x, y))
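The selected expression can be traced step by step: the join yields (name, (age, major)), `values()` keeps (age, major), the map flips it to (major, age), and the reduce keeps the maximum age per major. The result is therefore the highest age in each major rather than the person's name. The sketch below follows the same steps with plain Python. It assumes each name appears once per dataset, which holds for the quiz data, and the variable names are hypothetical.

```python
data = [["john", 20], ["bill", 25], ["sarah", 30], ["mary", 18],
        ["sam", 32], ["jill", 27], ["mike", 60], ["bella", 22]]
majors_data = [["john", "cs"], ["bill", "cs"], ["sarah", "math"], ["mary", "stats"],
               ["sam", "physics"], ["jill", "math"], ["mike", "cs"], ["bella", "cs"]]

# Lookup table standing in for the second RDD (assumes unique names).
major_of = {name: major for name, major in majors_data}

# join: (name, (age, major)) for names present in both datasets.
joined = [(name, (age, major_of[name])) for name, age in data if name in major_of]

# values() then map: drop the name and flip to (major, age).
by_major = [(major, age) for _, (age, major) in joined]

# reduceByKey(max): keep the largest age seen for each major.
oldest = {}
for major, age in by_major:
    oldest[major] = max(oldest.get(major, age), age)

print(oldest)
```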
10 out of 10 points
10 out of 10 points