Data 118 Quiz 3

School

Shepherd University

Course

118

Subject

Computer Science

Date

Dec 6, 2023

Name: _______________________

Data 118: Take-Home Quiz 3 (Spring 2023) – 25 points

Instructions:
1. All final work must be placed in the spaces provided.
2. Be thorough in your presentation. Do not skip steps.
3. Presentations must be neatly arranged.
4. Any sources you consult, or any individuals with whom you speak about a given problem, must be cited (see the last page). You are expected to sign the academic honesty statement below attesting to your integrity.
5. This assignment is due before noon on Wednesday, November 8, 2023. No late submissions will be accepted.
6. You must submit a PDF copy. Submit online using the link on our Brightspace page. Failing to submit a PDF copy results in a ten-percentage-point penalty.

Academic Honesty Statement: Other than the sources I have cited, I have not received assistance from anyone, nor have I benefited from any online or print reference.

Signature: ______________________
Problems:

You will be working with the data set data118_worksheet6_pitchers_comma.csv. Save this file in your working directory if it is not already there, then read it into R and assign it to pitchers. You will need to refer to the material in the clustering worksheet to do the following problems.

1. [5 points] Use pitchers to create a data frame called pitchers_scale having only the attributes ERA and K.to.BB.Ratio. Then normalize the data in pitchers_scale, as you will be working with the normalized data in the remaining problems. Provide your R code and the normalized pitchers_scale data frame below.
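As a rough illustration of the subsetting and normalization step in Problem 1, the sketch below uses a small made-up data frame in place of the real file. The pitcher names and values are hypothetical, and the choice of z-score normalization via scale() is an assumption; the clustering worksheet may prescribe a different method (for example, min-max scaling), in which case that method should be used instead.

```r
# Hypothetical stand-in for the real data; these names and values are invented.
# With the actual file, one would instead use something like:
#   pitchers <- read.csv("data118_worksheet6_pitchers_comma.csv")
pitchers <- data.frame(
  Name          = c("A", "B", "C", "D", "E", "F"),
  ERA           = c(2.50, 3.10, 4.25, 3.75, 5.00, 2.90),
  K.to.BB.Ratio = c(4.8, 3.2, 2.1, 2.9, 1.7, 4.0)
)

# Keep only the two attributes of interest.
pitchers_scale <- pitchers[, c("ERA", "K.to.BB.Ratio")]

# Assumed normalization: z-scores (each column gets mean 0 and sd 1).
# scale() returns a matrix, so convert back to a data frame.
pitchers_scale <- as.data.frame(scale(pitchers_scale))

pitchers_scale
```

Normalizing matters here because K-means and hierarchical clustering both rely on distances, so an attribute measured on a larger scale would otherwise dominate the result.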
2. [10 points] Use the K-means algorithm for your cluster analysis of the two above-mentioned attributes. Specifically, invoke the kmeans command to first create three clusters, then five, using the syntax provided in the clustering worksheet. Call the outputs k3_pitchers and k5_pitchers, respectively. Below, provide pertinent R code, as well as plots of the point partitions into three and five clusters. Which partitioning (three-cluster or five-cluster) do you think allows for better analysis of the data, and why?
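The K-means step in Problem 2 might look roughly like the following sketch. It runs on randomly generated placeholder data rather than the real pitchers file, and the seed, the nstart value, and the plotting options are all assumptions; the worksheet's own syntax takes precedence where it differs.

```r
# Placeholder data: 60 simulated pitchers, already normalized (hypothetical values).
set.seed(118)
pitchers_scale <- as.data.frame(scale(data.frame(
  ERA           = rnorm(60, mean = 4, sd = 1),
  K.to.BB.Ratio = rnorm(60, mean = 3, sd = 0.8)
)))

# K-means with three and then five clusters.
# nstart = 25 (an assumed setting) reruns the algorithm from several random
# starts and keeps the best fit, which makes the result less start-dependent.
k3_pitchers <- kmeans(pitchers_scale, centers = 3, nstart = 25)
k5_pitchers <- kmeans(pitchers_scale, centers = 5, nstart = 25)

# Scatter plots colored by cluster membership, with cluster centers marked.
plot(pitchers_scale, col = k3_pitchers$cluster, pch = 19,
     main = "K-means, 3 clusters")
points(k3_pitchers$centers, pch = 4, cex = 2, lwd = 2)

plot(pitchers_scale, col = k5_pitchers$cluster, pch = 19,
     main = "K-means, 5 clusters")
points(k5_pitchers$centers, pch = 4, cex = 2, lwd = 2)

# Total within-cluster sum of squares for each fit.
c(k3 = k3_pitchers$tot.withinss, k5 = k5_pitchers$tot.withinss)
```

The total within-cluster sum of squares typically decreases as clusters are added, so a lower value for five clusters does not by itself make that partition better; how interpretable the groups are still has to be argued from the plots.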
3. [10 points] Now perform hierarchical clustering on the ERA and K.to.BB.Ratio attributes. Use the hclust command, and assign your output to hc_pitchers. Then, plot the appropriate dendrogram, and highlight the clusters you see using the rect.hclust command. Provide pertinent R code and the highlighted dendrogram below. Would you prefer using this cluster analysis to the one you chose as the better of the two partitions in Problem 2? Why or why not?

Citations for Problem 1: _________________________________________________________________________________

Citations for Problem 2: _________________________________________________________________________________

Citations for Problem 3: _________________________________________________________________________________
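For the hierarchical clustering in Problem 3, a minimal sketch under stated assumptions follows. The data are simulated placeholders, and the Euclidean distance, complete linkage, and the choice of k = 3 highlighted clusters are all assumptions; the number of rectangles should reflect the clusters actually visible in the real dendrogram.

```r
# Placeholder data: 60 simulated pitchers, already normalized (hypothetical values).
set.seed(118)
pitchers_scale <- as.data.frame(scale(data.frame(
  ERA           = rnorm(60, mean = 4, sd = 1),
  K.to.BB.Ratio = rnorm(60, mean = 3, sd = 0.8)
)))

# hclust works on a distance matrix; dist() uses Euclidean distance by default.
pitcher_dist <- dist(pitchers_scale)

# Assumed linkage: "complete" (also the hclust default).
hc_pitchers <- hclust(pitcher_dist, method = "complete")

# Draw the dendrogram, then outline an assumed three clusters.
plot(hc_pitchers, labels = FALSE, main = "Hierarchical clustering of pitchers",
     xlab = "", sub = "")
rect.hclust(hc_pitchers, k = 3, border = "red")

# Cluster membership for the same cut, useful for comparing with K-means.
hc_groups <- cutree(hc_pitchers, k = 3)
table(hc_groups)
```

Note that rect.hclust must be called after plot, since it draws on top of the existing dendrogram.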