Week 1 – Drills with R

An Nguyen
University of the Cumberlands
Statistics for Data Science (MSDS-531-M30) – Full Term
Dr. Ora Denton
January 21st, 2024
1. The student directory for a large university has 400 pages with 130 names per page, a total of 52,000 names. Using software, show how to select a simple random sample of 10 names.

- I imagine that every student has an ID number assigned to them, and if the directory is saved as a data frame, it would most likely have an index column running from 1 to 52000, one row per student. The easiest way to pick 10 random names is then to draw a random sample of 10 index numbers from 1:52000 and look up the names at those positions.
- R code to do so:

    # Create a variable to store the index numbers from 1 to 52000
    studentDirectory <- 1:52000
    # Take a random sample of 10 (sample() draws without replacement by default),
    # assign it to a variable and print it
    randomSample <- sample(studentDirectory, 10)
    print(randomSample)

- Result: (output image not preserved)

2. From the Murder data file, use the variable murder, which is the murder rate (per 100,000 population) for each state in the U.S. in 2017 according to the FBI Uniform Crime Reports. At first, do not use the observation for D.C. (DC). Using software:

a) Find the mean and standard deviation and interpret their values.
- R code to do so:

    # Question 2a
    # Read the murder data into murderAll
    murderAll <- read.table("https://stat4ds.rwth-aachen.de/data/Murder.dat", header = TRUE)
    # Keep every row except DC in murderNoDC
    murderNoDC <- murderAll[murderAll$state != "DC", ]
    # Calculate the mean murder rate of all states except DC
    meanNoDC <- mean(murderNoDC$murder)
    # Calculate the standard deviation of the murder rate of all states except DC
    sdNoDC <- sd(murderNoDC$murder)
    # Print meanNoDC and sdNoDC
    print(meanNoDC)
    print(sdNoDC)

- Results: mean = 4.874, standard deviation = 2.586291
- Mean & standard deviation interpretation: Excluding DC, the states have a mean murder rate of 4.874 per 100,000 population, so the state murder rates are centered around 4.874. The standard deviation of 2.586291 describes the dispersion of the murder rates around that mean: roughly speaking, a typical state's rate differs from the mean by about 2.6 per 100,000. However, without further details about the distribution of the murder rates, it's hard to say whether that standard deviation should be considered high or low. What I can say is that a low standard deviation would mean that most states have murder rates relatively close to the mean of 4.874, while a high standard deviation would indicate otherwise.

b) Find the five-number summary and construct the corresponding boxplot.

- R code to do so:

    # Question 2b
    # Get the summary of the murder rate by state without DC.
    # summary() also reports the mean; the five-number summary is
    # Min, 1st Qu., Median, 3rd Qu. and Max.
    sumNoDC <- summary(murderNoDC$murder)
    # Print the summary
    print(sumNoDC)
    # Generate a boxplot of the murder rate by state without DC
    boxplot(murderNoDC$murder, ylab = "Murder Rate")

- Result: (output and boxplot images not preserved)
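As a side note on Question 2b, summary() prints six values because it includes the mean, while base R's fivenum() returns exactly the five numbers (minimum, lower hinge, median, upper hinge, maximum). The sketch below shows the difference and uses boxplot.stats() to show which values a boxplot draws. It uses a small made-up vector, not the Murder data, so the numbers are illustrative only and the variable names are hypothetical.

```r
# Illustrative values only (hypothetical, NOT taken from Murder.dat)
x <- c(1.0, 2.2, 2.5, 3.4, 4.9, 5.3, 6.1, 7.8, 12.4)

# fivenum() returns Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum
fiveNum <- fivenum(x)
names(fiveNum) <- c("Min", "LowerHinge", "Median", "UpperHinge", "Max")
print(fiveNum)

# summary() returns six values: min, 1st quartile, median, mean,
# 3rd quartile and max (quartiles come from quantile(), not hinges)
sixNum <- summary(x)
print(sixNum)

# boxplot.stats() exposes what boxplot() draws:
#   $stats = lower whisker end, lower hinge, median, upper hinge, upper whisker end
#   $out   = points more than 1.5 * IQR beyond the hinges (drawn individually)
bp <- boxplot.stats(x)
print(bp$stats)
print(bp$out)
```

In this made-up vector the value 12.4 lies beyond the upper fence (upper hinge plus 1.5 times the hinge spread), so a boxplot would draw it as a separate point and the upper whisker would stop at 7.8. The same three calls could be applied to murderNoDC$murder to check whether any state is flagged in the same way.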