Lab 2 Worksheet and Assignment
Linear Regression, Probability, and Sampling

Individual Assignment: Lab 3 Report is due Thursday, February 8th, 2024 @ Noon

Objectives:
In this lab we are:
• Creating scatter plots to analyze trends.
• Visually quantifying correlation coefficients.
• Understanding linear regression in R and verifying linear regression assumptions.
• Using R for sampling, simulations, and demonstration of the Law of Large Numbers (LLN).

Collaboration Policy
In lab you are encouraged to work in pairs or small groups to discuss the concepts on the assignments. However, DO NOT copy each other’s work, as this constitutes cheating. The work you submit must be entirely your own. If you get stuck in lab, feel free to reach out to other groups or talk to your TA.

Batting Data:
In this section we are going to consider the Batting.csv data file posted on BruinLearn, Week 4. The data has 30 observations and 25 variables. The following output lists the variables in the data:

> names(Batting)
 [1] "Team"                  "League"
 [3] "runs"                  "at_bats"
 [5] "hits"                  "doubles"
 [7] "triples"               "homeruns"
 [9] "walks"                 "strikeouts"
[11] "stolenbages"           "caught_stealing"
[13] "bat_avg"               "intentional_walk"
[15] "hitbypitch"            "sacrificebunts"
[17] "sacrificefly"          "totalbases"
[19] "extrabasehit"          "groundintodoubleplay"
[21] "groundout"             "flyout"
[23] "groundoutsflyoutratio" "numberofpitches"
[25] "plateappearances"

Use library(readr) and the function read_csv to load the data into your RStudio session, and call it Batting. Run dim(Batting) to confirm the size of the data.

Background:
Major League Baseball teams often use statistics to forecast future success. One variable that is important to a team’s success is total runs scored during a season. If teams can determine which variables best help them manufacture runs, they can focus on improving those parts of their offense.

In your R data, batting statistics for all 30 Major League Baseball teams are included for the 2011 season. In addition to runs scored, there are seven commonly used variables in the Batting.csv file (at-bats, hits, homeruns, batting average, strikeouts, walks, and stolen bases) and ten additional variables (e.g., intentional walks, sacrifice bunts, plate appearances). Using various tools in RStudio, we will examine the linear relationship between runs scored in a season and each of these other quantitative variables in the data. Our aim will be to summarize these relationships both graphically and numerically in order to find which variable, if any, best helps us predict a team’s runs scored in a season.

In hopes of finding even better predictors of offensive success, baseball researchers have also introduced a few newer variables that combine multiple variables from the Batting11 collection, as well as others, into one number. Three of these newer variables are called on-base percentage, slugging percentage, and on-base plus slugging percentage, which are included along with runs scored in the Batting data.
A popular book called Moneyball by Michael Lewis focuses on the “quest for the secret of success in baseball” and follows a low-budget team, the Oakland Athletics, in the early 2000s, who believed that underused statistics, such as a player’s ability to get on base, actually better predict the ability to score runs than typical statistics like homeruns, RBIs, and batting average. In fact, obtaining players who excelled in these underused statistics turned out to be much more affordable for the team. After examining the relationships between runs scored and the more traditional baseball statistics in the Batting data, we will perform similar analyses using the newer variables in the Batting data to see if the researchers were successful in finding better predictors of team success.

RBI: A run batted in (RBI) is a statistic in baseball and softball that credits a batter for making a play that allows a run to be scored (except in certain situations, such as when an error is made on the play).
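The loading step described under Batting Data can be sketched as follows. This is a minimal sketch, assuming Batting.csv has already been downloaded into the current working directory; the helper name load_batting is hypothetical and is not part of the lab.

```r
library(readr)

# Hypothetical helper (not part of the lab): read the CSV at `path`
# and return it as a tibble.
load_batting <- function(path = "Batting.csv") {
  read_csv(path, show_col_types = FALSE)
}

# Assumption: Batting.csv is in the working directory. If it is not,
# this step is skipped rather than raising an error.
if (file.exists("Batting.csv")) {
  Batting <- load_batting()
  print(dim(Batting))    # the lab states 30 observations and 25 variables
  print(names(Batting))  # should match the variable list given above
}
```

If dim(Batting) does not report 30 rows and 25 columns, the file path or the download is the first thing to check.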
Investigative Question: Which batting statistics can best help us predict the number of runs a team scores in a season?

Our target variable for this lab is the variable runs. Our candidate predictors are the following 15 quantitative variables:

 [1] "at_bats"        "hits"             "doubles"
 [4] "triples"        "homeruns"         "walks"
 [7] "strikeouts"     "stolenbages"      "caught_stealing"
[10] "bat_avg"        "intentional_walk" "hitbypitch"
[13] "sacrificebunts" "sacrificefly"     "totalbases"

Part A: Examining Predictors One at a Time

Question 1:
A) Does the number of at-bats predict the number of runs a team will score?
B) Create a ggplot scatter plot that shows the relationship between runs and at-bats.
C) What does the graph say about the ability to predict the number of runs based on knowing a team’s at-bats?

Question 2:
A) Report the correlation coefficient for each of the predictors with the response variable.
B) Based on the answer in part A, which of the 15 predictors is the worst at predicting the number of runs?
C) Based on the answer in part A, which of the 15 predictors is the best at predicting the number of runs?
D) Based on the answer in part A, which of the 15 predictors is/are positively correlated with the number of runs?
E) Based on the answer in part A, which of the 15 predictors is/are negatively correlated with the number of runs?

Question 3: Consider the best predictor and your response variable runs.
A) Create a scatter plot of the best predictor and the response variable runs.
B) Add a smooth curve to your scatter plot in part A.
C) Add the best-fit line to your scatter plot in part A.
D) Create a linear model predicting runs using your best predictor. Report its summary.
E) Interpret the slope, the y-intercept, and R² × 100% in context.
F) Create diagnostic plots and comment on the validity of your linear model.
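One possible way to organize the computations behind Questions 1 to 3 is sketched below. It is an illustration under stated assumptions, not the lab's official solution: it assumes the data has been read into a data frame called Batting as the lab instructs, the helper names cor_with_response and scatter_with_fits are hypothetical, and taking "best" to mean the largest absolute correlation is one reasonable reading of the question.

```r
library(ggplot2)

# The 15 candidate predictors listed in the lab (column names exactly as
# they appear in the data, including the spelling "stolenbages").
predictors <- c("at_bats", "hits", "doubles", "triples", "homeruns", "walks",
                "strikeouts", "stolenbages", "caught_stealing", "bat_avg",
                "intentional_walk", "hitbypitch", "sacrificebunts",
                "sacrificefly", "totalbases")

# Hypothetical helper: Pearson correlation of each predictor with the
# response, sorted from most negative to most positive.
cor_with_response <- function(data, response, predictors) {
  r <- sapply(predictors, function(p) cor(data[[p]], data[[response]]))
  sort(r)
}

# Hypothetical helper: scatter plot with a loess smooth curve (grey)
# and the least-squares line (blue).
scatter_with_fits <- function(data, x, y) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "grey40") +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
    labs(x = x, y = y)
}

# Assumption: Batting was loaded earlier with read_csv; skipped otherwise.
if (exists("Batting")) {
  r <- cor_with_response(Batting, "runs", predictors)
  print(round(r, 3))                  # sign shows direction of association

  best  <- names(r)[which.max(abs(r))]  # strongest linear association
  worst <- names(r)[which.min(abs(r))]  # weakest linear association
  cat("best:", best, " worst:", worst, "\n")

  print(scatter_with_fits(Batting, "at_bats", "runs"))  # Question 1
  print(scatter_with_fits(Batting, best, "runs"))       # Question 3 A to C

  # Simple linear regression of runs on the best predictor.
  fit <- lm(reformulate(best, response = "runs"), data = Batting)
  print(summary(fit))

  # Standard diagnostic plots: residuals vs fitted, normal Q-Q,
  # scale-location, and residuals vs leverage.
  par(mfrow = c(2, 2))
  plot(fit)
}
```

In the summary output, the slope estimates the change in predicted runs for a one-unit increase in the predictor, the intercept is the predicted runs when the predictor equals zero (which may fall outside the range of the data), and R² × 100% is the percentage of the variation in runs explained by the linear model.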