Math 110 Chapter 8 StatCrunch Assignment

School: Los Medanos College
Course: 110
Subject: Mathematics
Date: Jan 9, 2024

Name: William Belton
StatCrunch Assignment Chapter 8
Due: 11/30/23
Math 110

MLB 2023 Teams Hitting Data Set

Introduction and data information:

The movie (and book) Moneyball focuses on the “quest for the secret of success in baseball”. It follows our low-budget East Bay team, the Oakland Athletics, who believed that underused statistics, such as a player’s ability to get on base, better predict the ability to score runs than typical statistics like home runs, RBIs (runs batted in), and batting average. Obtaining players who excelled in these underused statistics turned out to be much more affordable for the team.

In this assignment we’ll be looking at data from all 30 Major League Baseball teams and examining the linear relationship between runs scored in a season and a number of other player statistics. Our aim will be to summarize these relationships both graphically and numerically in order to find which variable, if any, helps us best predict a team’s runs scored in a season.

You will be using the Math 110 MLB 2023 Teams Hitting file. This is the data set of statistics from all 30 MLB teams in 2023. To find the data file, choose Data -> Data Sets. In the search box, type “MLB 2023” and click search. Click to open the data file. You can save the file to “my data” by clicking Data -> Save.

In addition to runs scored, there are some of the usual variables in the data set: at-bats, hits, home runs, batting average, strikeouts, stolen bases, and walks. There are also three newer variables: on-base percentage, slugging percentage, and on-base plus slugging.

Part 1: Exploratory data analysis

We want to find the variable that best predicts the runs scored in a season by a team, so we start by exploring the relationship between different input variables as predictors for runs.

1. Make a scatter plot comparing At Bat (X variable) and Runs (Y variable). In StatCrunch select Graph -> Scatterplot, and select the appropriate variables.
Make a second scatter plot comparing Hits (X variable) and Runs (Y variable).

Describe the overall shape and trends you notice in the two graphs. Can you tell which graph looks to have a stronger linear correlation (e.g. the data points are closer to a line)? If both graphs look similar to you, that’s ok, just explain.

The second graph looks like it has a stronger linear correlation. Even though its points don't fall exactly on a straight line, they are closer to one than the points in the first graph, which are scattered all over the place.

Baseball players who make lots of hits per season are highly sought after and have expensive contracts. If a baseball team is looking for a less expensive statistic as a predictor for runs, the number of at-bats might be a decent alternative. Let’s investigate other variables.

2. Keep the scatter plot comparing At Bat (X variable) and Runs (Y variable). Edit or make a second scatter plot comparing Caught Stealing (X variable) and Runs (Y variable).

Describe the overall shape and trends you notice in the two graphs. Which graph looks to have a stronger linear correlation (e.g. the data points are closer to a line)?

Graph 1 shifts to the left. Graph 2 shifts to the right and has a stronger linear correlation.

3. Keep the scatter plot comparing At Bat (AB) (X variable) and Runs (Y variable). Edit or make an additional scatter plot comparing On Base Percentage (OBP) (X variable) and Runs (Y variable).

Describe the overall shape and trends you notice in the two graphs. Which graph looks to have a stronger linear correlation (e.g. the data points are closer to a line)?

The on-base percentage and runs graph has a stronger correlation. Its points clearly fall along a line.
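The idea of points being "closer to a line" can be put on a numerical scale with the Pearson correlation coefficient, the quantity StatCrunch reports as R. The sketch below is a minimal pure-Python illustration, not StatCrunch's own procedure. The function name `pearson_r` and all of the numbers are made-up placeholders chosen for illustration; none of them come from the MLB 2023 data set.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Placeholder values, NOT taken from the MLB 2023 Teams Hitting file.
x_values = [1, 2, 3, 4, 5]
tight_y = [2.1, 3.9, 6.2, 7.8, 10.1]  # points lying close to a line
loose_y = [5, 1, 7, 3, 6]             # points scattered widely

r_tight = pearson_r(x_values, tight_y)
r_loose = pearson_r(x_values, loose_y)

print(f"tight scatter: r = {r_tight:.3f}")
print(f"loose scatter: r = {r_loose:.3f}")
```

A value of R near 1 or -1 corresponds to a scatter plot whose points hug a line, while a value near 0 corresponds to a plot that looks "all over the place". With the real data, the two lists would be a pair of columns from the data set, such as Hits and Runs.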
Part 2: Statistical modeling

When the relationship between two variables is roughly linear, we can quantify the strength of the relationship with the correlation coefficient.

1. Select Stat -> Regression -> Simple Linear and plot a linear regression (line of best fit) comparing At Bat (X variable) and Runs (Y variable). You can click to the right to see the line fitted in the scatter plot. You can hover the mouse over the line for the linear equation. Record the linear equation and correlation coefficient R here:

Simple linear regression results:
Dependent Variable: Runs
Independent Variable: AB (at bats)
Runs = -1856.0411 + 0.47493404 AB (at bats)
Sample size: 30
R (correlation coefficient) = 0.45525464
R-sq = 0.20725678
Estimate of error standard deviation: 81.57023

2. Select Stat -> Regression -> Simple Linear and plot a linear regression (line of best fit) comparing Hits (X variable) and Runs (Y variable). Record the linear equation and correlation coefficient R here:

Simple linear regression results:
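The At Bat regression output recorded in item 1 of Part 2 can be sanity-checked with a few lines of Python. The sketch below uses only the intercept, slope, R, and R-sq values reported there. The function name `predict_runs` and the at-bat total of 5500 are assumptions introduced for illustration; 5500 is a hypothetical input, not a value from the data set.

```python
# Values copied from the At Bat regression output recorded above.
INTERCEPT = -1856.0411
SLOPE = 0.47493404
R = 0.45525464
R_SQ_REPORTED = 0.20725678


def predict_runs(at_bats):
    """Predicted season runs from the fitted line Runs = INTERCEPT + SLOPE * AB."""
    return INTERCEPT + SLOPE * at_bats


# In simple linear regression, R-sq is the square of the correlation coefficient.
r_sq_check = R ** 2

# Hypothetical team at-bat total, chosen only to illustrate the equation.
hypothetical_ab = 5500
predicted = predict_runs(hypothetical_ab)

print(f"R squared from R: {r_sq_check:.8f} (reported: {R_SQ_REPORTED})")
print(f"Predicted runs for {hypothetical_ab} at-bats: {predicted:.1f}")
```

Squaring the reported R reproduces the reported R-sq to within rounding, so about 21% of the variation in runs is explained by at-bats. For the hypothetical 5500 at-bats, the line predicts roughly 756 runs. The large negative intercept has no baseball meaning on its own, since no team has at-bat totals anywhere near zero.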