Math 110 Chapter 8 StatCrunch Assignment
School: Los Medanos College
Course: 110
Subject: Mathematics
Date: Jan 9, 2024
Name:____William Belton________
StatCrunch Assignment Chapter 8
Due ___11/30/23____
Math 110 MLB 2023 Teams Hitting Data Set
Introduction and data information:
The movie (and book) Moneyball focuses on the “quest for the secret of success in
baseball”. It follows our low-budget East Bay team, the Oakland Athletics, who believed
that underused statistics, such as a player’s ability to get on base, better predict the ability
to score runs than typical statistics like home runs, RBIs (runs batted in), and batting
average. Obtaining players who excelled in these underused statistics turned out to be
much more affordable for the team.
In this assignment we’ll be looking at data from all 30 Major League Baseball teams and
examining the linear relationship between runs scored in a season and a number of other
player statistics. Our aim will be to summarize these relationships both graphically and
numerically in order to find which variable, if any, helps us best predict a team’s runs
scored in a season.
You will be using the Math 110 MLB 2023 Teams Hitting file. This is the data set of
statistics from all 30 MLB teams in 2023. To find the data file, choose Data -> Data Sets.
In the search box, type “MLB 2023” and click search. Click to open the data file. You
can save the file to “my data” by clicking Data -> Save.
In addition to runs scored, there are some of the usual variables in the data set: at-bats,
hits, home runs, batting average, strikeouts, stolen bases, and walks. There are also three
newer variables: on-base percentage, slugging percentage, and on-base plus slugging.
Part 1: Exploratory data analysis
We want to find the variable that best predicts the runs scored in a season by a team, so
we start by exploring the relationship between different input variables as predictors for
runs.
1. Make a scatter plot comparing At Bat (X variable) and Runs (Y variable). In
StatCrunch select Graph -> Scatterplot, and select the appropriate variables. Make a
second scatter plot comparing Hits (X variable) and Runs (Y variable).
Describe the overall shape and trends you notice in the two graphs. Can you tell which
graph looks to have a stronger linear correlation (e.g. the data points are closer to a line)?
If both graphs look similar to you, that’s ok, just explain.
The second graph (Hits vs. Runs) looks like it has the stronger linear correlation.
Its points don't fall exactly on a straight line, but they sit closer to one than
in the first graph (At Bat vs. Runs), where the points are widely scattered.
Baseball players who make lots of hits per season are highly sought after and have
expensive contracts. If a baseball team is looking for a less expensive statistic as a
predictor for runs, the number of at-bats might be a decent alternative.
Let’s investigate other variables.
2. Keep the scatter plot comparing At Bat (X variable) and Runs (Y variable). Edit or
make a second scatter plot comparing Caught Stealing (X variable) and Runs (Y
variable).
Describe the overall shape and trends you notice in the two graphs. Which graph looks to
have a stronger linear correlation (e.g. the data points are closer to a line)?
● Graph 1 shifts to the left.
● Graph 2 shifts to the right and has a stronger linear correlation.
3. Keep the scatter plot comparing At Bat (AB) (X variable) and Runs (Y variable). Edit
or make an additional scatter plot comparing On Base Percentage (OBP) (X variable) and
Runs (Y variable).
Describe the overall shape and trends you notice in the two graphs. Which graph looks to
have a stronger linear correlation (e.g. the data points are closer to a line)?
The On Base Percentage vs. Runs graph has the stronger linear correlation. Its points clearly follow a line.
Part 2: Statistical modeling
When the relationship between two variables is roughly linear, we can quantify the
strength of the relationship with the correlation coefficient.
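As a rough illustration of what the correlation coefficient measures, the sketch below computes Pearson's r in plain Python. The function name pearson_r and the toy values are made up for illustration and are not taken from the MLB data set; StatCrunch reports R directly, so this is only meant to show the arithmetic behind it.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)

# Made-up toy values (NOT from the MLB data set).
toy_x = [1, 2, 3, 4, 5]
on_a_line = [2, 4, 6, 8, 10]   # every point lies exactly on a rising line
scattered = [2, 1, 4, 3, 5]    # rising trend, but with some scatter

r_line = pearson_r(toy_x, on_a_line)
r_scattered = pearson_r(toy_x, scattered)
print(r_line, r_scattered)
```

The closer the points are to a single rising line, the closer r is to 1; more scatter pulls r toward 0, and a falling trend makes r negative.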
1. Select Stat -> Regression -> Simple Linear and plot a linear regression (line of best fit)
comparing At Bat (X variable) and Runs (Y variable). You can click to the right to see
the line fitted in the scatter plot. You can hover the mouse over the line for the linear
equation. Record the linear equation and correlation coefficient R here:
Simple linear regression results:
Dependent Variable: Runs
Independent Variable: AB (at bats)
Runs = -1856.0411 + 0.47493404 AB (at bats)
Sample size: 30
R (correlation coefficient) = 0.45525464
R-sq = 0.20725678
Estimate of error standard deviation: 81.57023
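To show how the recorded equation can be used, here is a minimal sketch that plugs an at-bat total into the fitted line and checks that R-sq matches R squared. The coefficients are copied from the output above; the function name predict_runs and the input of 5500 at-bats are hypothetical, chosen only for illustration.

```python
# Coefficients and statistics copied from the StatCrunch output above.
INTERCEPT = -1856.0411
SLOPE = 0.47493404      # predicted additional runs per additional at-bat
R = 0.45525464
R_SQ = 0.20725678

def predict_runs(at_bats):
    """Predicted season runs for a team with the given number of at-bats."""
    return INTERCEPT + SLOPE * at_bats

# Hypothetical input for illustration only (not a real team's total).
example_at_bats = 5500
predicted = predict_runs(example_at_bats)
print(f"Predicted runs at {example_at_bats} AB: {predicted:.1f}")

# R-sq should equal R squared, up to rounding in the reported digits.
print(f"R squared: {R ** 2:.8f} (reported R-sq: {R_SQ})")
```

With these numbers the line predicts about 756 runs for 5500 at-bats. An R-sq of roughly 0.21 means at-bats account for only about 21% of the variation in runs, so individual predictions from this line can be far off.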
2. Select Stat -> Regression -> Simple Linear and plot a linear regression (line of best fit)
comparing Hits (X variable) and Runs (Y variable). Record the linear equation and
correlation coefficient R here:
Simple linear regression results: