HW7-solutions.html
Homework 7 for ISEN 355 (System Simulation)
(C) 2023 David Eckman
Due Date: Upload to Canvas by 9:00pm (CDT) on Friday, March 24.
In [ ]:
# Import some useful Python packages.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
import pandas as pd
Problem 1. (50 points)
The file canvas_total_times.csv contains the total time (in hours) each student has spent on the ISEN 355 Canvas page. The instructor, TA, and graders were excluded from the data.
In [ ]:
# Import the dataset.
mydata = pd.read_csv('canvas_total_times.csv', header=None)
# Convert the data to a list.
times = mydata[mydata.columns[0]].values.tolist()
(a) (2 points)
Do you think it is reasonable to assume that the data are independent and identically distributed? Why or why not?
If we think of the distribution as reflecting the Canvas habits of the "typical" ISEN 355 student, then it is reasonable to assume that the data are identically distributed.
It would be reasonable to assume the data are independent if we believe that each student visits the Canvas webpage on their own device and on their own time. There are, however, a number of factors that could make the Canvas times dependent. For example, all students must log on to Canvas to take the timed quizzes during the labs. In addition, some students complete the homework assignments in pairs, so if they are working off of only one computer, then Canvas is open on only one of their accounts. The same applies to the group project assignments. Given that total times are in the tens of hours, and students may not have Canvas open most of the time when working on the homework and the project, these dependencies may be minimal.
(b) (3 points)
Plot a histogram of the times using 10 bins. Label the axes of your plot.
Comment on the shape of the data.
In [ ]:
plt.hist(times, bins=10)
plt.xlabel("Time Spent on Canvas (hours)")
plt.ylabel("Frequency")
plt.title("Histogram of Time Spent on Canvas")
plt.show()
The histogram shows that most students are on Canvas for less than 30 hours, while only a few are on Canvas for more than 50 hours. The shape of the histogram has a right tail and does not look symmetric.
(c) (4 points)
How many outliers do you see in the data? What is a possible explanation for these high values? In the rest of the problem, we will fit various distributions to the time data and assess their fit. What do you think we should do about the outliers? Explain your reasoning.
From the histogram, it appears that 3 students spent more than 50 hours on Canvas, so there are 3 outliers. It is possible that these 3 students simply left Canvas open in one of their browser tabs. If this is indeed the case, then we might consider the observations to be erroneous, in which case it would be advisable to remove the outliers. If instead we were to discover that these students actually intentionally viewed the Canvas page for 50+ hours, then we should leave the outliers in the data.
In [ ]:
# Although not necessary, we could use the following piece of code to get the number of outliers:
outliers = [value for value in times if value > 50]
num_outliers = len(outliers)
print(num_outliers)
3
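The 50-hour cutoff above was read off the histogram. A more systematic alternative, which the original solution does not use, is the common 1.5 × IQR rule: flag any value above the third quartile plus 1.5 times the interquartile range. The sketch below applies that rule to a small made-up sample (`sample_times` is hypothetical and is not the homework data), so the flagged values need not match the 3 outliers found above.

```python
import numpy as np

# Hypothetical sample of times (hours); NOT the homework data.
sample_times = [5.0, 7.5, 9.0, 10.0, 12.0, 14.0, 16.0, 20.0, 25.0, 60.0]

# Quartiles and interquartile range.
q1, q3 = np.percentile(sample_times, [25, 75])
iqr = q3 - q1

# Flag values above the upper fence Q3 + 1.5 * IQR.
upper_fence = q3 + 1.5 * iqr
flagged = [t for t in sample_times if t > upper_fence]
print(f"Upper fence: {upper_fence:.2f} hours; flagged values: {flagged}")
```

Replacing `sample_times` with `times` would apply the same rule to the Canvas data; whether its count agrees with the visual count is something to check rather than assume.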
For the rest of the assignment, use ALL of the data in canvas_total_times.csv.
(d) (3 points)
Use scipy.stats.rv_continuous.fit() to compute maximum likelihood estimators (MLEs) for the mean (loc) and standard deviation (scale) of a normal distribution fitted to the data. See the accompanying documentation to see how the function is called as well as the documentation about scipy.stats.norm. The term rv_continuous is not part of the function call; instead, you would specifically use scipy.stats.norm.fit(). Report the MLEs. (Note: This .fit method is different from the one we used in the previous homework to fit a Poisson distribution. You should not need to update SciPy to use this function.)
(Note: For functions like .fit() that return multiple outputs, you will want to define multiple variables on the left-hand side of the equal sign. For example, a, b = myfunction(c) would be a way to separate two variables (a and b) returned by one call of the function myfunction().)
In [ ]:
loc_mle, scale_mle = scipy.stats.norm.fit(times)
print(f"The MLEs of the fitted normal distribution are a mean of {round(loc_mle, 2)} and a standard deviation of {round(scale_mle, 2)}.")
The MLEs of the fitted normal distribution are a mean of 17.17 and a standard deviation of 10.7.
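For a normal distribution the MLEs have closed forms: the sample mean, and the standard deviation computed with divisor n rather than n - 1. The following minimal check illustrates this on a made-up sample (`sample` is hypothetical and is not the homework data).

```python
import numpy as np
import scipy.stats

# Hypothetical sample; NOT the homework data.
sample = np.array([4.0, 8.0, 9.5, 12.0, 15.0, 21.0, 30.0, 55.0])

loc_hat, scale_hat = scipy.stats.norm.fit(sample)

# Closed-form MLEs: sample mean and standard deviation with divisor n (ddof=0).
mean_closed = sample.mean()
std_closed = sample.std(ddof=0)

print(np.isclose(loc_hat, mean_closed), np.isclose(scale_hat, std_closed))
```

Note that `np.std` defaults to `ddof=0`, whereas the pandas `.std()` method defaults to `ddof=1`, so the two would give slightly different values on the same data.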
(e) (2 points)
If you were to generate a single time from the fitted normal distribution, what is the probability it would be negative? Based on the mean and standard deviation of the fitted normal distribution, should you be concerned about using this input distribution? Explain your reasoning.
In [ ]:
prob_negative = scipy.stats.norm.cdf(x=0, loc=loc_mle, scale=scale_mle)
print(f"The probability of generating a negative value is {round(prob_negative, 3)}.")
The probability of generating a negative value is 0.054.
We should be concerned, because the chance of generating a negative value is about 1/20, which is very high. If the simulation model needed to generate many Canvas times, it would very likely generate a negative value at some point, which is an impossible amount of time to be on Canvas.
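To make "very likely" concrete: if n times are generated independently, the probability that at least one is negative is 1 - (1 - p)^n, where p is the single-draw probability. The sketch below uses the rounded MLEs reported in part (d); the values of n are arbitrary choices for illustration.

```python
import scipy.stats

# Rounded MLEs reported in part (d).
mean_hat, sd_hat = 17.17, 10.7

# Probability that a single generated time is negative.
p_neg = scipy.stats.norm.cdf(0, loc=mean_hat, scale=sd_hat)

# Probability that at least one of n independent draws is negative.
for n in (10, 100, 1000):
    p_any = 1 - (1 - p_neg) ** n
    print(f"n = {n:4d}: P(at least one negative) = {p_any:.3f}")
```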
(f) (3 points)
Reproduce your histogram from part (b) with density=True. Superimpose the pdf of the fitted normal distribution over the interval [0, 70]. Use x = np.linspace(0, 70, 71). Comment on the visual fit of the normal distribution.
In [ ]:
plt.hist(times, bins=10, density=True)
x = np.linspace(0, 70, 71)
plt.plot(x, scipy.stats.norm.pdf(x, loc_mle, scale_mle))
plt.xlabel("Time Spent on Canvas (hours)")
plt.ylabel("Density")
plt.title("Histogram / Normal PDF")
plt.show()
The normal distribution does not fit the data well. Even if the outliers were excluded, the fit for the rest of the data looks poor. In particular, the normal distribution puts too much probability density on times <= 5 hours (including negative values!), and not enough on the range 5-15 hours, where most of the data is concentrated. The misalignment between the mode (peak) of the normal distribution and the tallest bars in the histogram is possibly explained by the outliers shifting the normal distribution to the right.
(g) (4 points)
Use scipy.stats.rv_continuous.fit() to compute maximum likelihood estimators (MLEs) for the shape parameter (a), location parameter (loc) and scale parameter (scale) of a gamma distribution fitted to the data. See the documentation about scipy.stats.gamma. Report the MLEs.
In [ ]:
a_mle, loc_mle, scale_mle = scipy.stats.gamma.fit(times)
print(f"The MLEs of the fitted gamma distribution are a = {round(a_mle, 3)}, loc = {round(loc_mle, 3)}, and scale = {round(scale_mle, 3)}.")
The MLEs of the fitted gamma distribution are a = 1.567, loc = 4.484, and scale = 8.096.
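As a quick sanity check on these estimates, the mean of a shifted gamma distribution is loc + a × scale, and its support begins at loc. The sketch below evaluates both using the rounded MLEs reported above; it is an illustrative check, not part of the original solution.

```python
import scipy.stats

# Rounded MLEs reported above for the gamma fit.
a_hat, loc_hat, scale_hat = 1.567, 4.484, 8.096

# Mean of a shifted gamma distribution: loc + a * scale.
fitted_mean = scipy.stats.gamma.mean(a_hat, loc=loc_hat, scale=scale_hat)

# The support starts at loc, so no probability falls below 0 when loc > 0.
p_negative = scipy.stats.gamma.cdf(0, a_hat, loc=loc_hat, scale=scale_hat)

print(f"Fitted mean: {fitted_mean:.2f} hours")
print(f"P(time < 0): {p_negative}")
```

The fitted mean comes out close to the 17.17 hours reported for the normal fit in part (d), and, unlike the normal distribution, this fitted gamma cannot generate negative times.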
(h) (4 points)
You are about to conduct a Chi-Square goodness-of-fit test for the fitted gamma distribution using 10 equiprobable bins. In preparation, you will need to find the breakpoints of the bins, count the observed number of data points in each bin, and compute the expected number of data points in each bin.
1. To get the endpoints of the bins, use scipy.stats.gamma.ppf() to calculate quantiles of the fitted gamma distribution at q = np.linspace(0, 1, 11).
2. To get the observed number in each bin, use np.histogram() (see its documentation). (Note: The function np.histogram() returns multiple outputs.)
3. To get the expected number in each bin, note that you have 10 equiprobable bins. The expected numbers need to sum up to the total number of observations
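The three preparation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the original solution: it uses the rounded gamma MLEs from part (g) and a hypothetical sample drawn from that fitted distribution (`sample`, `n_obs`, and the seed are made up), since the Canvas data file is not reproduced here.

```python
import numpy as np
import scipy.stats

# Rounded gamma MLEs reported in part (g).
a_hat, loc_hat, scale_hat = 1.567, 4.484, 8.096

# Hypothetical data drawn from the fitted gamma; NOT the homework data.
n_obs = 200
sample = scipy.stats.gamma.rvs(a_hat, loc=loc_hat, scale=scale_hat,
                               size=n_obs, random_state=355)

# Step 1: bin edges are the 0%, 10%, ..., 100% quantiles of the fitted gamma.
# The first edge equals loc and the last edge is +inf.
q = np.linspace(0, 1, 11)
edges = scipy.stats.gamma.ppf(q, a_hat, loc=loc_hat, scale=scale_hat)

# Step 2: observed counts per bin (np.histogram returns counts and edges).
observed, _ = np.histogram(sample, bins=edges)

# Step 3: with 10 equiprobable bins, each expected count is n / 10.
expected = np.full(10, n_obs / 10)

print("Observed:", observed)
print("Expected:", expected)
```

With the real data, `sample` would be replaced by `times` and `n_obs` by `len(times)`. Because np.histogram() returns both the counts and the bin edges, the call unpacks two outputs and discards the second.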