School: Cornell University
Course: 2700
Subject: English
Date: Dec 6, 2023
Pages: 23
Uploaded by CommodoreGiraffe3679
ENGRD 2700: Basic Engineering Probability and Statistics
Fall 2023
Homework 2
Due Monday, September 6th at 11:59pm.
Submit solutions to Gradescope by clicking the name of the
assignment. See the syllabus for detailed submission instructions.
When completing this assignment (and all subsequent ones), keep in mind the following:
• You must complete the homework individually and independently.
• Provide evidence for each of your answers. If a calculation involves only very minor computation, explain the computation you performed and give the results. If a calculation involves more complicated steps on many records, hand in the calculations and formulas for the first few records only.
• Write clearly and legibly. You are encouraged to type your work, although you do not have to. We may deduct points if your answers are difficult to read or disorganized.
• For questions that you answer using Python, attach any code that you write, along with the relevant plots. You may use other software, but the same condition applies.
• Submit your homework as a single PDF file on Gradescope.
• Read Chapters 1 and 2 of the textbook.
1. The file quartet.csv contains four datasets of x and y values, side by side.
(a) Compute the sample mean, sample median, and sample standard deviation for each column of
the dataset.
(b) Based solely upon the summary statistics you computed in part (a), how do the four datasets
compare?
(c) Construct scatterplots for each of the four datasets.
(d) Based solely upon the plots you generated in part (c), how do the four datasets compare?
(e) What’s the moral of the story? (That is, what does this example suggest about what should be
done when analyzing data?)
Why I am assigning this question: This question is assigned to reinforce your knowledge of sample
statistics. It is also given to help you learn the Python programming language and get you working
with real data sets.
a)
Column   Mean                Median   Standard deviation
x1       9                   9        3.3166247903554
y1       7.500909090909091   7.58     2.031568135925815
x2       9                   9        3.3166247903554
y2       7.500909090909091   8.14     2.0316567355016177
x3       9                   9        3.3166247903554
y3       7.5                 7.11     2.030423601123667
x4       9                   9        3.3166247903554
y4       7.500909090909091   7.04     2.0305785113876023
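The statistics listed above can be reproduced with Python's standard library alone. The following is a minimal sketch, not the code actually submitted: `load_columns` and `summarize` are hypothetical helper names, the file is assumed to have a header row with column names such as `x1` and `y1`, and the inline sample rows are the first dataset of Anscombe's quartet, which is assumed (not confirmed by the assignment) to match the first two columns of quartet.csv.

```python
import csv
import io
import statistics


def load_columns(csv_text):
    """Parse CSV text with a header row into {column name: list of floats}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


def summarize(columns):
    """Return the sample mean, median, and sample (n - 1) standard deviation per column."""
    return {
        name: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),  # divides by n - 1
        }
        for name, values in columns.items()
    }


# Assumed sample: first dataset of Anscombe's quartet (stand-in for quartet.csv).
sample_csv = """x1,y1
10,8.04
8,6.95
13,7.58
9,8.81
11,8.33
14,9.96
6,7.24
4,4.26
12,10.84
7,4.82
5,5.68
"""

if __name__ == "__main__":
    for name, stats in summarize(load_columns(sample_csv)).items():
        print(f"{name}'s mean: {stats['mean']}")
        print(f"{name}'s median: {stats['median']}")
        print(f"{name}'s standard deviation: {stats['stdev']}")
```

To run this on the real file, the text passed to `load_columns` would come from something like `open("quartet.csv").read()`, assuming the file sits in the working directory.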
b)
The four datasets have almost identical summary statistics.
All four datasets have a mean of approximately (9, 7.50): every x mean is exactly 9, and every y mean is 7.500909090909091 except y3, which is 7.5.
All four x columns have the same median, 9, but the y medians differ, ranging from 7.04 to 8.14.
All four x columns have the same standard deviation, 3.3166247903554.
The y standard deviations differ slightly but are all close to 2.03.
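For reference, the standard deviation compared here is presumably the usual sample version with an $n-1$ denominator. This is an assumption, since the solution does not state which formula was used:

```latex
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The shared x value 3.3166247903554 agrees with $\sqrt{11}$ to the digits shown, which corresponds to a sample variance of 11.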
c)
[Figures: one scatterplot with all four datasets on a single graph, followed by separate scatterplots of the (x1, y1), (x2, y2), (x3, y3), and (x4, y4) datasets.]
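Scatterplots like the ones for part (c) could be produced along the following lines. This is a sketch under stated assumptions, not the original plotting code: it assumes matplotlib is installed, that the columns are named `x1`, `y1`, ..., `x4`, `y4`, and that the data is already loaded into a dictionary of lists. The names `pair_columns`, `plot_quartet`, and the output file name are hypothetical.

```python
def pair_columns(columns):
    """Group columns named x1, y1, x2, y2, ... into (label, xs, ys) triples."""
    pairs = []
    i = 1
    while f"x{i}" in columns and f"y{i}" in columns:
        pairs.append((f"(x{i}, y{i})", columns[f"x{i}"], columns[f"y{i}"]))
        i += 1
    return pairs


def plot_quartet(columns, path="quartet_scatter.png"):
    """Draw one scatterplot per dataset on a 2x2 grid and save it to `path`."""
    # Imported here so the pairing helper works even without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    pairs = pair_columns(columns)
    # Shared axes make the four panels directly comparable.
    fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
    for ax, (label, xs, ys) in zip(axes.flat, pairs):
        ax.scatter(xs, ys)
        ax.set_title(label)
        ax.set_xlabel("x")
        ax.set_ylabel("y")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_quartet(columns)` with the eight columns of quartet.csv would write a single image with four panels. Sharing the axes is a design choice here: it keeps the scales identical, so differences in shape are not hidden by different axis ranges.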
d)
Although the summary statistics suggest that the four datasets are similar or nearly equal, the
scatterplots show that the four datasets follow very different patterns.
The (x1, y1) plot looks like two straight lines.
The (x2, y2) plot looks like a parabola.
The (x3, y3) plot looks like a single straight line.
The (x4, y4) plot looks like two parabolas.
e) The moral of the story is that comparing datasets using only summary statistics such as the mean,
median, and standard deviation is not a reliable method. We should also plot the data and look at the
overall pattern when analyzing datasets.