SIE 464-564 Midterm Spring 2019
School
University Of Arizona *
Course
464
Subject
Industrial Engineering
Date
Jan 9, 2024
Due date: Sunday April 7, 2019 11:59 PM
Exam must be submitted by uploading to the “Midterm” folder in the D2L course Dropbox.
Students are expected to uphold the UA Code of Academic Integrity
(http://deanofstudents.arizona.edu/policies-and-codes/code-academic-integrity).
This take home exam is expected to be an individual effort. Any evidence of cheating will
result in a failing grade on the exam for all parties involved. This has happened before
and I don’t believe the risk is worthwhile.
In addition to your textbooks, you are allowed to consult any reference in the library, on
the internet, and on D2L.
You are not allowed to consult with anyone else, regardless of whether they are enrolled
in the class or at UA.
Everyone must answer the first 6 questions.
Those enrolled in SIE 564 must also answer question 7.
An extra credit question is provided at the end.
Question 1 (10 points)
On the episode of FREAKONOMICS “Here’s Why all Your Projects Are Always Late
and What to do About it”, Stephen J. Dubner talks about a tendency that many people
have, the planning fallacy, a term coined by Nobel prize winning psychologist Daniel
Kahneman and another psychologist Amos Tversky in the 1970s. In the podcast,
specialists talk about reasons for this tendency.
A)
Apply metacognition to analyze the manner in which you turn in assignments
during your college career and describe a way in which the planning fallacy may
apply to you, different from the ones mentioned in the podcast.
Applying metacognition, I reviewed how I handled my assignments and found that most of
them were submitted very close to the due date and time. Part of the cause was fear of
not getting the correct answers. The other part was the planning fallacy: I put off each
assignment until it was close to the deadline, and only then realized that I had to put
all my effort into it to get it done. The planning fallacy applies to me in this way
because my mind plants seeds of fear that keep me from trusting my ability to get the
assignments right on the first attempt, so I expect to have to redo them over and over.
B)
Continuing with the planning fallacy, describe a way to overcome it using
DMAIC. Explain in detail what you want to improve, how you will measure your
progress, how you will analyze your data, and what you would do if you
improved?
I want to improve my ability to submit assignments earlier than a few minutes before the
due date and time. I want to improve both the timing and the precision with which I
submit my assignments.
Using DMAIC, I would start working on assignments much earlier and remove distractions.
I would set a time frame to start each assignment a week before the due date, which
gives me enough time to work on it without the pressure of an imminent deadline and
enough time to review it and ask questions about it.
I would measure my improvement by setting a time frame over which to evaluate the data I
collect when I start my assignments a couple of days early. Starting early gives me time
to research and fully understand the material before attempting the assignment, instead
of attempting it without full knowledge of what I am supposed to do.
The data will also be analyzed by the number of assignments I submit over a month,
taking into account that I will understand the material better before doing my
assignments.
I would reward myself if I improved. The reward would be a less stressful time when
submitting assignments and more time to get other assignments done or to work on hobbies.
Question 2 (10 points)
Based on the Python code from the following link:
https://github.com/coin-dataset/code/blob/master/tc-rc3d/evaluate.py
A)
Please calculate the following:
Physical Lines of Code: 131
Logical Lines of Code: 83
Comment Lines: 35
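Counts like these depend on the counting rules chosen. The sketch below is one possible set of rules and not necessarily the method used for the numbers above. It assumes that physical lines are non-blank lines, that comment lines are lines whose first non-blank character is `#`, and that logical lines are approximated by the number of statement nodes in the parsed syntax tree. The function name `count_lines` and the `SAMPLE` snippet are made up for illustration, and the source must parse under the Python version running the script.

```python
import ast


def count_lines(source: str) -> dict:
    """Count physical, comment, and (approximate) logical lines of Python code."""
    lines = source.splitlines()
    blank = sum(1 for ln in lines if not ln.strip())
    # Assumption: a comment line is one whose first non-blank character is '#'.
    comment = sum(1 for ln in lines if ln.strip().startswith("#"))
    # Assumption: logical lines are approximated by counting statement nodes.
    tree = ast.parse(source)
    logical = sum(isinstance(node, ast.stmt) for node in ast.walk(tree))
    return {"physical": len(lines) - blank, "comment": comment, "logical": logical}


# Made-up sample; note that one physical line holds two statements.
SAMPLE = '''# add two numbers
def add(a, b):
    return a + b

x = add(1, 2); y = x * 2
print(y)
'''

counts = count_lines(SAMPLE)
print(counts)
```

Different tools treat docstrings, trailing comments, and continuation lines differently, so results for the same file can vary from tool to tool.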
B)
Based on your answers from Part A, calculate the effort in person-months using
COCOMO II. Assume all cost driver ratings to be nominal and all scale factors to
be rated high.
Based on the values given, the effort turned out to be about 0.2 person-months.
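The calculation can be sketched as follows. This is a minimal sketch and makes several assumptions: the COCOMO II.2000 post-architecture calibration (A = 2.94, B = 0.91), the published scale-factor weights at the High rating, all effort multipliers equal to 1.0 for nominal cost drivers, and the 83 logical lines (0.083 KSLOC) as the size input. The answer above does not state which size measure or calibration was used. The function name `cocomo2_effort` is made up for illustration.

```python
# COCOMO II.2000 post-architecture constants (assumed calibration).
A = 2.94
B = 0.91

# Scale-factor weights at the "High" rating (COCOMO II.2000 values).
SCALE_FACTORS_HIGH = {
    "PREC": 2.48,  # precedentedness
    "FLEX": 2.03,  # development flexibility
    "RESL": 2.83,  # architecture / risk resolution
    "TEAM": 2.19,  # team cohesion
    "PMAT": 3.12,  # process maturity
}


def cocomo2_effort(ksloc, scale_factors, effort_multipliers=()):
    """Effort in person-months: PM = A * Size^E * product(EM), E = B + 0.01 * sum(SF)."""
    exponent = B + 0.01 * sum(scale_factors.values())
    product = 1.0
    for em in effort_multipliers:  # nominal cost drivers are all 1.0
        product *= em
    return A * ksloc ** exponent * product


# Assumption: size is the logical line count from Part A, in thousands of lines.
effort = cocomo2_effort(83 / 1000, SCALE_FACTORS_HIGH)
print(round(effort, 2))
```

Under these assumptions the result is roughly 0.22 person-months, which is consistent with the reported value of about 0.2.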
C)
Do you think the result from Part B is an overestimate or an underestimate? Explain.
I think the result is an underestimate because the estimated effort is extremely small.
D)
Your former classmate is a manager of a commercial building construction company. He
wants to estimate the effort of constructing commercial buildings and is asking for
your help. Would it be appropriate if your former classmate used COCOMO II? Explain.
No, because COCOMO II is meant for software engineering projects, not construction
or industrial engineering. Also, COCOMO is old and has not been updated recently,
which increases the chance of an inaccurate estimate.
Question 3 (10 points)
The weather company you work for is performing an analysis on predicting sandstorms
accurately in Phoenix. You have been given a large dataset and are asked to determine
which factors could be used to accurately predict them.
The dataset you are provided with has over 50 variables. With your vast knowledge of
Excel, you determine that 4 variables can be used as predictors to accurately predict
sandstorms. Your boss is not convinced and decided to ask you some questions.
A)
You mentioned that before you even begin doing this “regression analysis” of
yours, you discarded about half of the variables. How did you choose which
variables to discard, and more importantly, why does it matter?
Predictor variables can be excluded from the analysis on the following basis:
identify outliers and influential points, and consider excluding them at least
temporarily. Only the required predictor variables should be kept in the
regression analysis, for the following reasons: 1) Unnecessary predictors add
noise to the estimates of the other quantities we are interested in, and degrees
of freedom are wasted. 2) Collinearity is caused by having too many variables
trying to do the same job. 3) If the model is to be used for prediction, we can
save time and/or money by not measuring redundant predictors.
A high p-value indicates that a variable's coefficient is not statistically
significant, so the variable contributes little to the model. This is something
you can use to discard variables that are not needed.
Another thing to keep in mind is that cells containing text produce errors. In
the Kaggle assignment, every column with text cells gave errors and could not be
used to get data for the regression analysis.
It also matters because keeping variables that are highly correlated with each
other would not accurately represent the model. Another reason is that such
variables can have a large impact on R-squared without making the model as
accurate as it should be for the weather company.
B)
You asked a friend and he told you that in his company they use both “Relative
Humidity” and “Number of Sunny Hours” to predict sandstorms. In your report,
you mention you only use the “Relative Humidity”, because of it being highly
correlated with “Number of Sunny Hours”. However, by themselves each are
good predictors, why didn’t you use both? Wouldn’t that make the model better?
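Because “Relative Humidity” and “Number of Sunny Hours” are highly
correlated, using both would add collinearity to the model, since both
variables are doing the same job. The second variable adds little new
information and makes the coefficient estimates less stable.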
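The collinearity argument can be illustrated with a variance inflation factor (VIF). The sketch below is only an illustration: the numbers are made up and are not from the sandstorm dataset, and the helper names `pearson` and `vif_two_predictors` are hypothetical. For exactly two predictors, the VIF of each one reduces to 1 / (1 - r^2), where r is their Pearson correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def vif_two_predictors(xs, ys):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)


# Made-up illustrative values, NOT taken from the exam's dataset.
relative_humidity = [10, 15, 20, 30, 40, 55]
sunny_hours = [12, 11, 10, 8, 7, 4]

r = pearson(relative_humidity, sunny_hours)
vif = vif_two_predictors(relative_humidity, sunny_hours)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A common rule of thumb treats a VIF above about 5 to 10 as a sign of problematic collinearity. With predictors this strongly correlated, the VIF is far above that range, which supports keeping only one of the two.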
C)
By looking at the correlation matrix you created between independent variables
and the dependent variable, which independent variables based on the type of
correlation did you discard or keep, and why?
a. Strong positive correlation
b. Weak positive correlation
c. No correlation
d. Weak negative correlation
e. Strong negative correlation
The variables to keep or discard depend on the output of interest. We cannot filter on
the sign of the correlation alone. If there is a correlation between an independent
variable and the dependent variable, that variable is included in the analysis. If there
is no correlation between them, that variable is excluded from the analysis. Hence, the
variables with the following types of correlation will be included in the analysis:
1. Strong positive correlation, I will keep.
2. Strong negative correlation, I will also keep.
I will discard the rest. I am keeping these variables because a strong correlation,
whether positive or negative, has an effect on the model, so those variables must be
kept. Variables with weak or no correlation would not have that effect on the model, so
they should not be in the model.
I would discard the weakly correlated variables because, like the uncorrelated ones,
they have little impact on the model.
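The keep-or-discard rule described above can be sketched as a simple screen on the absolute correlation with the dependent variable. This is a minimal sketch under stated assumptions: the cutoff of 0.5 for a "strong" correlation is an arbitrary choice, the data values and variable names are made up for illustration, and `screen_predictors` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def screen_predictors(columns, target, threshold=0.5):
    """Split predictors into kept and dropped by |r| with the target.

    The sign is ignored, so strong negative correlations are kept too.
    """
    kept, dropped = [], []
    for name, values in columns.items():
        if abs(pearson(values, target)) >= threshold:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


# Made-up illustrative values, NOT taken from the exam's dataset.
sandstorm_index = [1, 2, 3, 4, 5, 6]
predictors = {
    "wind_speed": [5, 9, 12, 18, 20, 27],            # strong positive
    "relative_humidity": [60, 50, 45, 30, 25, 10],   # strong negative
    "day_of_month": [3, 9, 1, 8, 2, 7],              # weak
}

kept, dropped = screen_predictors(predictors, sandstorm_index)
print("keep:", kept)
print("discard:", dropped)
```

Screening on pairwise correlation alone can miss variables that only matter in combination with others, so a screen like this is a first pass rather than a final model selection.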
Question 4 (20 points)
You tasked Mark and David to come up with a separate regression model to predict house
prices using the Kaggle dataset train.csv. With SalePrice as the dependent variable, they
came up with a model based on the first half of the dataset (from Row 2 to Row 731).
Each model consists of the following independent variables: