SIE 464-564 Midterm Spring 2019

School: University of Arizona
Course: 464
Subject: Industrial Engineering
Date: Jan 9, 2024

Due date: Sunday, April 7, 2019, 11:59 PM. The exam must be submitted by uploading it to the “Midterm” folder in the D2L course Dropbox. Students are expected to uphold the UA Code of Academic Integrity (http://deanofstudents.arizona.edu/policies-and-codes/code-academic-integrity). This take-home exam is expected to be an individual effort. Any evidence of cheating will result in a failing grade on the exam for all parties involved. This has happened before, and I don’t believe the risk is worthwhile. In addition to your textbooks, you may consult any reference in the library, on the internet, and on D2L. You may not consult with anyone else, regardless of whether they are enrolled in the class or at UA. Everyone must answer the first 6 questions. Those enrolled in SIE 564 must also answer Question 7. An extra credit question is provided at the end.
Question 1 (10 points) In the Freakonomics Radio episode “Here’s Why All Your Projects Are Always Late and What to Do About It,” Stephen J. Dubner talks about a tendency that many people have, the planning fallacy, a term coined in the 1970s by Nobel Prize-winning psychologist Daniel Kahneman and fellow psychologist Amos Tversky. In the podcast, specialists discuss reasons for this tendency.
A) Apply metacognition to analyze the manner in which you turn in assignments during your college career and describe a way in which the planning fallacy may apply to you, different from the ones mentioned in the podcast.
Applying metacognition, I observed and evaluated my own assignments and found that most of them were submitted very close to the due date and time. Part of this was fear of not getting the correct answers; the other part was the planning fallacy itself: I procrastinated on each assignment until the deadline was close, and only then realized that I had to put all my effort into it to get it done. The planning fallacy applies to me in this way because fear keeps me from trusting that I can complete an assignment correctly on the first try, so I expect that I would have to redo it over and over.
B) Continuing with the planning fallacy, describe a way to overcome it using DMAIC. Explain in detail what you want to improve, how you will measure your progress, how you will analyze your data, and what you would do if you improved.
I want to improve my ability to submit assignments earlier than a few minutes before the due date and time, and to improve the timeliness and precision with which I submit them. To apply DMAIC, I would start working on assignments much earlier, and I would also remove distractions.
I would set a schedule to start each assignment a week before its due date, giving myself enough time to work on it without the pressure of an approaching deadline, plus enough time to review it and ask questions about it. I would measure improvement by setting an evaluation period and collecting data on what happens when I start assignments a couple of days early, giving myself time to research and fully understand the material instead of attempting the assignments without fully knowing what I am supposed to do. I would analyze the data by counting how many assignments I submit in a month, taking into account whether I understood the material before starting each one.
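For the Measure and Analyze steps, a minimal sketch could record how many hours before the deadline each assignment is submitted and summarize each evaluation period. The (submitted, due) record format, the one-hour "last-minute" cutoff, and the example dates are hypothetical illustrations, not data from this exam.

```python
from datetime import datetime
from statistics import mean


def lead_time_hours(submitted: datetime, due: datetime) -> float:
    """Hours between submission and deadline; negative means the work was late."""
    return (due - submitted).total_seconds() / 3600


def summarize_period(records):
    """Summarize one evaluation period (e.g., one month).

    records: list of (submitted, due) datetime pairs (hypothetical format).
    """
    leads = [lead_time_hours(submitted, due) for submitted, due in records]
    return {
        "count": len(leads),  # assignments submitted in the period
        "mean_lead_hours": mean(leads),
        # share of assignments submitted in the final hour before the deadline
        "share_last_hour": sum(1 for h in leads if 0 <= h < 1) / len(leads),
    }


# Hypothetical example: two submissions from a "before" month
before = [
    (datetime(2019, 3, 3, 23, 50), datetime(2019, 3, 3, 23, 59)),
    (datetime(2019, 3, 10, 22, 59), datetime(2019, 3, 10, 23, 59)),
]
print(summarize_period(before))
```

Comparing the summary for a month before the change with a month after it gives a simple before/after measure of whether starting earlier actually moved submissions away from the deadline.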
I would reward myself if I improved. The reward would be a less stressful time when submitting assignments and more time to get other assignments done or work on hobbies.
Question 2 (10 points) Based on the Python code from the following link: https://github.com/coin-dataset/code/blob/master/tc-rc3d/evaluate.py
A) Please calculate the following:
Physical Lines of Code: 131
Logical Lines of Code: 83
Comment Lines: 35
B) Based on your answers from Part A, calculate the effort in person-months using COCOMO II. Assume all cost driver ratings to be nominal and all scale factors to be rated high.
Using the 83 logical lines (0.083 KSLOC), the effort turned out to be about 0.2 person-months.
C) Do you think the result from Part B is an overestimate or an underestimate? Explain.
I think the result is an underestimate because the effort is extremely small: 0.2 person-months is only about 30 hours of work, taking a person-month as 152 hours as COCOMO II does.
D) Your former classmate is a manager of a commercial building construction company. He wants to estimate the effort of constructing commercial buildings and is asking for your help. Would it be appropriate if your former classmate used COCOMO II? Explain.
No. COCOMO II is designed for software engineering projects, not construction or industrial engineering projects: it measures size in lines of code and was calibrated on software project data. Also, COCOMO II has not been updated recently, which gives a high chance of inaccurate estimates.
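As a sketch of how Parts A and B fit together, the block below counts lines with Python's standard `tokenize` module and then applies the COCOMO II.2000 effort equation, PM = A × Size^E × ΠEM with E = B + 0.01 × ΣSF. The counting rules (non-blank physical lines, NEWLINE tokens as logical lines, lines containing `#` comments) are assumptions; other counting standards would not necessarily reproduce 131/83/35. The constants A = 2.94 and B = 0.91 and the "High" scale-factor values are the published COCOMO II.2000 calibration; with all cost drivers nominal, every effort multiplier is 1.0.

```python
import io
import tokenize

# COCOMO II.2000 post-architecture constants
A = 2.94
B = 0.91

# Scale-factor values when each of the five factors is rated "High"
HIGH_SCALE_FACTORS = {"PREC": 2.48, "FLEX": 2.03, "RESL": 2.83, "TEAM": 2.19, "PMAT": 3.12}


def count_loc(source: str) -> dict:
    """Rough LOC counts for Python source under one possible convention.

    physical: non-blank lines; logical: statements (NEWLINE tokens);
    comment: lines containing a '#' comment (docstrings are not counted).
    """
    physical = sum(1 for line in source.splitlines() if line.strip())
    comment_lines = set()
    logical = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])  # row number of the comment
        elif tok.type == tokenize.NEWLINE:  # ends a logical line, not a blank/continued one
            logical += 1
    return {"physical": physical, "logical": logical, "comment": len(comment_lines)}


def cocomo2_effort(ksloc: float, scale_factors: dict, effort_multipliers=()) -> float:
    """Effort in person-months: PM = A * Size^E * prod(EM), E = B + 0.01 * sum(SF)."""
    exponent = B + 0.01 * sum(scale_factors.values())
    product_em = 1.0
    for em in effort_multipliers:  # nominal cost drivers are all 1.0
        product_em *= em
    return A * ksloc ** exponent * product_em


# 83 logical lines -> 0.083 KSLOC; no multipliers passed, i.e. all nominal
print(round(cocomo2_effort(83 / 1000, HIGH_SCALE_FACTORS), 2))
```

With the scale factors rated high, E = 0.91 + 0.1265 = 1.0365, and the sketch prints 0.22 person-months, consistent with the roughly 0.2 reported in Part B. Using the 131 physical lines instead would give roughly 0.36 person-months.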
Question 3 (10 points) The weather company you work for is performing an analysis on predicting sandstorms accurately in Phoenix. You have been given a large dataset and are asked to determine which factors could be used to accurately predict them. The dataset you are provided with has over 50 variables. With your vast knowledge of Excel, you determine that 4 variables can be used as predictors to accurately predict sandstorms. Your boss is not convinced and decides to ask you some questions.
A) You mentioned that before you even begin doing this “regression analysis” of yours, you discarded about half of the variables. How did you choose which variables to discard, and more importantly, why does it matter?
Predictor variables can be excluded from the analysis on the following basis. First, identify outliers and influential points, and possibly exclude them, at least temporarily. Only the required predictor variables should be kept in the regression analysis, for the following reasons:
1) Unnecessary predictors add noise to the estimation of other quantities we are interested in, and they waste degrees of freedom.
2) Having too many variables trying to do the same job causes collinearity.
3) If the model is to be used for prediction, we can save time and/or money by not measuring redundant predictors.
A predictor with a high p-value has a coefficient that is not statistically different from zero, meaning it has little reliable effect on the model; this can be used to discard variables that are not needed. Another thing to keep in mind, from the Kaggle assignment, is that columns containing text cause errors in Excel’s regression analysis, so those variables cannot be used for the regression as they are.
It also matters because keeping variables that are highly correlated with each other would not represent the model accurately. Another reason is that these extra variables would raise R-squared (R-squared never decreases when a predictor is added), yet they may not make the model as accurate as it should be for the weather company.
B) You asked a friend, and he told you that in his company they use both “Relative Humidity” and “Number of Sunny Hours” to predict sandstorms. In your report, you mention you only use “Relative Humidity” because it is highly correlated with “Number of Sunny Hours.” However, by themselves each is a good predictor; why didn’t you use both? Wouldn’t that make the model better?
Because “Relative Humidity” and “Number of Sunny Hours” are highly correlated, using both would add collinearity to the model, since both variables are doing the same job. Collinearity inflates the standard errors of the coefficient estimates and makes them unstable, while the second variable adds little information that the first does not already provide.
C) By looking at the correlation matrix you created between the independent variables and the dependent variable, which independent variables, based on the type of correlation, did you discard or keep, and why?
a. Strong positive correlation
b. Weak positive correlation
c. No correlation
d. Weak negative correlation
e. Strong negative correlation
The variables to keep or discard depend on the output of interest. We cannot simply filter the correlation values. If there is a correlation between an independent variable and the dependent variable, that variable will be included in the analysis; if there is no correlation, the variable will be excluded. Hence, variables with the following types of correlation will be included in the analysis:
1. Strong positive correlation: I will keep these.
2. Strong negative correlation: I will also keep these.
I will discard the rest. I keep these variables and discard the rest because a strong correlation, whether positive or negative, means the variable will have an effect on the model, so it must be kept. Variables with weak or no correlation would not have that effect on the model, so they should not be in the model.
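The keep/discard rules in Parts B and C can be sketched as a small screening routine. This is a minimal Python illustration, not the Excel workflow used in the answer: the column names, the dict-of-lists data layout, and the cutoffs |r| ≥ 0.7 for "strong" and |r| ≥ 0.8 for "collinear" are hypothetical choices. When two strong candidates are highly correlated with each other (as with “Relative Humidity” and “Number of Sunny Hours”), it keeps only the one more strongly correlated with the target, and ties keep the earlier column. For exactly two predictors, the variance inflation factor is VIF = 1/(1 - r^2), so r = 0.9 already inflates each coefficient's variance about 5.3 times.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def two_predictor_vif(r):
    """Variance inflation factor when a model has exactly two predictors with correlation r."""
    return 1 / (1 - r ** 2)


def screen_predictors(data, target, strong=0.7, collinear=0.8):
    """Keep predictors strongly correlated (either sign) with the target, then
    drop any that is highly correlated with an already-kept, stronger one.

    data: dict mapping column name -> list of numbers (hypothetical layout).
    strong, collinear: hypothetical cutoffs on |r|.
    """
    r_target = {name: pearson(col, data[target])
                for name, col in data.items() if name != target}
    # Strong positive and strong negative correlations both pass; weak/none are discarded
    candidates = sorted((name for name, r in r_target.items() if abs(r) >= strong),
                        key=lambda name: abs(r_target[name]), reverse=True)
    selected = []
    for name in candidates:
        if all(abs(pearson(data[name], data[kept])) < collinear for kept in selected):
            selected.append(name)
    return selected, r_target


# Hypothetical toy data (not from the exam's dataset)
toy = {
    "RelativeHumidity": [50, 40, 30, 20, 10],
    "SunnyHours": [1, 3, 5, 7, 9],
    "WindNoise": [3, 1, 4, 1, 5],
    "SandstormIndex": [1, 2, 3, 4, 5],
}
selected, r = screen_predictors(toy, "SandstormIndex")
print(selected)
```

In the toy data, humidity and sunny hours are perfectly (negatively) correlated with each other, so only one of them survives, and the weakly correlated column is dropped.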
I would discard the weakly correlated variables because they are not very impactful on the model, just like the uncorrelated ones.
Question 4 (20 points) You tasked Mark and David with each coming up with a separate regression model to predict house prices using the Kaggle dataset train.csv. With SalePrice as the dependent variable, they came up with a model based on the first half of the dataset (from Row 2 to Row 731). Each model consists of the following independent variables: