Assignment #3


School: University of Central Florida
Course: 7722
Subject: Statistics
Date: Jan 9, 2024

Assignment #3: Due Multiple Dates (see below)

Instructions: Please answer the questions in the order displayed below. Create a single PDF file of your answer pages (if you write by hand, please write clearly) and upload this file. I prefer that you type your answers.

This assignment builds on Assignments #1 and #2 on the Real Estate Sales data. Please review the description of the RealEstateSales.txt data file used in the last two assignments. In this assignment, use the “traind” sub-data file you created and used to answer the questions in Assignments #1 and #2.

For the purpose of this assignment, treat the predictors as described below:

Quantitative Predictors: X1, X2, X3, X5, X7 (please convert X7 to the age of the house), X10
Qualitative Predictors: X4, X6, X8, X9, X11

The goal is to develop a multiple regression model with k ≥ 1 variables to predict the average sales prices of houses in the sampled population. This assignment is due in multiple stages. Complete the following steps:

Model Building Step #1. Data Exploration - Due 11:59PM, 11/1/2021 (10 points)

Know your data by looking at appropriate scatter plots, frequency tables, correlations, outliers, interacting variables, etc. Write a summary report of your findings and suggest a tentative multiple regression model to explore further. Your summary report should include your comments on:

(i) functional relationships, if any, between Y and each quantitative predictor,
(ii) potential outliers,
(iii) potential multicollinearity problems,
(iv) interacting pairs of predictors, if any,
(v) problems, if any, with the distributions of houses across the categories of the categorical variables, and suggestions, if any, to combine categories of any categorical variable.

Completed October 31, 2021.
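Some of the Step #1 checks can be sketched in base R as follows. This is only an illustration: the data frame below is a made-up stand-in for "traind" (the real RealEstateSales.txt values are not reproduced here), and the cutoffs of 5 houses per category and |standardized residual| > 3 are assumptions chosen for the example, not values taken from the assignment.

```r
# Toy stand-in for the "traind" data frame; all values are simulated.
set.seed(1)
n <- 60
traind <- data.frame(
  X1 = round(runif(n, 1000, 4000)),            # a quantitative predictor
  X2 = sample(1:6, n, replace = TRUE),         # another quantitative predictor
  X7 = sample(1950:2000, n, replace = TRUE),   # year, converted to age below
  X8 = sample(1:3, n, replace = TRUE, prob = c(0.6, 0.35, 0.05))
)
traind$Y   <- 50000 + 80 * traind$X1 + rnorm(n, sd = 20000)
traind$Age <- 2002 - traind$X7                 # age conversion used in Step #2

# (iii) Pairwise correlations: large values among predictors hint at
# multicollinearity.
cor_mat <- cor(traind[, c("Y", "X1", "X2", "Age")])
print(round(cor_mat, 2))

# (v) Frequency table of a categorical predictor; sparse categories
# (fewer than 5 houses here, an assumed cutoff) are candidates for merging.
freq_X8 <- table(traind$X8)
print(freq_X8)
sparse <- names(freq_X8)[freq_X8 < 5]

# (ii) Crude outlier screen: standardized residuals from a simple fit.
fit <- lm(Y ~ X1, data = traind)
outliers <- which(abs(rstandard(fit)) > 3)
```

Scatter plots for item (i) could be added with plot(traind$X1, traind$Y) or pairs() on the same columns.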
Model Building Step #2. Subset Model Selection - Due 11:59PM, 11/8/2021 (20 points)

Before you start this step, please delete the four rows corresponding to X3 = 0 and X9 = 9, 10, 11 from your “traind” file. Then, using the ifelse statement, create two indicators, X81 and X82, to represent the three categories of the categorical variable X8, and six indicator variables, X91, X92, X93, X94, X95, X96, to represent the seven categories of the categorical variable X9. You also need to create the following additional columns in your traind file:

X2Sq = X2*X2
X3Sq = X3*X3
Age = 2002 - X7
AgeSq = Age*Age

After a review of all tentative models from all students in your class, you and your instructor must agree on one final tentative model for everyone in the class to use at this step. Assume the following tentative model in R format:

lm(Y ~ X1 + X2 + X2Sq + X3 + X3Sq + X5 + Age + AgeSq + X6 + X81 + X82 + X91 + X92 + X93 + X94 + X95 + X96 + X11 + X1:X6 + X1:X81 + X1:X82 + X1:X91 + X1:X92 + X1:X93 + X1:X94 + X1:X95 + X1:X96 + X1:X11 + X2:X6)

In this step, I am asking you to select the best sub-models of the above model under each of the following methods:

(i) all possible subsets

The best sub-model is one with higher Rsq and Adj Rsq values and lower Cp and BIC values. So Models #7 - #9 seem the best:

Model #7: Y ~ X1 + X3 + X3Sq + X5 + Age + AgeSq + X81
Model #8: Y ~ X1 + X2 + X2Sq + X3 + Age + AgeSq + X81 + X2:X6
Model #9: Y ~ X1 + X2 + X2Sq + X3 + Age + AgeSq + X81 + X1:X6 + X2:X6
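The data preparation described at the start of Step #2 can be sketched in R as follows. The eight-row data frame is a made-up stand-in for "traind". The choice of baseline categories (X8 = 3 and X9 = 7) and the assumption that the remaining X9 categories are coded 1 to 7 are illustrative guesses, since the assignment text does not state the coding.

```r
# Toy stand-in for "traind"; values are invented for illustration only.
traind <- data.frame(
  X2 = c(3, 4, 2, 5, 3, 4, 3, 2),
  X3 = c(2, 0, 1, 3, 2, 2, 1, 2),
  X7 = c(1980, 1995, 1972, 2000, 1988, 1990, 1965, 1999),
  X8 = c(1, 2, 3, 1, 2, 3, 1, 2),
  X9 = c(1, 2, 9, 4, 10, 11, 7, 3)
)

# Delete the rows with X3 = 0 or X9 equal to 9, 10, or 11.
traind <- traind[!(traind$X3 == 0 | traind$X9 %in% c(9, 10, 11)), ]

# Two indicators for the three categories of X8
# (assumed baseline: X8 == 3, which gets X81 = X82 = 0).
traind$X81 <- ifelse(traind$X8 == 1, 1, 0)
traind$X82 <- ifelse(traind$X8 == 2, 1, 0)

# Six indicators X91..X96 for the seven categories of X9
# (assumed codes 1..7, with 7 as the baseline).
for (j in 1:6) {
  traind[[paste0("X9", j)]] <- ifelse(traind$X9 == j, 1, 0)
}

# Derived columns listed in the assignment.
traind$X2Sq  <- traind$X2 * traind$X2
traind$X3Sq  <- traind$X3 * traind$X3
traind$Age   <- 2002 - traind$X7
traind$AgeSq <- traind$Age * traind$Age

print(traind)
```

Because R names are case-sensitive, the squared terms must be spelled exactly as created (X2Sq, X3Sq) when they appear in a model formula.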
(ii) best subsets

The best sub-model is one with higher Rsq and Adj Rsq values and lower Cp and BIC values. So Models #7 - #9 seem the best:

Model #7: Y ~ X1 + X3 + X3Sq + X5 + Age + AgeSq + X81
Model #8: Y ~ X1 + X2 + X2Sq + X3 + Age + AgeSq + X81 + X2:X6
Model #9: Y ~ X1 + X2 + X2Sq + X3 + Age + AgeSq + X81 + X1:X6 + X2:X6

(iii) forward method

The best sub-model is one with higher Rsq and Adj Rsq values and lower Cp and BIC values. So Model #7 seems the best.

Model #7: Y ~ X1 + X3 + X3Sq + X5 + Age + AgeSq + X81
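The selection criteria used above (Rsq, Adj Rsq, Cp, BIC) and the forward method can be illustrated with a small base-R sketch. It is not the procedure that produced Models #7 - #9: the data are simulated, only five candidate terms are used so that every subset can be enumerated quickly, and the data frame d and the helper objects are hypothetical names. Mallows' Cp is computed as SSE_p / MSE_full - (n - 2p), where p counts the intercept.

```r
# Simulated data; not the Real Estate Sales data.
set.seed(42)
n <- 80
d <- data.frame(X1 = rnorm(n), X3 = rnorm(n), X5 = rnorm(n),
                Age = runif(n, 0, 50))
d$AgeSq <- d$Age^2
d$Y <- 10 + 3 * d$X1 + 2 * d$X3 - 0.2 * d$Age + rnorm(n)

candidates <- c("X1", "X3", "X5", "Age", "AgeSq")
full <- lm(reformulate(candidates, response = "Y"), data = d)
sigma2_full <- summary(full)$sigma^2          # MSE of the full model

# Enumerate every non-empty subset of the candidate terms.
subsets <- unlist(
  lapply(seq_along(candidates),
         function(k) combn(candidates, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each subset and record the four criteria.
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "Y"), data = d)
  s   <- summary(fit)
  p   <- length(vars) + 1                     # parameters incl. intercept
  sse <- sum(residuals(fit)^2)
  data.frame(model  = paste(vars, collapse = " + "),
             Rsq    = s$r.squared,
             AdjRsq = s$adj.r.squared,
             Cp     = sse / sigma2_full - (n - 2 * p),
             BIC    = BIC(fit))
}))
results <- results[order(results$BIC), ]
print(head(results, 3))                       # three lowest-BIC models

# Forward selection from the intercept-only model, penalised by
# log(n) per parameter so that step() ranks models by BIC.
null <- lm(Y ~ 1, data = d)
fwd  <- step(null, scope = formula(full), direction = "forward",
             k = log(n), trace = 0)
print(formula(fwd))
```

For a model as large as the tentative one in Step #2, enumerating every subset this way becomes slow; a dedicated best-subsets routine would normally be used instead.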