
Analysis of Data-TRI on High-Dimensional and Multicollinear Data


University of Connecticut STAT5605 Project: The Analysis of Data-TRI Prediction on High-Dimensional and Multicollinear Data

Contents
[Abstract]
Section 1: Introduction
Section 2: Data Description
Section 3: Methods and Models
Section 4: Analysis of Data
    Principal Component Analysis (PCA)
    Ridge Regression
Section 5: Model Comparison, Conclusion and Remarks
Section 6: Appendix
    Appendix A
    Appendix B
References

[Abstract]: This paper is based on data provided by TRI, a quick service restaurant company. The goal of the research is to predict restaurant revenue so that TRI can decide whether it is wise to open a new restaurant at a given location. We first fit a linear model to predict revenue. Because of the limitations of the linear model, we then apply several methods, namely PCA (principal component analysis), ridge regression and robust regression, to improve the performance of the original model. The main issue addressed in this paper is multicollinearity: in the original data, 37 continuous variables are highly collinear. PCA, ridge regression and robust regression each help in some ways. Finally, we produce predictions and use the known data to test the accuracy of our model.

[Key words] prediction, multicollinearity, high dimension, principal component analysis, robust regression, ridge regression, linear regression

Section 1: Introduction

With over 1,200 quick
