University of Connecticut STAT5605 Project: The Analysis of Data - TRI Prediction on High-Dimensional and Multicollinear Data

Contents
[Abstract]
Section 1: Introduction
Section 2: Data Description
Section 3: Methods and Models
Section 4: Analysis of Data
    Principal Component Analysis (PCA)
    Ridge Regression
Section 5: Model Comparison, Conclusion and Remarks
Section 6: Appendix
    Appendix A
    Appendix B
References

[Abstract]: This paper is based on data provided by TRI, a quick-service restaurant company. Given these data, the goal of the research is to predict revenue so that the company can decide whether it is wise to open a new restaurant at a given location. We first look for a linear model to predict the revenue. Because of the limitations of the linear model, several methods, such as PCA (principal component analysis), ridge regression and robust regression, are used to improve the performance of the original model. The main issue addressed in this paper is multicollinearity: in the original data, 37 continuous variables are highly collinear. PCA, ridge regression and robust regression each help with this in different ways. Finally, we produce predictions and use the known data to test the accuracy of our model.

[Key words] prediction, multicollinearity, high dimension, principal component analysis, robust regression, ridge regression, linear regression

Section 1: Introduction
With over 1,200 quick
This paper provides a summary of our analysis of the data obtained for 60 Crusty Dough Pizza Company restaurants. We compared 16 pizza store characteristics to monthly profit in order to determine the best indicators of success. The results of this analysis may be used to determine the store services and attributes that have the most bearing on profitability.
NeuroSolutions Infinity, a product of Neurodimensions Inc. of Florida (2005), the SPSS software package (SPSS 17.0), IBM SPSS Modeler (IBM SPSS 21.0) and STATA (STATA 12.0) (http://www.stata.com), among the most widely used software packages, were utilized to develop non-parametric SVM
The purpose of this case is to determine which key variables drive Crusty Dough Pizza Restaurant's monthly profit and then to forecast the monthly profit for potential stores. Based on this information, we will be able to recommend to Crusty Dough Pizza Restaurant which stores they should open and which they should avoid. The group was provided data for 60 restaurants that included monthly profit, student population, advertising expenditures, parking spots, population within 20 miles, pizza varieties, and competitors within 15 miles. For the potential stores we were given all of this
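As a rough illustration of this setup, the sketch below fits a multiple linear regression of monthly profit on the six store characteristics; the generated numbers and column names are placeholders, not the actual Crusty Dough figures, and the use of Python's statsmodels is an illustrative assumption rather than the software actually used.

```python
# Sketch of the regression described above: monthly profit on the six store
# characteristics.  The numbers below are random placeholders standing in for the
# 60-store data set; they are not the actual Crusty Dough figures.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
stores = pd.DataFrame({
    "students":    rng.integers(0, 30000, 60),      # student population
    "advertising": rng.uniform(0, 5000, 60),        # monthly advertising spend ($)
    "parking":     rng.integers(0, 60, 60),         # parking spots
    "population":  rng.integers(1000, 200000, 60),  # population within 20 miles
    "varieties":   rng.integers(5, 40, 60),         # pizza varieties offered
    "competitors": rng.integers(0, 10, 60),         # competitors within 15 miles
})
stores["profit"] = (0.5 * stores["students"] / 100 + 0.8 * stores["advertising"]
                    - 300 * stores["competitors"] + rng.normal(0, 500, 60))

fit = smf.ols("profit ~ students + advertising + parking + population"
              " + varieties + competitors", data=stores).fit()
print(fit.summary())   # coefficients and t-tests point to the key profit drivers
```

Once fitted, the same model object can score candidate locations via fit.predict() on a data frame with the same columns.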
This investigation will look into three data sets, each with X and Y values of time and Jetski value; the time increases in steps of 0.5 years, meaning a new value is recorded every six months. An investigation will be carried out to determine which data set best suits the real-world scenario of Jetski ownership. This will be shown using a number of different mathematical methods and strategies. Many types of graphs will be utilised throughout this experiment for a thorough analysis of the data sets; for example, scatter plots will be drawn using the given data. Modifications are made to these graphs in order to create predictions. Regression line equations will be created, used and analysed in this experiment. Regression line equations are trend lines on
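A minimal sketch of how such a regression (trend) line can be fitted to a scatter plot of time against Jetski value is given below; the numbers are placeholders, not the actual data set tables, and Python is used purely for illustration.

```python
# Minimal sketch of fitting a regression (trend) line to one of the data sets:
# time in half-year steps against Jetski value.  The values below are placeholders,
# not the actual data set tables from the investigation.
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0, 5.5, 0.5)                       # years, in 0.5 increments
value = np.array([9000, 8400, 7900, 7500, 7100,  # placeholder Jetski values
                  6800, 6500, 6300, 6100, 5950, 5800])

slope, intercept = np.polyfit(t, value, 1)       # least-squares regression line
print(f"regression line: value = {intercept:.1f} {slope:+.1f} * t")

plt.scatter(t, value)                            # scatter plot of the raw data
plt.plot(t, intercept + slope * t)               # superimposed trend line
plt.xlabel("years of ownership")
plt.ylabel("Jetski value")
plt.show()
```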
The algorithms used to determine predicted brain age considered gray matter density as a single variable, x. A Support Vector Machine and a Support Vector Regression machine, which are both learning algorithms, were used. Both machines require two phases: a training phase using the variable x and a test phase. In the training phase, baseline MRI scans of healthy subjects are used, and the result is a brain age prediction model. The prediction model is then applied to new data in the test phase to ensure the model works properly. The Support Vector Machine was
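The sketch below illustrates this two-phase (training and test) procedure with a Support Vector Regression model, assuming gray matter density is the single predictor x; the densities and ages are synthetic placeholders, not MRI data, and scikit-learn is an illustrative assumption rather than the software used in the study.

```python
# Sketch of the two-phase procedure described above: train a Support Vector
# Regression model on baseline data, then apply it to held-out data.  The gray
# matter densities and ages are synthetic placeholders, not real MRI measurements.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
gray_matter = rng.uniform(0.3, 0.7, size=(200, 1))          # single predictor x
age = 90 - 80 * gray_matter[:, 0] + rng.normal(0, 3, 200)   # synthetic "true" ages

# Training phase: fit on (placeholder) healthy-subject baseline data.
x_train, x_test, y_train, y_test = train_test_split(gray_matter, age, random_state=0)
model = SVR(kernel="rbf", C=10.0)
model.fit(x_train, y_train)

# Test phase: check that the brain-age prediction model generalizes to new data.
print("test R^2:", model.score(x_test, y_test))
```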
Partial least squares (PLS) regression is a recent technique that combines features from and generalizes principal component analysis (PCA) and multiple linear regression. Its goal is to predict a set of
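A minimal sketch in this spirit, using scikit-learn's PLSRegression to predict a set of response variables from a block of possibly collinear predictors, is shown below; the data are synthetic placeholders and scikit-learn is an illustrative assumption.

```python
# Minimal PLS regression sketch: predict a set of responses Y from predictors X
# via a small number of latent components, in the spirit described above.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))                   # many, possibly collinear, predictors
Y = X[:, :3] @ rng.normal(size=(3, 2)) + rng.normal(scale=0.5, size=(100, 2))

pls = PLSRegression(n_components=3)              # latent components, akin to PCA scores
pls.fit(X, Y)
print("R^2:", pls.score(X, Y))
print("prediction for first sample:", pls.predict(X[:1]))
```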
I worked on a dataset of malignant and benign prostate tissues that provided insight into a gene expression signature for prostate cancer. The goal was to apply several prediction algorithms, including principal component regression, elastic nets and partial least squares, to the samples containing gene expression profiling by array in order to accurately predict prostate cancer. The analysis was based on variables that included disease state and genotype/phenotype
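One possible pipeline in this spirit is sketched below, combining principal component scores with an elastic-net penalized logistic classifier for malignant versus benign tissue; this is a stand-in for the principal-component and elastic-net regressions mentioned above, the expression matrix and labels are synthetic placeholders, and scikit-learn is an illustrative assumption.

```python
# Sketch of a dimension-reduction-plus-penalization pipeline: principal component
# scores fed into an elastic-net logistic classifier for malignant vs. benign tissue.
# The expression matrix and labels are synthetic placeholders, not the array data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
expression = rng.normal(size=(102, 5000))      # samples x genes (placeholder)
disease_state = rng.integers(0, 2, size=102)   # 1 = malignant, 0 = benign (placeholder)

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=20),                      # reduce the gene space to 20 PCs
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
)
print("CV accuracy:", cross_val_score(clf, expression, disease_state, cv=5).mean())
```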
The final model was shown to have minimal error and was therefore a strong representation of the data. However, due to its continuous nature, the model failed to display the fluctuations in population
Principal component analysis (PCA) is one of the most widely used multivariate statistical techniques. PCA can be used to extract the important information from a data table that contains observations described by dependent variables. PCA then expresses this important information as a set of new orthogonal variables called principal components (PCs). In addition, PCA can represent the pattern of similarity of the observations and of the variables by drawing them as points in maps [5].
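A minimal PCA sketch along these lines is given below; the data matrix is a synthetic placeholder and scikit-learn is used purely for illustration.

```python
# Minimal PCA sketch following the description above: extract orthogonal principal
# components from a data table and inspect how much variance each one explains.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
data = rng.normal(size=(50, 6)) @ rng.normal(size=(6, 6))   # correlated variables

pca = PCA()
scores = pca.fit_transform(data)          # observations expressed in the new PCs
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
print("first observation on PC1, PC2:", scores[0, :2])      # coordinates for a map
```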
Data were recorded on a personal computer and further analyzed by customized software in MATLAB (The
In order to remedy serial correlation, GLS is introduced. Generalized least squares (GLS) is "a method of ridding an equation of pure first-order serial correlation and in the process restoring the minimum variance property to its estimation" (Studenmund, 2006). In this model, GLS was not necessarily run, because the DW test result was inconclusive and the presence of serial correlation therefore remains uncertain.
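For illustration only, the sketch below shows how a Durbin-Watson check and a pure first-order GLS fit could be run; the series is a synthetic placeholder rather than the data of this model, and Python's statsmodels is an assumption, not the software actually used.

```python
# Sketch of the remedy described above: check the Durbin-Watson statistic, and if
# pure first-order serial correlation is suspected, fit a first-order GLS (GLSAR).
# The series below is a synthetic placeholder with AR(1) errors built in.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
x = rng.normal(size=120)
e = np.zeros(120)
for t in range(1, 120):                   # AR(1) errors to mimic serial correlation
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
print("Durbin-Watson:", durbin_watson(ols_fit.resid))   # values near 2 suggest no correlation

gls_fit = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
print("GLS estimates:", gls_fit.params)
```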
From Table 7.1, it was observed that the curvature was non-significant, as its p-value was greater than the alpha level (α = 0.05). We now have two procedures to follow: one is the method of steepest ascent, and the other is differentiating the regression equation with respect to the factors and equating the derivatives to zero, which gives the value of each factor. The regression equation was obtained from the coded coefficients table and is written below. In the regression equation, only the significant factors were considered; those factors were selected using the ANOVA results and the half-normal plot.
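The second procedure can be stated generically as follows; here the symbols are generic rather than this report's own notation, with $\mathbf{b}$ the vector of first-order coefficients and $\mathbf{B}$ the symmetric matrix holding the quadratic coefficients on its diagonal and half of each interaction coefficient off-diagonal, the actual values coming from the coded coefficients table. For a fitted second-order model
\[
\hat{y} \;=\; \hat{\beta}_0 + \mathbf{x}'\mathbf{b} + \mathbf{x}'\mathbf{B}\mathbf{x},
\]
differentiating with respect to the factors and equating to zero gives
\[
\frac{\partial \hat{y}}{\partial \mathbf{x}} \;=\; \mathbf{b} + 2\mathbf{B}\mathbf{x} \;=\; \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{x}_s \;=\; -\tfrac{1}{2}\,\mathbf{B}^{-1}\mathbf{b},
\]
so the stationary value of each factor follows directly once the significant coefficients are substituted.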
Abstract—This document is the final report describing the work carried out in the project as part of the Fundamentals of Statistical Learning course in Fall 2017.
In this paper, a novel variable selection technique with adaptive shrinkage stochastic search is used to understand the influences of the predictor variables. Variable selection is a challenging task, and several Bayesian techniques are available; a comparative review of the methods is given in O'Hara and Sillanpää (2009). This paper develops a technique for a binomial logistic model that combines binary-indicator model selection using the stochastic search introduced by George and McCulloch (1993) with an adaptive shrinkage method (Zou, 2006) through a Laplacian prior (Park & Casella, 2008). A similar approach has also been used by Lykou and Ntzoufras (2013); however, they used the Laplacian prior without any adaptive shrinkage for a Gaussian model. Ročková and George (2016) developed a hybrid Gaussian modelling approach that incorporates stochastic variable selection, like the spike-and-slab (Ishwaran & Rao, 2005), and a penalized approach, like the Lasso (Tibshirani, 1996). In addition, a non-Gaussian version based on a generalised linear model was developed by Tang et al. (2017). However, none of these methods considered a spatially correlated process in their modelling hierarchy.
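For reference, the two ingredients being combined have the following standard forms, written here in generic notation that is not necessarily the notation of this paper. The stochastic-search formulation of George and McCulloch (1993) places a binary indicator on each coefficient,
\[
\beta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,N\!\left(0,\tau_j^{2}\right) + \gamma_j\,N\!\left(0,c_j^{2}\tau_j^{2}\right),
\qquad \gamma_j \sim \mathrm{Bernoulli}(p_j),
\]
while the Laplacian prior of Park and Casella (2008), made adaptive in the spirit of Zou (2006) by allowing a coefficient-specific shrinkage parameter, is
\[
\pi\!\left(\beta_j \mid \lambda_j\right) \;=\; \frac{\lambda_j}{2}\,\exp\!\left(-\lambda_j \lvert\beta_j\rvert\right).
\]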
Everything is presented in general terms, allowing for any type of data covariance matrix, i.e., not restricted to uncorrelated observations. It is often fruitful to adopt a Bayesian view, in which the parameters of the fitting function can have a prior distribution (prior to observing the data), and the posterior distribution is obtained from the fit. Informally stated, we have an idea about some of the parameters before observing the data (see Sec. I A for an illuminating example), and we wish to include this knowledge in our final estimate of the parameters and/or the fitted function. It is a standard procedure to incorporate such a prior distribution in linear least squares, and it can be included in the LM algorithm by, formally, treating the prior information as an additional set of data (a generic illustration is given after this passage). In this work, however, it is clearly shown how the data and the prior information can be separated by exploiting the structure of the involved matrices and vectors; see Sec. II B. Unfortunately, it is not enough that models are often non-linear; even worse, they are often (not to say always) wrong. That is, whatever parameters we choose, it is impossible to reproduce the truth that lies behind the observed data. We call this a model defect. Model defects can
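In generic notation (a sketch of the standard device, not necessarily the exact matrix structure exploited in Sec. II B), a Gaussian prior $\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0, \boldsymbol{\Sigma}_0)$ enters weighted linear least squares as an additional block of data:
\[
\min_{\boldsymbol{\beta}}\;
(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})
+(\boldsymbol{\beta}-\boldsymbol{\beta}_0)'\boldsymbol{\Sigma}_0^{-1}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)
\;=\;
\min_{\boldsymbol{\beta}}\;
\left\|
\begin{pmatrix}\mathbf{y}\\ \boldsymbol{\beta}_0\end{pmatrix}
-\begin{pmatrix}\mathbf{X}\\ \mathbf{I}\end{pmatrix}\boldsymbol{\beta}
\right\|^{2}_{\operatorname{diag}(\boldsymbol{\Sigma},\,\boldsymbol{\Sigma}_0)^{-1}},
\]
where $\boldsymbol{\Sigma}$ is the data covariance matrix; the prior mean and covariance simply play the role of extra observations and their covariance.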