Pt2520 Unit 3 Problem Analysis Paper

##Import .csv files
##These data frames won't actually be put to use, but it is good to keep an unaltered copy of the train and test sets for reference and exploratory analysis.

I decided to use the h2o package to build all three of the models required for the assignment. The main reason is that many groups used this package on the midterm and obtained some of the best scores. Additionally, its ease of use and simple syntax were repeatedly pointed out during the presentations. By contrast, when I used the keras neural network package in Python for the midterm, training was underwhelmingly slow, since GPU acceleration wasn't an option due to the lack of […]
Thus, we must import .csv files using an importFile command specific to h2o. Files are imported as h2o environment variables.

#h2o requires a y variable (the name of the response column) and an x variable (the list of all the predictor columns). Furthermore, in this case I also define the variables as factors to avoid any misrepresentation or errors when generating my model.
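The x and y definitions described above can be sketched in plain Python, without calling h2o itself. This is only an illustration: the column names (`id`, `feature_a`, `target`, and so on) are hypothetical, since the actual columns of the train set are not shown here, and dropping an identifier column from x is an assumption rather than something the assignment states.

```python
# Hypothetical column names; the real train set's columns are not shown in the text.
columns = ["id", "feature_a", "feature_b", "feature_c", "target"]

# y: the name of the response column (assumed here to be "target").
y = "target"

# x: every remaining column used as a predictor.
# Assumption: an identifier column such as "id" carries no signal and is left out.
ignored = {"id", y}
x = [name for name in columns if name not in ignored]

print("response:", y)
print("predictors:", x)
```

Building x by exclusion rather than by listing every predictor keeps the definition valid if columns are added to the data later, as long as the response and identifier names stay the same.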

#This is a complex model, since it takes into consideration almost all of the hyperparameters available in h2o. The model was run several times using different combinations of hyperparameters. The resulting .csv files and Kaggle performance scores are attached for comparison. The main purpose of considering all the hyperparameters was purely exploratory: it was my first time using h2o, so I wanted to learn as much as possible about how to perform different types of analysis and how distinct features influence […]
It was created to contrast with the previous model. As you can see, stopping_rounds is set to 0. The main reason was to see whether the score improved if we let h2o run primarily on its default values. To my surprise, its performance was better than that of any of the other models I tried. I do think there was overfitting, since the stopping_rounds were […]