Dongdi Zhao
Title: Monthly Milk Production Analysis
Objective:
To predict monthly milk production (in pounds per cow) using H2O AutoML and the
provided historical dataset.
Data Overview:
The dataset spans from January 1962 to December 1975, detailing monthly milk
production in pounds per cow.
Methodology:
Data Preparation:
No missing values were identified.
Checked for outliers; none found.
Formatted the data, including timestamp parsing and feature engineering.
Feature Engineering:
Extracted month and year from the timestamp.
Incorporated lag features (prior months' production) to capture historical trends.
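The feature engineering steps above can be sketched in plain Python. This is only an illustration under assumptions: the report does not state which lags were used, so the 1-month and 12-month lags, the function name build_features, the column names, and the toy values below are all hypothetical.

```python
from datetime import date


def build_features(records, lags=(1, 12)):
    """Turn chronological (date, production) pairs into feature rows.

    The lag choices (1 and 12 months) are assumptions for illustration;
    the report does not specify which lags were used.
    """
    rows = []
    max_lag = max(lags)
    # Skip the first max_lag months, which lack a complete lag history.
    for i in range(max_lag, len(records)):
        ts, value = records[i]
        row = {"year": ts.year, "month": ts.month, "production": value}
        for lag in lags:
            row[f"lag_{lag}"] = records[i - lag][1]
        rows.append(row)
    return rows


# Toy, made-up values (not the real dataset): 14 consecutive months.
records = [(date(1962 + i // 12, i % 12 + 1, 1), 600 + i) for i in range(14)]
rows = build_features(records)
```

Each resulting row holds the calendar features, the target, and one column per lag, so rows can be passed to a tabular learner without further reshaping.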
Target Variable:
Monthly milk production in pounds per cow.
Model Training:
H2O AutoML identified the Gradient Boosting Machine (GBM) as the best-performing
model.
Achieved a Mean Absolute Error (MAE) of 13.5 pounds on the validation set.
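A training run of this kind might look roughly like the sketch below. It is not the author's actual script: the function name train_automl, the target column name "production", and the max_models and seed values are assumptions, and the caller is assumed to have already started H2O with h2o.init() and loaded the data into H2OFrame objects. The import sits inside the function only so the sketch can be loaded without H2O installed.

```python
def train_automl(train_frame, valid_frame, features, target="production",
                 max_models=10, seed=1):
    """Run H2O AutoML and score the leader on a held-out frame.

    train_frame and valid_frame are assumed to be H2OFrame objects,
    split chronologically by the caller. All defaults are illustrative.
    """
    from h2o.automl import H2OAutoML  # requires the h2o package and Java

    # Rank candidate models by MAE, the metric quoted in the report.
    aml = H2OAutoML(max_models=max_models, seed=seed, sort_metric="MAE")
    aml.train(x=features, y=target,
              training_frame=train_frame,
              leaderboard_frame=valid_frame)

    # aml.leader is the top model on the leaderboard.
    perf = aml.leader.model_performance(valid_frame)
    return aml.leader, perf.mae(), perf.r2()
```

For time-ordered data such as this, a chronological split (earlier months for training, later months for validation) avoids leaking future values into training.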
Model Evaluation:
MAE: 13.5 pounds
R-squared: 0.87
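For reference, the two metrics above follow the standard definitions, which can be sketched as below. The function names are illustrative, and the example inputs in the usage note are made up rather than taken from the dataset.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

Calling mean_absolute_error([1, 2, 3], [1, 2, 4]) gives one third, and r_squared on the same inputs gives 0.5. MAE is expressed in the target's units (here, pounds per cow), while R-squared is unitless.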
Results:
Of the models H2O AutoML evaluated, the Gradient Boosting Machine performed best, with an MAE of
13.5 pounds and an R-squared of 0.87 on the validation set.
Insights:
The previous month's production is a key predictor of the current month's
production.
Seasonal patterns, especially during summer months, strongly influence milk