Data Mining of Chemical Analysis for White Wine Quality

Wine was once viewed as a luxury good, but it is now enjoyed by an increasingly wide range of consumers. Wine prices vary considerably with quality. When wine sellers buy wines from wine makers, it is therefore important for them to understand the wine quality, which is affected to some degree by certain chemical attributes. When wine sellers receive wine samples, classifying or predicting the wine quality accurately makes a difference to their profits. Our goal is thus to model wine quality based on physicochemical tests and to provide a reference that helps wine sellers distinguish high-, moderate-, and low-quality wines.
We downloaded the wine quality data set that is the white
Since our goal is to build a model that serves as a reference for 3 categories, we can redefine the quality levels into 3 categories rather than 7. In this way, we expect to obtain more reasonable results and to give wine sellers more accurate models to support their decisions when purchasing wines from wine makers.
3.1 Clustering and redefining the data set
Because the goal of clustering is to put objects that are “similar” together in a cluster, it matches our goal of forming three quality categories. We therefore decided to use clustering first to explore whether the categories can be separated. Since XLMiner can run clustering on no more than 4000 records, we needed to reduce our data set to 4000 records. First, we eliminated the data outside the range of 3 standard deviations. We then found that quality 5 has about 1800 records, nearly half of all records across the 7 quality levels, so we randomly selected 80% of the quality 5 records to reduce the dominant effect of quality 5. This determined the new data set. We then created a new output variable, new quality, with the values 1, 2, and 3, representing low quality, mid-range quality, and high quality, respectively. The details are shown in the table below:
Wine with quality 3, 4, 5 | Wine with quality 6 | Wine with quality 7, 8, 9
Low quality wine | Mid-range quality wine | High quality wine
1271 observations |
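The data preparation steps described in this section can be sketched in Python as follows. This is only an illustrative sketch, not the XLMiner workflow we used. It assumes each record is a dictionary with a "quality" key, and it assumes the 3 standard deviation rule is applied to each feature separately. The function names (new_quality, drop_outliers, downsample, prepare) and the random seed are hypothetical.

```python
import random
import statistics


def new_quality(quality):
    """Map the original score to the new variable: 3-5 -> 1, 6 -> 2, 7-9 -> 3."""
    if quality <= 5:
        return 1  # low quality
    if quality == 6:
        return 2  # mid-range quality
    return 3      # high quality


def drop_outliers(rows, features, n_std=3.0):
    """Keep rows whose every feature lies within n_std standard deviations
    of that feature's mean (assumption: the rule is applied per feature)."""
    bounds = {}
    for f in features:
        values = [r[f] for r in rows]
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        bounds[f] = (mean - n_std * std, mean + n_std * std)
    return [
        r for r in rows
        if all(bounds[f][0] <= r[f] <= bounds[f][1] for f in features)
    ]


def downsample(rows, quality, keep_fraction, seed=0):
    """Randomly keep keep_fraction of the rows with the given quality;
    rows with any other quality are all kept."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    target = [r for r in rows if r["quality"] == quality]
    others = [r for r in rows if r["quality"] != quality]
    kept = rng.sample(target, round(len(target) * keep_fraction))
    return others + kept


def prepare(rows, features, seed=0):
    """Remove outliers, keep 80% of quality 5, and add the new_quality label."""
    rows = drop_outliers(rows, features)
    rows = downsample(rows, quality=5, keep_fraction=0.8, seed=seed)
    return [dict(r, new_quality=new_quality(r["quality"])) for r in rows]
```

Calling prepare(rows, features) on the full record list would return the reduced records, each with an added "new_quality" value of 1, 2, or 3. The resulting size depends on the data, so it may still need checking against the 4000-record limit.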
