Literature Review On Seven Data Mining Techniques Essay

1.3 Literature Review on Seven Data Mining Techniques

1.3.1 K-Nearest Neighbor Classifiers (KNN)

Given a positive integer K and an unknown sample, a KNN classifier finds the K observations in the training set that are closest to the unknown sample. It then assigns the unknown sample to the class that is most common among those K neighbors. The advantage of KNN is that it does not need to estimate the relationship between the response and the predictors (Shmueli, et al. 2016). Its drawback is that the results depend heavily on the number of nearest neighbors chosen (James, et al. 2013).

1.3.2 Logistic Regression (LR)

LR shares a similar idea with linear regression, except that its response is a categorical variable. It estimates the probability that an unknown observation belongs to one of the classes (James, et al. 2013). The major disadvantage of LR is that it handles multicollinearity among the predictors poorly (Shmueli, et al. 2016). However, it provides a straightforward classification together with a probability.

1.3.3 Classification Trees (CT)

A CT estimates a probability for each class in each node in order to classify a qualitative response. It does not require variable subset selection or variable transformation. However, the tree structure has an inherent weakness: it is unstable, and a small change in the data can alter it substantially (Shmueli, et al. 2016).

1.3.4 Random Forests (RF)

RF first draws multiple random samples with replacement from the training
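
As an illustration of the KNN procedure in Section 1.3.1, the following is a minimal sketch in plain Python, not an implementation taken from the cited sources. It assumes Euclidean distance and a simple majority vote. The function name `knn_classify` and the toy data are hypothetical and chosen only to show how the predicted class can change with K.

```python
from collections import Counter
import math


def knn_classify(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training observation.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest observations.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two predictors per observation, two classes.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.0), (3.2, 2.9), (1.6, 1.6)]
train_y = ["A", "A", "A", "B", "B", "B"]
query = (1.5, 1.5)

print(knn_classify(train_X, train_y, query, k=1))  # B
print(knn_classify(train_X, train_y, query, k=3))  # A
```

With K = 1 the single closest point belongs to class B, while with K = 3 two of the three closest points belong to class A, so the prediction flips. This is the sensitivity to the number of neighbors noted in Section 1.3.1.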
