A Research Study On Data Mining

1171 Words Jun 28th, 2016 5 Pages
Data mining is the process of discovering patterns, trends, and correlations in large amounts of electronically stored data, using statistical methods, mathematical formulas, and pattern recognition technologies (Sharma n.d.). The main idea is to analyze data from different perspectives and discover useful trends, patterns, and associations. As discussed in the previous chapter, healthcare organizations are producing massive amounts of electronic medical records, which are impossible to process using traditional tools (e.g., Microsoft Excel). Data mining is therefore becoming very popular in this field, as it can be used to identify the presence of chronic disease, detect the cause of the disease, analyze the …
This section provides a brief overview of the data mining methods used in the existing studies to analyze and predict the presence of kidney disease using EHR.

4.1.1. Classification

Classification is the process of dividing data samples into target classes and then predicting the class of each new data point. For example, a CKD patient can be classified as being in the normal, mild, moderate, or extreme stage based on test results and medical history. Classification techniques can be divided into two broad categories: binary and multilevel classification. Logistic regression, support vector machines (SVM), decision trees (DT), artificial neural networks (ANN), Bayesian networks, and k-nearest neighbors (kNN) are the most widely used classification techniques in the healthcare industry (Tomar & Agarwal 2013).
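To make the difference between multilevel and binary classification concrete, the following is a minimal sketch of a k-nearest-neighbors classifier over the four stages named above. The training values are synthetic placeholders rather than clinical measurements, and the names `TRAIN`, `knn_predict`, and `to_binary` are assumptions introduced only for this illustration.

```python
import math
from collections import Counter

# Synthetic toy samples: (feature vector, stage label). The two features are
# unnamed placeholders, not real test results.
TRAIN = [
    ((1.0, 1.0), "normal"),
    ((1.2, 0.8), "normal"),
    ((3.0, 3.1), "mild"),
    ((3.2, 2.9), "mild"),
    ((5.0, 5.2), "moderate"),
    ((5.1, 4.8), "moderate"),
    ((7.0, 7.1), "extreme"),
    ((7.2, 6.9), "extreme"),
]

def knn_predict(train, point, k=3):
    """Return the majority label among the k training samples nearest to point."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def to_binary(label):
    """Collapse the four stages into a binary target: disease present or not."""
    return "normal" if label == "normal" else "ckd"

# Multilevel prediction for a new sample, then the same prediction as a binary one.
stage = knn_predict(TRAIN, (5.0, 5.0))
print(stage, to_binary(stage))
```

The multilevel classifier predicts one of four stages, while the binary view only separates normal samples from all other stages. A real study would use validated clinical features and a held-out test set.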

Logistic Regression: Logistic regression uses the training data to fit a separating line (a linear decision boundary) between the classes and then uses it to classify unknown data points. Because the boundary is linear, this technique generally suffers from high bias and low variance. However, there are regularized variants of logistic regression (e.g., with a lasso or ridge penalty) that adjust the balance between bias and variance.
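As a rough illustration of how a penalty changes the fitted model, the sketch below trains a logistic regression by batch gradient descent with an optional L2 (ridge-style) penalty. It is not the method used in the cited studies. The data are synthetic, and the function names and the parameters `l2`, `lr`, and `epochs` are assumptions chosen for this example. A lasso (L1) penalty is omitted because it needs a different update rule.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, l2=0.0, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss + (l2 / 2) * sum(w_j ** 2)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The ridge-style penalty shrinks the weights; the bias is not penalized.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return 1 if the estimated probability is at least 0.5, otherwise 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Synthetic, already-centered toy data with a single placeholder feature.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]

w_plain, b_plain = fit_logistic(X, y, l2=0.0)
w_ridge, b_ridge = fit_logistic(X, y, l2=1.0)
print("unpenalized weight:", w_plain[0])
print("L2-penalized weight:", w_ridge[0])
```

On this separable toy set the penalized fit ends with a smaller weight than the unpenalized one. A stronger penalty gives a more constrained model, which typically means more bias and less variance.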

Naïve Bayes: Another supervised learning method is Naïve Bayes (NB), a simple probabilistic classifier based on Bayes' theorem. NB needs only a very small amount of training data and it can be