A Research Study On Data Mining

3162 Words Feb 12th, 2015 13 Pages
Abstract
Data mining is the non-trivial extraction of potentially useful information from data. In other words, data mining extracts knowledge or interesting information from large sets of structured data drawn from different sources. Data mining spans various research domains, including text mining, web mining, image mining, sequence mining, process mining and graph mining. Its applications cover a range of areas such as financial data analysis, the retail and telecommunication industries, banking, health care and medicine. In health care, data mining is mainly used for disease prediction. Several data mining techniques have been developed and used for predicting diseases …
An important problem in medical analysis is reaching the correct diagnosis from the available patient information. This paper describes classification algorithms and analyzes their performance. The performance factors used for the analysis are accuracy measures and error measures. The accuracy measures are True Positive (TP) rate, F-Measure, Receiver Operating Characteristic (ROC) area and the Kappa statistic. The error measures are Mean Absolute Error (M.A.E), Root Mean Squared Error (R.M.S.E), Relative Absolute Error (R.A.E) and Root Relative Squared Error (R.R.S.E) [5]. Section 2 describes the classification algorithms, Section 3 analyzes the experimental results and Section 4 concludes the paper.
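To make the measures listed above concrete, the following is a minimal Python sketch of how they are commonly defined. It is an illustration under stated assumptions, not the computation performed in the paper: the function names `error_measures` and `accuracy_measures` are hypothetical, the relative errors use the mean of the actual values as the baseline predictor, the accuracy measures assume a two-class confusion matrix, and ROC area is left out because it needs ranked prediction scores.

```python
import math

def error_measures(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE for two equal-length numeric lists."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Assumed baseline: a predictor that always outputs the mean actual value.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RAE": sum(abs_err) / base_abs,
        "RRSE": math.sqrt(sum(sq_err) / base_sq),
    }

def accuracy_measures(tp, fn, fp, tn):
    """Return TP rate, F-measure and kappa from 2x2 confusion-matrix counts."""
    total = tp + fn + fp + tn
    tp_rate = tp / (tp + fn)            # also called recall or sensitivity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    observed = (tp + tn) / total        # observed agreement (plain accuracy)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"TP rate": tp_rate, "F-measure": f_measure, "Kappa": kappa}

if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(error_measures([1, 0, 1, 1, 0], [0.9, 0.2, 0.6, 0.8, 0.1]))
    print(accuracy_measures(tp=40, fn=10, fp=5, tn=45))
```

Lower values are better for all four error measures, while higher values are better for TP rate, F-measure and kappa. A relative error below 1 means the classifier beats the mean-value baseline.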
II. Research Methodology
Document classification is one of the important issues in the field of data mining, where documents are classified using supervised knowledge [1]. The main objective of this paper is to find the best classification algorithm among the Bayesian and Lazy classifiers. Figure 1 shows the architecture of the classification algorithm.

Figure 1. Architecture of Classification algorithm
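The contrast between the two classifier families can be sketched in a few lines of Python. This is a hedged illustration only: Gaussian naive Bayes and k-nearest neighbours are used here as representative Bayesian and lazy classifiers, not necessarily the algorithms evaluated in the paper, and the toy data, class labels and all function names are hypothetical. The Bayesian classifier builds a model from the training data before predicting, whereas the lazy classifier stores the training rows and defers all work to prediction time.

```python
import math
from collections import Counter, defaultdict

def knn_predict(train_X, train_y, x, k=3):
    """Lazy classifier: no training step, majority vote among the k nearest rows."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def nb_fit(train_X, train_y):
    """Bayesian classifier: estimate a prior and per-feature mean/variance per class."""
    by_class = defaultdict(list)
    for row, label in zip(train_X, train_y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant avoids a zero variance on tiny samples.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(train_X), stats)
    return model

def nb_predict(model, x):
    """Pick the class with the highest log posterior under the Gaussian model."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def leave_one_out_accuracy(predict, X, y):
    """Hold out each row in turn, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

# Hypothetical toy data: two numeric features, two classes.
X = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (1.1, 1.1),
     (3.0, 3.2), (3.1, 2.9), (2.8, 3.0), (3.2, 3.1)]
y = ["class_a"] * 4 + ["class_b"] * 4

knn_acc = leave_one_out_accuracy(knn_predict, X, y)
nb_acc = leave_one_out_accuracy(lambda tx, ty, x: nb_predict(nb_fit(tx, ty), x), X, y)
print("k-NN accuracy:", knn_acc, "naive Bayes accuracy:", nb_acc)
```

Running both classifiers through the same evaluation loop is what makes the comparison fair: only the `predict` function changes, while the data splits and the scoring stay identical.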
A. Dataset
To compare the data mining classification techniques, hepatitis data were collected from the UCI repository. The dataset has 156 instances and 20 attributes. The Weka (Waikato Environment for Knowledge Analysis) tool is used to analyze the performance of the classification algorithms.
B. Data Preprocessing