Abstract - Data mining extracts useful information from data. More specifically, it extracts knowledge or interesting information from large sets of structured data drawn from different sources. Data mining applications are used in a range of areas, including financial data analysis, the retail and telecommunication industries, banking, health care and medicine. In health care, data mining is mainly used for disease prediction. Several data mining techniques have been developed and used for predicting diseases, including data preprocessing, classification, clustering, association rules and sequential patterns. This paper analyses the performance of two classification techniques such as Bayesian …
Medical data processing has high potential in the medical domain for extracting hidden patterns within a dataset [15]. These patterns are used for clinical diagnosis and prognosis. Medical data are generally distributed, heterogeneous and voluminous in nature. An important problem in medical analysis is to reach the correct diagnosis from certain important information. This paper describes classification algorithms and analyzes their performance. The accuracy measures are True Positive (TP) rate, F-Measure, Receiver Operating Characteristic (ROC) area and Kappa statistic. The error measures are Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE) [5]. Section 2 presents the literature review; Section 3 describes the classification algorithms. Experimental results are analyzed in Section 4, and Section 5 concludes the paper.
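The four error measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `error_measures` is hypothetical, and it assumes the relative measures use the mean of the actual values as the baseline predictor (some tools use the training-set mean instead).

```python
import math

def error_measures(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE for two equal-length numeric sequences.

    Assumption: the relative measures compare the model against a baseline
    that always predicts the mean of the actual values.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    n = len(actual)
    mean_actual = sum(actual) / n

    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    abs_base = sum(abs(a - mean_actual) for a in actual)
    sq_base = sum((a - mean_actual) ** 2 for a in actual)
    if abs_base == 0:
        raise ValueError("relative errors are undefined when all actual values are equal")

    return {
        "MAE": abs_err / n,                   # mean absolute error
        "RMSE": math.sqrt(sq_err / n),        # root mean squared error
        "RAE": abs_err / abs_base,            # relative absolute error
        "RRSE": math.sqrt(sq_err / sq_base),  # root relative squared error
    }

if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(error_measures([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

RAE and RRSE are often reported as percentages; multiplying the returned values by 100 gives that form.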
II. LITERATURE REVIEW
Dr. S. Vijayarani et al. [11] evaluated the performance of various classification techniques in data mining for predicting heart disease from a heart disease dataset. Several classification algorithms were used and tested in this work. The performance factors used to evaluate the efficiency of the algorithms were clustering accuracy and error rate. The results illustrate that the efficiency of the LOGISTICS classification function is better than that of multilayer perceptron and sequential
These measurements include the assessment of risk factors [61], quality of care [62], diagnostic criteria [63], etc. Most of these studies used rule-based methods [62, 63] to detect clearly defined and less complex measurements (those with fewer expression variations), such as glucose level and body mass index. For some ambiguous and complex measurements, such as coronary artery disease and obesity status, machine learning plus external terminologies [61] are often
Logistic regression, previously the usual approach, was used to analyze stroke outcome data. Because of their potentially more powerful prediction performance, machine learning algorithms have been proposed as an alternative for analyzing large-scale multivariate data. Support vector machine (SVM) is one of the most popular machine learning methods used for recognition or classification.
The machine-learning procedure was faster and provided greater accuracy in predicting death risk in people with serious heart disorder than the existing methods.
Abstract - The healthcare industry collects large amounts of healthcare data, but unfortunately not all of the data are mined, which is required for discovering hidden patterns and making effective decisions. We propose an efficient genetic algorithm combined with the back propagation technique for heart disease prediction. This paper analyzes prediction systems for heart disease that use a larger number of input attributes. The system uses 13 medical attributes, such as gender, blood pressure and cholesterol, to predict the likelihood of a patient getting heart disease.
A clinical decision support system (CDSS) is one approach for delivering evidence-based information to clinicians at the point of care. A CDSS combines clinical knowledge with patient-specific information, which enhances patient care. In most cases, the rules used in a CDSS take the form “If conditions then criticism”. Other approaches include Bayesian reasoning, infobuttons, etc.
In this experiment, we found that random forests performed best, with around 70% accuracy on the publicly available “Diabetes 130-US hospitals for years 1999-2008” dataset. The proposed model also indicates that information about diagnosis, age, race, medications, admission types and lab procedures is highly influential for readmission classification. In this work, the results are evaluated against several criteria. The best AUC and F-measure values obtained on the experimental dataset were 70.1% and 66.6%, for classifying readmitted versus non-readmitted patients using random forests. Random forests performed better than Naïve Bayes and C4.5.
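To make the two reported criteria concrete, the following sketch computes the F-measure and the area under the ROC curve for a binary classifier in plain Python. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, labels are assumed to be 0 or 1, and AUC is computed by pairwise comparison of positive and negative scores, which is quadratic in the number of samples.

```python
def f_measure(labels, predictions):
    """F-measure (F1) for binary labels and hard 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Made-up labels and scores, for illustration only.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    hard = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(f_measure(labels, hard), roc_auc(labels, scores))
```

The F-measure depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.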
Predictive modeling is another form of data analytics that focuses on forecasting future medical costs. This model uses patients’ medical information to evaluate health risks and forecast future medical utilization (Ingenix, 2006). A large variety of predictive modeling procedures are available, which are designed to assign a specific risk level or score to patients (Asparouhov, 2012). Risk scores are determined by the risk markers and are given to each patient in a specific population (Ingenix, 2006). Using past diagnoses and other gathered information, the predictive models calculate individual patient costs, which can then be used by healthcare providers and insurers. Therefore, certain
Knowledge attained with the use of data mining techniques can be used to make innovative and successful decisions that will increase the success rate of the health care sector and improve the health of patients. In this paper, classification algorithms in data mining and their applications are discussed. The popular classification algorithms used in the healthcare domain are explained in detail. Open source data mining tools are discussed. Applications of data mining techniques in the healthcare sector are studied. With the future development of information communication technologies, data mining will attain its full potential in the discovery of knowledge hidden in the health care organizations and medical
While the study is twenty years old and the results are unavailable, it highlights difficulties in informatics research that are still present today. The authors noted difficulty in obtaining data from EHRs, even from the same systems, because institutions implement them differently. Data mining methodologies were used to find patterns that could support predictions about ARDS risk factors. Data mining is “a method in computer science that is used to discover patterns and trends within large data sets,” (McBride and Tietze, 2016). The data mining technique used in this study was association rule learning, which identifies relationships between variables and is necessary for predictive modeling. “Our limited human information processing capacities cannot detect the patterns and trends in massive volumes of data, but data mining software uses a variety of mathematical techniques to sift through mountains of both qualitative and quantitative data to find the repeated themes and matched variables that often escape our human brains,” (Goodwin et al, 1997). Therefore, informatics strategies such as data mining allow large datasets to be processed in ways that support the development of prediction tools, such as tools for identifying patients at high risk of developing ARDS.
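As a rough illustration of association rule learning, the sketch below computes the three standard rule statistics (support, confidence and lift) over a handful of records. The records, the item names (`factor_a`, `factor_b`, `outcome`) and the function names are all hypothetical placeholders; they are not taken from the study, and a real analysis would use a dedicated algorithm such as Apriori to search for candidate rules.

```python
def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)

def confidence(antecedent, consequent, records):
    """Estimated P(consequent | antecedent) over the records."""
    base = support(antecedent, records)
    if base == 0:
        raise ValueError("antecedent never occurs in the records")
    return support(set(antecedent) | set(consequent), records) / base

def lift(antecedent, consequent, records):
    """How much more often the rule holds than if the two sides were independent."""
    return confidence(antecedent, consequent, records) / support(consequent, records)

# Made-up records: each set lists the items observed for one patient.
records = [
    {"factor_a", "factor_b", "outcome"},
    {"factor_a", "outcome"},
    {"factor_b"},
    {"factor_a"},
    {"factor_b", "outcome"},
]

if __name__ == "__main__":
    rule = ({"factor_a"}, {"outcome"})
    print("support:", support(rule[0] | rule[1], records))
    print("confidence:", confidence(*rule, records))
    print("lift:", lift(*rule, records))
```

A lift above 1 suggests the antecedent and consequent co-occur more often than chance would predict, which is the kind of relationship a rule miner reports for further review.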
In healthcare organizations, data mining plays a leading role in research, and it plays a vital role in various fields of technology. The healthcare industry gathers a large amount of information about patients, diseases, hospital resources, electronic patient records, diagnosis methods, etc. Using data mining in health care makes it easy to classify or group patients who have the same disease, which helps to treat them effectively. In this paper I review data mining applications in health care and data mining challenges in health care.
Sequential patterns are a new application that has been proposed to help doctors and diabetic patients. According to the article, the main advantage of sequential patterns is the transparency of the model: the patterns can be easily understood by physicians. Sequential patterns were used to tie information together and find patterns across different hospitals, and they are applied to mining diabetes data. The patterns
Many hospitals maintain their patient databases online, including records of the tests suggested, their results and the prescriptions given. This generates a huge amount of data in many forms, such as text, numbers, images and videos. All of this data is important in making clinical decisions. To handle such a large volume of data efficiently, a multistage classifier has become necessary. In existing systems, the classifier tests all the features at once to detect whether a patient is suffering from a particular disease. Testing every attribute consumes a lot of time when the patient is not actually suffering from the disease. If instead the attributes are tested step by step, with a few attributes in each step, a conclusion can be reached at an early stage, which makes more efficient use of both time and money. The patient is also spared unnecessary stress and fatigue. This system uses resources very efficiently. It also aims to identify the problem at a very early stage and suggest a reliable solution to problems increasing life of
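The step-by-step idea described in this excerpt can be sketched as a cascade in which each stage looks at only a few attributes and either reaches a verdict or defers to the next stage. Everything in the sketch is hypothetical: the attribute names `cheap_test` and `detailed_test`, the thresholds and the function `classify_in_stages` are placeholders chosen for illustration, not the system the excerpt describes.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage is (attribute names it needs, rule returning True/False or None to defer).
Stage = Tuple[List[str], Callable[[Dict[str, float]], Optional[bool]]]

def classify_in_stages(patient: Dict[str, float],
                       stages: List[Stage]) -> Tuple[Optional[bool], List[str]]:
    """Run the stages in order and stop at the first one that reaches a verdict.

    Returns the verdict (None if no stage decided) and the list of attributes
    that actually had to be tested.
    """
    tested: List[str] = []
    for attributes, rule in stages:
        tested.extend(attributes)
        view = {name: patient[name] for name in attributes}
        verdict = rule(view)
        if verdict is not None:
            return verdict, tested
    return None, tested

# Hypothetical two-stage cascade with made-up thresholds.
stages: List[Stage] = [
    # Stage 1: an inexpensive screening attribute rules out clear negatives.
    (["cheap_test"], lambda v: False if v["cheap_test"] < 0.3 else None),
    # Stage 2: a costlier attribute decides the remaining cases.
    (["detailed_test"], lambda v: v["detailed_test"] >= 0.5),
]

if __name__ == "__main__":
    print(classify_in_stages({"cheap_test": 0.1, "detailed_test": 0.9}, stages))
    print(classify_in_stages({"cheap_test": 0.8, "detailed_test": 0.9}, stages))
```

In the first call the cascade stops after one attribute, which is where the time and cost savings would come from; only the second patient needs the later, more expensive stage.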
Chronic kidney disease refers to kidneys that have been damaged by conditions such as diabetes, glomerulonephritis or high blood pressure. It also makes heart and blood vessel disease more likely to develop. These problems may develop slowly, over a long period of time, often without any symptoms. The disease may ultimately lead to kidney failure, requiring dialysis or a kidney transplant to sustain life. Early detection and treatment can therefore prevent or delay these complications. One of the main tasks is providing an accurate diagnosis of the disease and proper treatment. The major problem is finding an accurate algorithm which doesn’t require long time to run for
In 2017, R. Cheruku et al. [52] proposed in their paper “SM-RuleMiner : Spider monkey based rule miner using novel fitness function for diabetes classification” that spider monkey optimization (SMO) can be used to design an effective rule miner, called SM-RuleMiner, for diabetes diagnosis. A fitness function was also designed for SM-RuleMiner. In a comparison with other meta-heuristic-based rule mining algorithms, SM-RuleMiner achieved the best ranking in average sensitivity and the second best ranking in average classification accuracy.
In this system, the performance of the CBR algorithm is boosted using a MapReduce approach, and the improved CBR algorithm runs on the Apache Hadoop framework to detect diabetes in a particular patient.