Abstract
Data mining is the non-trivial extraction of potentially useful information from data. In other words, data mining extracts knowledge or interesting information from large sets of structured data drawn from different sources. Data mining spans several research domains, including text mining, web mining, image mining, sequence mining, process mining and graph mining. Data mining applications are used in a range of areas, such as financial data analysis, the retail and telecommunication industries, banking, and health care and medicine. In health care, data mining is mainly used for disease prediction, and several data mining techniques have been developed and used for predicting diseases.
Compared with other data mining application fields, medical data mining plays an important role and has some unique characteristics. Medical data processing has high potential for extracting hidden patterns within medical datasets [15]. These patterns are used for clinical diagnosis and prognosis. Medical data are generally distributed, heterogeneous and voluminous in nature, so they need to be collected and integrated to provide a user-oriented approach to the hidden patterns in the data. An important problem in medical analysis is achieving a correct diagnosis from the available information. This paper describes several classification algorithms and analyzes their performance. The performance factors used for the analysis are accuracy measures and error measures. The accuracy measures are the True Positive (TP) rate, F-measure, Receiver Operating Characteristic (ROC) area and Kappa statistic. The error measures are Mean Absolute Error (M.A.E), Root Mean Squared Error (R.M.S.E), Relative Absolute Error (R.A.E) and Root Relative Squared Error (R.R.S.E) [5]. Section 2 describes the classification algorithms, Section 3 analyzes the experimental results and Section 4 concludes the paper.
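To make the accuracy measures concrete, here is a minimal sketch for a binary classifier, assuming class 1 is the positive class (for example "disease present"). The function names and the toy labels and scores are hypothetical; evaluation tools may also report these measures per class or as weighted averages.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) for binary labels, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def tp_rate(tp: int, fn: int) -> float:
    """True Positive rate (recall): share of actual positives predicted as positive."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp_rate(tp, fn)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa: agreement between prediction and truth beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance if predictions were independent of the true labels.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def roc_area(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC area: chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: true labels, predicted scores, and hard predictions (scores thresholded at 0.5).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp_rate(tp, fn), f_measure(tp, fp, fn), kappa(tp, fp, fn, tn), roc_area(y_true, scores))
# prints: 0.75 0.75 0.5 0.9375
```

The ROC area is computed here through its rank interpretation rather than by drawing the curve; both give the same value, and the pairwise form keeps the sketch short.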
II. Research Methodology
Document classification is one of the important issues in the field of data mining, where the documents are classified
Data mining is the process of ‘digging out’ patterns from data, usually through clustering, classification, regression and association rule learning. Data mining technology can generate new business opportunities by providing:
Our research applies data mining (DM) to a data set extracted from data held in the RMIS at JKUAT. The literature review on the methodology used is presented in Section 2.4 of this chapter. Before this, Section 2.2 defines the DM terms used, including data mining, concept of knowledge
Owing to rapid developments in digital technologies, the use of electronic media to capture, process and accumulate information is growing at an extraordinary rate [1]. The amount of stored information is reaching zettabytes [2], whereas our capability to scrutinize such large amounts of data lags far behind this growth. One of the impediments is the high dimensionality of the datasets. Such high-dimensional data arise in many application areas, such as electronic health records (EHRs), biology, astronomy, medical imaging, video archiving and web data. Different data mining techniques have been used to extract the knowledge available in some of these data sets, albeit with limited success [3].
Data mining is generally the process of analysing data from different perspectives and summarising it into useful information (Thuraisingham, 1999). It is also called the “Knowledge Discovery in Databases” process. It can be understood as discovering interesting and useful patterns and relationships in large volumes of data. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for future use (Han & Kamber, 2006).
[1] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, 2nd ed., Morgan Kaufmann Publishers, 2006.
It is regarded as one of the most effective data mining techniques for discovering hidden or desired patterns in large amounts of data. It is used to find correlation relationships among different data attributes in a large set of items in a database.
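Correlation relationships among items are commonly quantified, in association rule mining, with support, confidence and lift. A minimal sketch, assuming each transaction is a Python `frozenset` of item names; the function names and the toy basket data are hypothetical.

```python
# Hypothetical market-basket data: each transaction is a set of purchased items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"eggs"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter"}),
]


def support(itemset: frozenset[str], data: list[frozenset[str]]) -> float:
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in data if itemset <= t) / len(data)


def confidence(lhs: frozenset[str], rhs: frozenset[str], data: list[frozenset[str]]) -> float:
    """Estimate of P(rhs | lhs) for the rule lhs -> rhs."""
    return support(lhs | rhs, data) / support(lhs, data)


def lift(lhs: frozenset[str], rhs: frozenset[str], data: list[frozenset[str]]) -> float:
    """Confidence relative to the baseline support of rhs; > 1 suggests positive correlation."""
    return confidence(lhs, rhs, data) / support(rhs, data)


bread, milk = frozenset({"bread"}), frozenset({"milk"})
print(support(bread | milk, transactions))    # 0.6
print(confidence(bread, milk, transactions))  # 1.0
print(lift(bread, milk, transactions))        # about 1.67
```

Confidence alone can be misleading when the right-hand item is common anyway; lift divides out that baseline, which is why it is the usual indicator of correlation between the two sides of a rule.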
Background - One of the most promising developments in computing and computer memory over the past few decades has been the ability to bring tremendously large and complex data sets into database management systems that are both affordable and workable for many organizations. Improvements in computing power have also allowed the field of artificial intelligence to evolve, which in turn improves the sifting of massive amounts of information for appropriate use in business, military, governmental and academic venues. Essentially, data mining takes as much information as possible from a variety of databases, sifts it intelligently and produces usable information that helps with data prediction, customer service, what-if scenarios and extrapolating trends for population groups (Ye, 2003; Therling, 2009).
Data mining is the process of extracting knowledge from large data sets. It uses artificial intelligence methods to discover hidden relationships within the huge amounts of data that are collected. It has great potential to improve applications in many fields, such as healthcare systems, customer relationship management, financial banking, research analysis, bioinformatics, marketing analysis, education, manufacturing engineering and criminology. Criminology is the study of crime, and a criminologist’s job typically includes analyzing data to determine why a crime was committed and, more importantly, to predict and prevent criminal behavior in the future. Criminology has become an interesting field for applying data mining techniques because of its large datasets and the complexity of the relationships within the data. This paper discusses some of the tools and techniques used in this field to find important information that helps and supports police forces and reduces social nuisance.
Data mining has become extremely important for many business domains, such as marketing, finance and telecommunications. This has become possible because of developments in database technology and systems in recent years. Data mining techniques are used in data processing, where operations performed on data, such as accumulation, use or administration, are called data processing. A simple real-life example is a shopkeeper asking a customer to fill in a counter slip so that the data can be processed and the record maintained for future use [5]. Association rule mining is a data mining technique that can readily find patterns or associations in very large data sets. For information to be valuable, it should have the following characteristics: accurate, complete, flexible, reliable, relevant, simple, timely, accessible and verifiable.
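Association rule mining usually starts by finding the frequent itemsets, from which rules are then generated. The sketch below uses the Apriori algorithm (Agrawal and Srikant's level-wise method, which the text does not name) and assumes transactions are Python `frozenset`s; the function name, the `min_support` threshold and the basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions: list[frozenset[str]], min_support: float) -> dict[frozenset[str], float]:
    """Level-wise search for all itemsets whose support is at least min_support."""
    n = len(transactions)

    def frequent_among(candidates: set[frozenset[str]]) -> dict[frozenset[str], float]:
        # Scan the transactions once per candidate and keep the frequent ones.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent single items.
    current = frequent_among({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)
    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: by the Apriori property, every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        k += 1
    return frequent


baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]
result = apriori(baskets, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what keeps Apriori practical: a candidate with any infrequent subset cannot be frequent, so it is discarded before the costly pass over the transactions.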
Data mining is often related to text mining. Text mining involves researching, storing and presenting information. It is mainly concerned with searching for patterns, both new and old, and with predicting them as well. Text mining is also referred to as text data mining. It involves finding controlled patterns in the data and focuses on basic analytical techniques. The term goes by many different names, but it is still defined as using a wide variety of databases to retrieve information. The retrieved information, no matter where it is collected from, is valuable to a business as long as it comes from a reliable and truthful source.
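As one illustration of pattern searching in text, a common first step is TF-IDF term weighting, which highlights words that are frequent in one document but rare across the collection. This technique is not named in the text; the sketch below is a minimal version, and the tokenizer, function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Weight each term by its frequency in a document times its rarity across documents."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "The patient has a fever",
    "The patient has a cough",
    "Stock prices rose today",
]
weights = tf_idf(docs)
top = max(weights[0], key=weights[0].get)
print(top)  # fever
```

Words shared by several documents ("the", "patient") receive lower weights than the distinctive word "fever", which is the kind of pattern a text-mining pipeline would pass on to later steps such as classification or clustering.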
In our report, we discuss the capabilities of data mining techniques in the context of education and how they are used to evaluate student
Abstract— Big Data is a new term used to describe a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques. Big Data mining is the capability of extracting useful information from these large datasets or data streams, which, because of their volume, variability and velocity, could not be mined before. The Big Data challenge is becoming one of the most exciting opportunities for the years to come. In this issue, we present a broad overview of the topic, its current status and a forecast for the future. We also introduce several articles, written by influential scientists in the field, covering the most interesting and state-of-the-art topics in Big Data mining.
This research paper highlights the importance of and need for data mining in the age of electronic media, where large amounts of information and consolidated databases are readily available. This seemingly useless information can unearth surprising statistics and, through data mining techniques, predict future trends with relative ease, which can benefit businesses, start-ups, countries and individuals alike. Moreover, because data mining is effective at bringing out patterns, correlations and associations through complex algorithms and analysis, it has, over the past few decades, proved to be a useful tool in cyber and internet security.
Medical data processing has high potential for extracting hidden patterns within medical datasets [15]. These patterns are used for clinical diagnosis and prognosis. Medical data are generally distributed, heterogeneous and voluminous in nature. An important problem in medical analysis is achieving a correct diagnosis from the available information. This paper describes several classification algorithms and analyzes their performance. The accuracy measures are the True Positive (TP) rate, F-measure, Receiver Operating Characteristic (ROC) area and Kappa statistic. The error measures are Mean Absolute Error (M.A.E), Root Mean Squared Error (R.M.S.E), Relative Absolute Error (R.A.E) and Root Relative Squared Error (R.R.S.E) [5]. Section 2 presents the literature review, Section 3 describes the classification algorithms, Section 4 analyzes the experimental results and Section 5 concludes the paper.
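The four error measures can be sketched as follows, assuming a single numeric target, for example the actual class coded as 0/1 against the predicted probability of the positive class. The relative measures here compare the model with a baseline that always predicts the mean of the actual values; evaluation tools may compute them slightly differently (for example, over per-class probability estimates or against a training-set baseline). The function name and the sample values are hypothetical.

```python
import math
from typing import Sequence


def error_measures(actual: Sequence[float], predicted: Sequence[float]) -> dict[str, float]:
    """Return MAE, RMSE, and the relative errors RAE and RRSE (both as percentages)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Model errors.
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of the actual values.
    abs_base = sum(abs(a - mean_actual) for a in actual)
    sq_base = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": 100 * abs_err / abs_base,
        "RRSE": 100 * math.sqrt(sq_err / sq_base),
    }


# Toy example: actual classes (1 = disease present) and predicted probabilities.
actual = [1, 0, 1, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.8, 0.1]
measures = error_measures(actual, predicted)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

An RAE or RRSE below 100% means the classifier does better than the trivial mean predictor; values near or above 100% suggest it has learned little from the attributes.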
Based on these trends, large amounts of data are being gathered and stored in databases and data warehouses. Their huge volume and fast pace have made data far more powerful than expected, with much potential waiting for us to maintain, explore and base decisions on. Finding efficient ways to analyze the most helpful and valuable data, and to uncover hidden data, is becoming urgent and important. Because of these needs, data mining has come into use as a helpful technology and plays an important role in today’s study and work environments.