Selecting and creating a data set: Data is drawn from sources such as data warehouses, transactional data or flat tables. This step is crucial because the data set is the base on which the models are built. The entire study may fail if any important attribute is missing. Once the best available data set is in hand, the techniques of knowledge discovery and modelling are applied to it.
Pre-processing and cleansing: Data is made reliable during this stage. This includes mechanisms such as removing outliers and handling missing values.
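The two mechanisms named in this step can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `clean_column`, the choice of median imputation, and the 1.5 x IQR outlier rule are not taken from the text, which does not say how missing values or outliers should be handled.

```python
from statistics import median, quantiles

def clean_column(values, k=1.5):
    """Fill missing values with the median, then drop IQR outliers.

    Assumptions (not from the source text): missing values are marked
    with None, and an outlier is any value further than k * IQR from
    the first or third quartile.
    """
    present = [v for v in values if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in values]

    q1, _, q3 = quantiles(filled, n=4)   # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in filled if low <= v <= high]

# Hypothetical readings: one missing value and one obvious outlier.
readings = [10, 12, None, 11, 13, 250, 12]
cleaned = clean_column(readings)
```

Other choices (mean imputation, dropping incomplete records, z-score thresholds) are equally valid; which one fits depends on the data set.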
Data transformation: Better data is generated and prepared for data mining. Methods include dimension reduction, feature extraction and selection, record sampling and so forth.
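As a rough illustration of two of these methods, the sketch below performs a very simple form of feature selection (dropping near-constant columns) and record sampling. The function names, the variance threshold and the sample data are assumptions made for the example, not part of the process description.

```python
import random
from statistics import pvariance

def select_features(rows, min_variance=0.01):
    """Feature selection: keep only columns whose variance exceeds a threshold.

    Returns the indices of the kept columns and the reduced rows.
    """
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced

def sample_records(rows, fraction, seed=0):
    """Record sampling: draw a random subset of the rows without replacement."""
    rng = random.Random(seed)            # fixed seed for repeatability
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)

# Hypothetical table: the first column is constant and carries no information.
table = [
    [1.0, 5.0, 0.2],
    [1.0, 7.0, 0.9],
    [1.0, 6.0, 0.4],
    [1.0, 9.0, 0.7],
]
kept, reduced = select_features(table)
subset = sample_records(reduced, fraction=0.5)
```

Here the constant first column is removed and half of the remaining records are sampled. Real dimension reduction often uses stronger techniques, which this sketch does not attempt.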
Data mining: Choosing an appropriate data mining algorithm is the main task in this step. When deciding which type of data mining to use, such as classification, regression or clustering, we need to consider two main goals: prediction and description.
Prediction is often referred to as supervised data mining and description is referred to as unsupervised data mining.
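The difference between the two can be shown with a minimal sketch. The algorithms chosen here (a 1-nearest-neighbour classifier for prediction and a basic k-means for description) and all names and data are assumptions for illustration. The text itself does not prescribe any particular algorithm.

```python
import math

def predict_1nn(labelled, query):
    """Prediction (supervised): return the label of the closest labelled example."""
    point, label = min(labelled, key=lambda pair: math.dist(pair[0], query))
    return label

def kmeans(points, centroids, iterations=10):
    """Description (unsupervised): group unlabelled points around k centroids.

    Each iteration assigns every point to its nearest centroid, then moves
    each centroid to the mean of its assigned points.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid      # leave an empty cluster's centroid alone
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical two-feature data. Labels are used only by the supervised part.
labelled = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"),
            ((8.0, 9.0), "high"), ((9.0, 8.5), "high")]
label = predict_1nn(labelled, (8.5, 9.2))

unlabelled = [point for point, _ in labelled]
centres, groups = kmeans(unlabelled, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The supervised function needs known labels to learn from, while the clustering function discovers the two groups from the points alone.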
In data mining, inductive learning techniques are used: a model is constructed by generalizing from training examples, on the assumption that the trained model can be applied to future cases.
Select a data mining algorithm considering all the factors. It may be necessary to run the algorithm several times until a satisfactory result is obtained.
Evaluation: With respect to the goals, we interpret and evaluate the
Data mining is the process of discovering interesting knowledge and significant structures in large amounts of data stored in data warehouses or other information repositories.
Data mining software allows users to analyze large databases to solve business decision problems. Data mining is, in some ways, an extension of statistics, with a few
As stated above, data mining is often used to solve business decision problems: “it provides ways to quantitatively measure what business users should already know qualitatively” (Linoff, 2004). A growing number of industries are using data mining to become more competitive in their markets by focusing primarily on customers: strengthening customer relationships and increasing customer acquisition.
What is data mining? Data mining is the derivation of new information from massive amounts of data in databases (Sauter, 2014, p. 148). Chowdhurry argues that data mining is part of KDD. KDD, or knowledge discovery in databases, is a process that includes data mining. In addition to data mining, KDD includes data preparation, modeling and evaluation. KDD is at the heart of this research field, which is multidisciplinary and includes data visualization, machine learning, database technology, expert systems and statistics. Overall, the use of case-based reasoning (CBR) and data mining tools within an information system would create a CBR system that solves new problems with adapted solutions and could be used in many industries such as education and healthcare (Chowdhurry,
An example of how data mining is conducted and used to benefit business can be explained in the following scenario:
After the information has been extracted from a large database, the data are analyzed and summarized into useful information. This process of analyzing and summarizing the extracted data is known as data mining (Maimon & Rokach, 2007). In fact, data mining is one of the important steps of the KDD process. It involves inferring algorithms that explore the data, develop the model and discover previously unknown patterns (Maimon & Rokach). Hence, due to the accessibility and abundance of data, knowledge discovery and data mining have become considerably important in the healthcare industry (Maimon & Rokach).
Data mining allows companies to focus on the more important information in their data warehouses. Data mining can be broken down into two major categories: automated prediction of trends and behaviors, and automated discovery of previously unknown patterns. In the first category, data mining automates the process of finding predictive information in large databases. Questions that traditionally required exhaustive hands-on analysis can now be answered quickly and directly from the data. In the second category, data mining tools sweep through databases and identify previously hidden patterns in one step. This second category is where research has mainly focused.
Data mining is generally used for four main tasks: (1) to improve the process of acquiring new customers and retaining existing ones; (2) to reduce fraud; (3) to identify internal wastefulness in operations and deal with it; and (4) to chart unexplored areas of the internet.
In 2001, when the use of data mining in marketing was a relatively new concept, Shaw, Subramaniam, Tan and Welge gave an insight into the management of large databases using data mining techniques. They introduced the concept of extracting useful information from a large customer database by identifying hidden patterns. They integrated data mining and marketing knowledge management to support marketing decisions.
Data mining is a new technology that can be used to extract valuable information from the data warehouses and databases of companies and governments. It involves the extraction of hidden information from raw data. It helps in detecting inconsistencies in data and in predicting future patterns and attitudes in a highly proficient way. Data mining is implemented using various algorithms and frameworks, and the automated analysis these algorithms and frameworks provide goes beyond evaluating the dataset to providing solid evidence that human experts would not have been able to detect due to the fact that they
Here, data integrated according to specific criteria (e.g. sales, profits) is used by a data mining tool, which further evaluates the data based on the defined criteria, e.g. by comparing the characteristics of one customer with those of another. This leads to the determination of a customer segment and hence provides the basis for a targeted marketing campaign (Alt & Puschmann, 2004).
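A minimal sketch of this kind of criteria-based segmentation is given below. The customer records, the `segment` function and the rule of comparing each customer against the median of all customers are hypothetical. Alt & Puschmann do not specify how the comparison is carried out.

```python
from statistics import median

def segment(customers, criteria=("sales", "profit")):
    """Group customers by whether each criterion is above the overall median.

    Comparing every customer against the same cutoff is one simple way of
    comparing customers with one another (an assumption for this sketch).
    """
    cutoffs = {c: median(rec[c] for rec in customers.values()) for c in criteria}
    segments = {}
    for name, rec in customers.items():
        key = tuple("high" if rec[c] > cutoffs[c] else "low" for c in criteria)
        segments.setdefault(key, []).append(name)
    return segments

# Hypothetical integrated customer data.
customers = {
    "A": {"sales": 1200, "profit": 300},
    "B": {"sales": 150, "profit": 20},
    "C": {"sales": 900, "profit": 40},
    "D": {"sales": 100, "profit": 60},
}
segments = segment(customers)
# A campaign could then target, say, the ("high", "low") segment:
# customers with high sales but low profit.
```

Each resulting segment is a list of customers who share the same profile on the chosen criteria, which is the kind of grouping a targeted campaign would start from.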
Data mining can be described as a way of picking out data from a large mix of information in the cloud. It can also be described as digging out or extracting knowledge from data.
The textbook Data Warehousing: Concepts, Techniques, Products and Applications by C.S.R. Prabhu covers the data model, online analytical processing systems and tools, data warehouse architecture, data mining algorithms, organizational issues of the data warehouse, data warehouse segmentation, and applications of data mining and data warehousing. The book first describes a data warehouse as a system used for reporting on data from a wide range of sources, which helps a company guide its management decisions. Data warehousing is the process that evolved around the extraction and transformation of data from various applications. It also draws on techniques from business intelligence, whose effective implementation makes the data warehouse an effective technology for business use. A data warehouse also involves dividing data into individual data components. Similarly, a data warehouse helps to analyze data, and there are technologies that help to analyze the data available in the data warehouse. The functions of data warehouse tools are data extraction, data cleaning and data transformation: data extraction gathers data from multiple sources, data cleaning finds and corrects errors in the data, and data transformation converts the data into the data warehouse format. Consequently, data cleaning and
Data mining is the process of ‘digging out’ patterns from data, usually through clustering, classification, regression and association rule learning. Data mining technology can generate new business opportunities by providing:
Data Mining is known as the process of analyzing data to extract interesting patterns and knowledge. Data