UNIVERSITY OF NORTH CAROLINA AT CHARLOTTE
STUDY REPORT
ON
Data Mining
Submitted By: Harshil Sheth, 800833329
Submitted To: Dr. Anthony B. Wilkinson
Submitted in partial fulfilment of the Master’s Degree in Computer Science
Contents
1. Abstract
2. Need for Data Mining
3. History of Data Mining
4. Data Mining Process
5. Applications of Data Mining
6. Privacy Concerns and Ethics
7. Precaution to be taken before using the data
8. References
Abstract
Data mining is the analysis step of the "Knowledge Discovery in Databases" process, or KDD. An interdisciplinary subfield of computer science, Data Mining is the computational process of discovering patterns in large data sets involving methods
This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics.
From a practical perspective, data mining automates the process of categorizing data and discovering new, understandable relationships by using advanced tools and a basic understanding of statistics, machine learning and database systems. The accurate information obtained from this process is reusable and helps organizations in retail, finance, communications and marketing take important steps towards increased revenue and reduced costs. Its wide applicability across heterogeneous domains, each comprising large volumes of rich data, makes data mining an important and challenging field for data scientists.
Why do we need Data Mining?
• We are in an age often referred to as the information age. In this information age, we believe that information leads to power and success. We have been collecting tremendous amounts of data.
• Initially, with the advent of computers and means for mass digital storage, we started collecting and storing all sorts of data, counting on the power of computers to help sort through this amalgam of information. Unfortunately, these massive collections of data stored on disparate structures very rapidly
To begin with, Dell Software, an information technology enterprise, describes data mining as “an analytic process designed to explore data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the
Data mining is another concept closely associated with large databases such as clinical data repositories and data warehouses. However, data mining, like several other IT concepts, means different things to different people. Health care application vendors may use the term data mining when referring to the user interface of the data warehouse or data repository. They may, for example, refer to the ability to drill down into data as data mining. Used more precisely, however, data mining refers to a sophisticated analysis tool that automatically discovers patterns among data in a data store. Data mining is an advanced form of decision support. Unlike passive query tools, the data mining analysis tool does not require the user to pose individual, specific questions to the database. Instead, this tool is programmed to look for and extract patterns, trends and rules. True data mining is currently used in the business community for marketing and predictive analysis (Stair & Reynolds, 2012). This analytical data mining is, however, not currently widespread in the health care community.
Having data is not valuable in itself; using it is. Analytic insights are changing the way corporations strategize and are redefining customer expectations. Analytics is the new differentiator between success and failure in the cutthroat e-commerce and internet services industry. The huge volumes of data generated by the growing number of smartphones, social networks and ever-wider internet access are automating customer-centric marketing and other services. The idea is to predict what customers may want to buy even before they realize what they need. The techniques used to achieve these results are broadly classified as predictive analytics.
Data mining is an analytical process that primarily involves searching through vast amounts of data to spot useful, but initially undiscovered, patterns. The data mining process typically involves three major steps: exploration, model building and validation, and finally deployment.
As stated above, data mining is often used to solve business decision problems: “it provides ways to quantitatively measure what business users should already know qualitatively” (Linoff, 2004). A growing number of industries are using data mining to become more competitive in their markets, primarily by focusing on customers: strengthening customer relationships and increasing customer acquisition.
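The three steps named above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the function names and the one-rule threshold model are hypothetical and do not come from any particular data mining product.

```python
from statistics import mean

# Hypothetical toy data: (monthly_spend, churned) pairs invented for illustration.
records = [(12.0, 1), (15.0, 1), (18.0, 1), (22.0, 1),
           (35.0, 0), (40.0, 0), (44.0, 0), (51.0, 0),
           (14.0, 1), (47.0, 0)]

def explore(rows):
    """Step 1, exploration: summarize spend separately for each class."""
    churned = [spend for spend, label in rows if label == 1]
    stayed = [spend for spend, label in rows if label == 0]
    return {"mean_churned": mean(churned), "mean_stayed": mean(stayed)}

def build_model(rows):
    """Step 2a, model building: a one-rule threshold halfway between class means.

    Assumes churned customers spend less on average, as in the toy data.
    """
    summary = explore(rows)
    threshold = (summary["mean_churned"] + summary["mean_stayed"]) / 2
    return lambda spend: 1 if spend < threshold else 0

def validate(model, rows):
    """Step 2b, validation: accuracy on rows the model was not built from."""
    hits = sum(1 for spend, label in rows if model(spend) == label)
    return hits / len(rows)

# Hold back the last two records so validation uses unseen data.
train, holdout = records[:8], records[8:]
model = build_model(train)
accuracy = validate(model, holdout)

# Step 3, deployment: apply the validated model to a new customer.
prediction = model(16.5)
print(explore(train), accuracy, prediction)
```

A real project would use far more data and a proper learning algorithm; the point here is only the order of the steps and the separation of training data from validation data.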
Data mining is a class of database applications that looks for hidden patterns in a group of data that can be
Data mining means searching and analyzing large masses of data to discover patterns and develop new information.
Today, with the ever-growing use of computers in the world, information is constantly moving from one place to another. What this information is, whom it is about, and who is using it will be discussed in the following paper. The collection, interpretation, and determination of use of this information has come to be known as data mining. The term data mining has been around for only a short time, but the actual collection of data has been happening for centuries. The following paragraph gives a brief description of this history of data collection.
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
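As a small illustration of finding correlations among fields, the sketch below computes the Pearson correlation coefficient for every pair of columns in a tiny table and keeps the strongly correlated pairs. The table, the field names and the 0.9 cutoff are hypothetical values chosen for the example, not taken from the text.

```python
from itertools import combinations
from math import sqrt

# Hypothetical table: each key is a field, each list one column of values.
table = {
    "ad_spend": [10.0, 20.0, 30.0, 40.0, 50.0],
    "revenue":  [25.0, 45.0, 65.0, 85.0, 105.0],
    "returns":  [9.0, 7.0, 6.0, 4.0, 1.0],
    "store_id": [3.0, 1.0, 4.0, 1.0, 5.0],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlated_pairs(columns, min_abs_r=0.9):
    """Return (field_a, field_b, r) for pairs whose |r| meets the cutoff."""
    found = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= min_abs_r:
            found.append((a, b, round(r, 3)))
    return found

for pair in correlated_pairs(table):
    print(pair)
```

With dozens of fields the number of pairs grows quadratically, which is one reason production tools rely on database support rather than a plain loop like this one.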
In its infancy, data mining was as limited as the hardware being used. Large amounts of data were difficult to analyze because the hardware simply could not handle them [1]. The term "data mining" first began appearing in the 1980s, largely within the research and computer science communities. In the 1990s it was considered a subset of a process called Knowledge Discovery in Databases, or KDD [1]. KDD analyzes data in search of patterns that may not normally be recognized with the naked eye. Today however, data mining does not limit itself to databases,
However, after the information is extracted from a large database, the data are analyzed and summarized into useful information. This process of analyzing and summarizing the extracted data is known as data mining (Maimon & Rokach, 2007). In fact, data mining is one of the important steps of the KDD process: it involves inferring algorithms that explore the data, develop the model, and discover previously unknown patterns (Maimon & Rokach). Hence, due to the accessibility and abundance of data, knowledge discovery and data mining have become considerably important in the healthcare industry (Maimon & Rokach).
This paper examines the benefits of data mining to businesses that employ predictive analytics to understand customer behavior, association discovery to find relationships among products sold to customers, web mining to derive business knowledge from web customers, and clustering to find related customer information. It assesses the reliability of data mining algorithms in order to decide whether they can be trusted and to predict the errors they are likely to produce. It analyzes the privacy concerns raised by the collection of personal data for mining purposes. It gives at least three examples in which businesses have used predictive analytics to gain a competitive advantage, and evaluates the effectiveness of each business strategy.
Data mining, or Knowledge Discovery in Databases (KDD), is the discovery of patterns in large data sets through methods drawn from artificial intelligence, machine learning, statistics, and database systems. The aim of the data mining process is to extract information from a data set and transform it into a format suitable for future use. The data mining process comprises database and data management aspects, data preprocessing, inference, the complexity of discovered structures, and updating.
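Association discovery among products sold to customers is commonly measured with support and confidence. The following is a minimal sketch under stated assumptions: the baskets and the helper names are hypothetical, invented only to show how the two measures are computed.

```python
# Hypothetical shopping baskets, one set of products per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also
    contain the consequent: support(A and C) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"bread -> butter: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule with high confidence but very low support rests on few transactions, so both measures are usually checked together before a rule is acted on.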
Additionally, the social networking website Facebook stores approximately 40 billion photos in total (“Data, data everywhere”, 2010). Besides the enormous volume of data generated by daily company transactions and social networks, the falling price of data storage is also a strong factor driving the “Big Data” fever. For example, Google Drive, a cloud-based data storage service, had a price drop of approximately 80% from March 2014. This price drop is considered a marketing approach to attract more computer users to Google’s cloud service, which provides a more convenient and efficient way to access and store frequently used files. Although the emergence of such enormous data gives us opportunities for further investigation and benchmarking, valuable information is not being fully extracted, and the potential power of “Big Data” is undermined. To achieve thorough extraction of useful information from databases, many academics have devoted themselves to the study of data analysis and have identified two of the most important drawbacks of traditional data analysis: it lacks predictive power, and it scales poorly.
Data itself is useless until it is mined and transformed into a valuable source of knowledge. Because it converts data into useful information, data mining has become a leading tool in many fields worldwide. “Data mining is based on complex algorithms that allow for the segmentation of data to identify patterns and trends, detect anomalies, and predict the probability of various situational outcomes.”[1] Many organizations, from healthcare to multimedia and beyond, are relying on data and developing through its use. Meanwhile, the data warehouse has changed in pace and scale along dimensions such as variety, volume and velocity. Today, one can see the current trends of data mining in different fields such as social networks, healthcare and business. As data mining gives those fields the opportunity to advance, "Big Data" is also opening new doors as new trends emerge.