Data Mining Information

4566 Words Oct 24th, 2014 19 Pages
1 Define data mining. Why are there many different names and definitions for data mining?
Data mining is the process through which previously unknown patterns in data are discovered. Another definition is “a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and subsequent knowledge from large databases.” This includes most types of automated data analysis. A third definition: data mining is the process of finding mathematical patterns in (usually) large sets of data; these patterns can be rules, affinities, correlations, trends, or prediction models.

Data mining has many definitions because it’s been stretched beyond those limits by some […]

Whereas prediction is largely experience- and opinion-based, forecasting is data- and model-based. That is, in order of increasing reliability, one might list the relevant terms as guessing, predicting, and forecasting. In data mining terminology, prediction and forecasting are used synonymously, and the term prediction is used as the common representation of the act.
Classification: analyzing the historical behavior of groups of entities with similar characteristics, to predict the future behavior of a new entity from its similarity to those groups
Clustering: finding groups of entities with similar characteristics
Association: establishing relationships among items that occur together
Sequence discovery: finding time-based associations
Visualization: presenting results obtained through one or more of the other methods
Regression: a statistical estimation technique that fits a curve, defined by a mathematical equation of known type but unknown parameters, to existing data
Forecasting: estimating a future data value based on past data values
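
As a rough illustration of the classification entry, the Python sketch below uses a nearest-centroid rule: each known group is summarized by the average of its members' features, and a new entity is assigned to the group it most resembles. Nearest-centroid is only one simple choice (decision trees, k-nearest neighbors, and neural networks are common alternatives), and the customer records, labels, and function names are hypothetical.

```python
import math


def fit_centroids(rows, labels):
    """Compute the mean feature vector (centroid) of each labelled group."""
    totals = {}
    counts = {}
    for row, label in zip(rows, labels):
        if label not in totals:
            totals[label] = [0.0] * len(row)
            counts[label] = 0
        for i, value in enumerate(row):
            totals[label][i] += value
        counts[label] += 1
    return {label: [t / counts[label] for t in total] for label, total in totals.items()}


def classify(centroids, row):
    """Assign a new entity to the group whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


# Hypothetical history: (age, purchases per year) and whether each customer churned
history = [(25, 2), (30, 1), (22, 3), (45, 12), (50, 15), (48, 10)]
outcome = ["churned", "churned", "churned", "stayed", "stayed", "stayed"]

model = fit_centroids(history, outcome)
print(classify(model, (27, 2)))   # -> churned
print(classify(model, (47, 11)))  # -> stayed
```

Comparing a new entity against group averages keeps the example short while mirroring the definition: the group's past behavior is summarized once, and similarity to that summary drives the prediction.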
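
For the clustering entry, a minimal k-means sketch shows how groups of similar entities can be found without any labels. It assumes numeric features, a number of clusters chosen in advance, and hand-picked starting centers; k-means is one common clustering technique among several, and the data points and function names are invented for illustration.

```python
import math


def assign(points, centers):
    """Put each point into the cluster whose center is nearest (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centers, max_iter=100):
    """Plain k-means: alternate assignment and center updates until centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Move each center to the mean of its members; keep it in place if the cluster is empty.
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else center
            for center, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical customers described by (visits per month, average basket size)
customers = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centers, clusters = kmeans(customers, initial_centers=[(1, 1), (8, 8)])
print([tuple(round(v, 2) for v in c) for c in centers])  # -> [(1.5, 1.33), (8.5, 8.5)]
print(clusters[0])  # -> [(1, 1), (1.5, 2), (2, 1)]
```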
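
The association entry is usually described in terms of support (how often items occur together) and confidence (how often the second item appears given the first). The sketch below computes both for item pairs only; it is a simplified stand-in for a full algorithm such as Apriori, and the baskets, thresholds, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Find pair rules X -> Y that meet both thresholds.

    support(X, Y)      = share of transactions containing both X and Y
    confidence(X -> Y) = transactions with X and Y / transactions with X
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.8))
# -> [('beer', 'diapers', 0.6, 1.0)]
```

With these thresholds only one rule survives: every basket containing beer also contains diapers, while diapers are accompanied by beer in only three of four baskets, below the 0.8 confidence cutoff.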
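
The regression and forecasting entries can be combined in one small example: fit a curve of known type, here assumed to be a straight line y = a + b*t with unknown parameters a and b, to past values by ordinary least squares, then extend the line one period ahead. The linear form, the sales figures, and the function names are assumptions for illustration; practical forecasting models are often more elaborate.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (a, b) for the model y = a + b*x.

    Needs at least two distinct x values, otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast_next(history):
    """Fit a linear trend to past values at periods 0..n-1 and extrapolate to period n."""
    periods = list(range(len(history)))
    a, b = fit_line(periods, history)
    return a + b * len(history)


sales = [102, 108, 121, 129, 140]  # hypothetical monthly sales, oldest first
print(round(forecast_next(sales), 1))  # -> 149.1
```

For these figures the fitted line is y = 100.6 + 9.7t, so the forecast for the sixth period (t = 5) is about 149.1.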

Section 4.2 Review Questions
1 What are the major application areas for data mining?
Applications are listed near the beginning of this section (pp. 145-147): customer relationship management (CRM), banking, retailing and logistics, manufacturing and production, brokerage, insurance, computer hardware and software, government, travel, healthcare, medicine, entertainment, homeland security, and sports.
Identify at least five