Data Mining


Chapter 1 Exercises

1. What is data mining? In your answer, address the following:
Data mining refers to the process or method that extracts or "mines" interesting knowledge or patterns from large amounts of data.
(a) Is it another hype?
Data mining is not another hype. Instead, the need for data mining has arisen due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. Thus, data mining can be viewed as the result of the natural evolution of information technology.
(b) Is it a simple transformation or application of technology developed from databases, statistics, machine learning, and pattern recognition?
No. Data mining is more than a simple transformation …

The resulting description could be a general comparative profile of the students, such as: 75% of the students with high GPAs are fourth-year computing science students, while 65% of the students with low GPAs are not.
* Association is the discovery of association rules showing attribute-value conditions that frequently occur together in a given set of data. For example, a data mining system may find association rules like major(X, "computing science") → owns(X, "personal computer") [support = 12%, confidence = 98%], where X is a variable representing a student. The rule indicates that, of the students under study, 12% (support) major in computing science and own a personal computer. There is a 98% probability (confidence, or certainty) that a student who majors in computing science owns a personal computer.
* Classification differs from prediction in that the former constructs a set of models (or functions) that describe and distinguish data classes or concepts, whereas the latter builds a model to predict missing or unavailable, and often numerical, data values. They are similar in that both are tools for prediction: classification is used for predicting the class label of data objects, and prediction is typically used for predicting missing numerical data values.
* Clustering analyzes data objects without consulting a known class label. The objects are clustered or grouped based on the principle of
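
To make the support and confidence figures in the association rule above concrete, here is a minimal Python sketch of how the two measures could be computed for a rule of the form antecedent → consequent. The function name `support_and_confidence`, the record layout, and the five-student toy data are all assumptions made for illustration; they do not come from the exercise, and the numbers they produce are unrelated to the 12% and 98% quoted above.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    records    : iterable of dicts mapping attribute name -> value
    antecedent : dict of attribute values on the left-hand side of the rule
    consequent : dict of attribute values on the right-hand side of the rule
    """
    def matches(record, conditions):
        # A record satisfies the conditions if every listed attribute has the required value.
        return all(record.get(attr) == value for attr, value in conditions.items())

    records = list(records)
    n_total = len(records)
    n_antecedent = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(
        1 for r in records if matches(r, antecedent) and matches(r, consequent)
    )

    # support: fraction of all records that satisfy both sides of the rule
    support = n_both / n_total if n_total else 0.0
    # confidence: fraction of antecedent-matching records that also satisfy the consequent
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy data (not taken from the exercise).
students = [
    {"major": "computing science", "owns": "personal computer"},
    {"major": "computing science", "owns": "personal computer"},
    {"major": "computing science", "owns": "none"},
    {"major": "biology", "owns": "personal computer"},
    {"major": "history", "owns": "none"},
]

support, confidence = support_and_confidence(
    students,
    antecedent={"major": "computing science"},
    consequent={"owns": "personal computer"},
)
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

In this toy set, two of the five students both major in computing science and own a personal computer, and two of the three computing science majors own one. The sketch thus separates the two measures: support is taken over all students, while confidence is taken only over those who satisfy the left-hand side of the rule.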
