
Understanding the Dataset:
| Attribute | Explanation | Type |
|---|---|---|
| CRIM | Per capita crime rate by town | Numeric |
| ZN | Proportion of residential land zoned for lots over 25,000 sq. ft | Numeric |
| INDUS | Proportion of non-retail business acres per town | Numeric |
| CHAS | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) | Numeric |
| NOX | Nitric oxides concentration (parts per 10 million) | Numeric |
| RM | Average number of rooms per dwelling | Numeric |
| AGE | Owner-occupied units built prior to 1940 | Numeric |
| DIS | Weighted distances to five Boston employment centers | Numeric |
| RAD | Index of accessibility to radial highways | Numeric |
| TAX | Full-value property-tax rate per $10,000 | Numeric |
| PTRATIO | Pupil-teacher ratio by town | Numeric |
| B | 1000(Bk - 0.63)^2 where Bk is the … | |

Now that my data is ready to process, I can move on to the mining process.

Data Mining Process:

K-means algorithm:
For the simple k-means algorithm I am using 40 as the number of clusters. I also tried 5, 10, 15, 20, 25, 30, and 35 clusters, but 40 was the best setting I found: it gave me the lowest within-cluster sum of squared errors of the values I tried. The following is the result of the simple k-means test.

Kmeans:
Number of Iterations: 7
Within cluster sum of squared errors: 21.64762028458153
Missing values globally replaced with mean/mode

Naïve Bayesian classifier:
For the next step I am using the Naïve Bayesian classifier. For this I first convert the numeric attributes to nominal attributes, using the following filter: Filter → Unsupervised → Attribute → NumericToNominal. When applying it, you have to make sure that you choose your class value in the object editor. I also used the merge-two-values filter in order to merge the values from different attributes. The following is the result of the test.

=== Summary ===
Correctly Classified Instances 489 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.0087
Root mean squared …
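The search over cluster counts described above for k-means can be sketched in a few lines. This is only an illustration, not the Weka run behind the reported numbers: it uses a plain-Python version of Lloyd's algorithm, and the toy `points` list, the `kmeans_sse` and `sq_dist` helper names, and the fixed seed are all assumptions made for the example. One thing the sketch makes visible is that the within-cluster sum of squared errors tends to fall as the number of clusters grows, so choosing k by lowest error alone will usually favor the largest k tried.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, seed=0, max_iter=100):
    """Run plain k-means (Lloyd's algorithm) and return the within-cluster SSE."""
    rng = random.Random(seed)
    # Start from k distinct data rows (an assumed initialisation scheme).
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # Within-cluster SSE: squared distance of each point to its nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical toy data: three loose groups of 2-D points (not the housing data).
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0), (21.0, 1.0),
]

for k in (1, 2, 3, 4, 6, len(points)):
    print(f"k={k:2d}  within-cluster SSE={kmeans_sse(points, k):.4f}")
```

Because the error reaches zero once every point has its own cluster, a common alternative is to look for the value of k after which the error stops dropping sharply, rather than taking the minimum.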

