# Understanding the Dataset

Understanding the Dataset:

| Attribute Name | Explanation | Type |
|---|---|---|
| CRIM | Per capita crime rate by town | Numeric |
| ZN | Proportion of residential land zoned for lots over 25,000 sq. ft. | Numeric |
| INDUS | Proportion of non-retail business acres per town | Numeric |
| CHAS | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) | Numeric |
| NOX | Nitric oxides concentration (parts per 10 million) | Numeric |
| RM | Average number of rooms per dwelling | Numeric |
| AGE | Proportion of owner-occupied units built prior to 1940 | Numeric |
| DIS | Weighted distances to five Boston employment centers | Numeric |
| TAX | Full-value property-tax rate per \$10,000 | Numeric |
| PTRATIO | Pupil-teacher ratio by town | Numeric |
| B | 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town | Numeric |

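The B attribute is a transformed value computed as 1000(Bk - 0.63)^2, not a raw proportion. Below is a minimal sketch of that formula. It assumes Bk is a proportion between 0 and 1, and the function name `b_statistic` is hypothetical. Because the term is squared, B is 0 at Bk = 0.63 and grows in both directions. That means two different Bk values can map to the same B.

```python
def b_statistic(bk: float) -> float:
    """Compute B = 1000 * (Bk - 0.63)**2 (hypothetical helper name).

    Assumes Bk is a proportion in [0, 1]. The squared term makes B
    U-shaped: 0 at Bk == 0.63, larger the further Bk is from 0.63.
    """
    if not 0.0 <= bk <= 1.0:
        raise ValueError("Bk must be a proportion between 0 and 1")
    return 1000.0 * (bk - 0.63) ** 2


if __name__ == "__main__":
    for bk in (0.0, 0.63, 1.0):
        print(f"Bk={bk:.2f} -> B={b_statistic(bk):.2f}")
```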
Now that my data is ready, I can move on to the mining process.

Data Mining Process:
K-Means Algorithm:
For the SimpleKMeans algorithm I used 40 clusters. I also tried 5, 10, 15, 20, 25, 30, and 35 clusters, but 40 gave the lowest within-cluster sum of squared errors. The result of the SimpleKMeans run is shown below.

Kmeans:
Number of Iterations: 7
Within cluster sum of squared errors: 21.64762028458153
Missing values globally replaced with mean/mode
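The search over k above can be reproduced with a few lines of code. Below is a minimal sketch of Lloyd's k-means with random initial centroids, written from scratch to show how the within-cluster sum of squared errors (SSE) is computed. It is not Weka's SimpleKMeans implementation, and the three-blob toy data is made up.

The sweep shows a useful pattern: SSE tends to fall as k grows, and with one cluster per point it reaches zero. Picking the k with the lowest SSE will therefore usually favor the largest k tried. A common alternative is to look for the "elbow", the point where the decrease levels off.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; returns (centroids, assignments, sse, iterations)."""
    rng = random.Random(seed)
    # Initialization: k input points drawn at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [-1] * len(points)
    for iteration in range(1, max_iter + 1):
        # Assignment step: send every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster SSE: squared distance of each point to its own centroid.
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, sse, iteration


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up toy data: three 2-D Gaussian blobs (not the Boston housing data).
    data = [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)]
        for _ in range(30)
    ]
    for k in (1, 2, 3, 5, 10, 20, 40):
        _, _, sse, iters = kmeans(data, k)
        print(f"k={k:2d}  iterations={iters:3d}  SSE={sse:.3f}")
```

With three well-separated blobs, the SSE typically drops steeply up to k = 3 and much more slowly afterwards. A poor random start can still leave k-means in a worse local optimum, which is why tools usually allow several restarts or different seeds.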

For the next step I used the Naïve Bayes classifier. For this I converted the numeric attributes to nominal ones with the NumericToNominal filter (Filter > Unsupervised > Attribute > NumericToNominal). In the filter's object editor, make sure you select your class attribute. I also used the MergeTwoValues filter to merge pairs of values in the nominal attributes. The results of the test are shown below.
=== Summary ===
Correctly Classified Instances 489 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.0087
Root mean squared
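Weka's NumericToNominal filter does not bin values. It adds every distinct numeric value as its own nominal label; binning is what the Discretize filter is for. The sketch below imitates that conversion and then fits a categorical Naive Bayes classifier with add-one (Laplace) smoothing. It is a from-scratch illustration, not Weka's NaiveBayes class. The helper names `numeric_to_nominal` and `CategoricalNaiveBayes` and the toy rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def numeric_to_nominal(rows):
    """Turn each distinct numeric value into its own string label (no binning)."""
    return [[str(v) for v in row] for row in rows]


class CategoricalNaiveBayes:
    """Naive Bayes for nominal attributes, with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_attrs = len(rows[0])
        # counts[i][cls][value]: how often attribute i took `value` in class `cls`
        self.counts = [defaultdict(Counter) for _ in range(n_attrs)]
        # values[i]: distinct labels of attribute i seen during training
        self.values = [set() for _ in range(n_attrs)]
        for row, cls in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[i][cls][v] += 1
                self.values[i].add(v)
        return self

    def log_score(self, row, cls):
        """log P(cls) + sum_i log P(x_i | cls), up to a shared constant."""
        score = math.log(self.class_counts[cls] / self.n)
        for i, v in enumerate(row):
            # add-one smoothing keeps an unseen value from zeroing the product
            numerator = self.counts[i][cls][v] + 1
            denominator = self.class_counts[cls] + len(self.values[i])
            score += math.log(numerator / denominator)
        return score

    def predict(self, row):
        """Return the class with the highest posterior score."""
        return max(self.class_counts, key=lambda cls: self.log_score(row, cls))


if __name__ == "__main__":
    # Hypothetical toy rows (a 0/1 flag, a room count) and price bands;
    # these are made-up values, NOT records from the Boston housing data.
    raw = [[1, 6], [1, 7], [0, 5], [0, 6], [0, 5], [1, 7]]
    bands = ["high", "high", "low", "low", "low", "high"]
    model = CategoricalNaiveBayes().fit(numeric_to_nominal(raw), bands)
    for row in numeric_to_nominal([[1, 7], [0, 5], [1, 5]]):
        print(row, "->", model.predict(row))
```

Using `str()` as the label means `1` and `1.0` become different labels. That mirrors how treating every distinct value as a category can split what is really the same number.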
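The summary above reports a kappa statistic of 1 alongside 100% correctly classified instances. Assuming Weka's figure is Cohen's kappa, it measures agreement between predicted and true labels beyond what chance alone would produce: kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement (accuracy) and p_e is the agreement expected from the label frequencies. A minimal sketch follows; the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(actual)
    # p_o: observed agreement, i.e. plain accuracy
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    # p_e: agreement expected by chance, from each side's label frequencies
    actual_freq = Counter(actual)
    predicted_freq = Counter(predicted)
    p_e = sum(actual_freq[c] * predicted_freq[c] for c in actual_freq) / (n * n)
    if p_e == 1.0:
        # Only one label occurs on both sides; kappa is undefined here.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    print(cohen_kappa(truth, truth))                 # 1.0 (perfect agreement)
    print(cohen_kappa(truth, ["a", "b", "b", "b"]))  # 0.5 (one mistake)
```

In the second call, accuracy is 0.75 but chance agreement is 0.5, so kappa is only 0.5. Perfect agreement always gives a kappa of 1, as in the summary.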