Understanding the Dataset: Attribute Names, Explanations, and Types

Name      Explanation                                                              Type
CRIM      Per capita crime rate by town                                            Numeric
ZN        Proportion of residential land zoned for lots over 25,000 sq. ft.        Numeric
INDUS     Proportion of non-retail business acres per town                         Numeric
CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)    Numeric
NOX       Nitric oxides concentration (parts per 10 million)                       Numeric
RM        Average number of rooms per dwelling                                     Numeric
AGE       Owner-occupied units built prior to 1940                                 Numeric
DIS       Weighted distances to five Boston employment centers                     Numeric
RAD       Index of accessibility to radial highways                                Numeric
TAX       Full-value property-tax rate per $10,000                                 Numeric
PTRATIO   Pupil-teacher ratio by town                                              Numeric
B         1000(Bk - 0.63)^2 where Bk is the…
Now that my data is ready to process, I can move on to the mining process.

Data Mining Process:

K-means algorithm:

For the simple K-means algorithm I am using 40 as the number of clusters. I also tried 5, 10, 15, 20, 25, 30, and 35, but 40 was the best I could come up with: it gave the lowest within-cluster sum of squared errors of the values I tried. The following is the result of the simple K-means test.

Kmeans:
Number of Iterations: 7
Within cluster sum of squared errors: 21.64762028458153
Missing values globally replaced with mean/mode

For the next process I am using the Naïve Bayes classifier. It needs nominal attributes, so I first convert the numeric values to nominal ones with the following filter: Filter – Unsupervised – Attribute – NumericToNominal. When applying it, make sure that you choose your class value in the object editor. I also used the MergeTwoValues filter in order to merge the values from different attributes. The following is the result of the test.

=== Summary ===
Correctly Classified Instances      489    100 %
Incorrectly Classified Instances      0      0 %
Kappa statistic                       1
Mean absolute error                   0.0087
Root mean squared
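
To make the K-means comparison described earlier more concrete, here is a minimal sketch in plain Python of how the within-cluster sum of squared errors can be computed for several cluster counts. It is not Weka's SimpleKMeans implementation, and it does not use the housing data: the function names (`sq_dist`, `kmeans`, `make_toy_points`) and the synthetic two-dimensional points are assumptions made purely for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd-style K-means. Returns (centroids, within-cluster SSE)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Within-cluster SSE: squared distance from each point to its nearest centroid.
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


def make_toy_points(seed=1):
    """Hypothetical toy data: three noisy blobs, NOT the housing dataset."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centers
        for _ in range(30)
    ]


points = make_toy_points()
sse_by_k = {k: kmeans(points, k)[1] for k in (1, 2, 3, 5, 10)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

One thing to keep in mind when reading such numbers: the sum of squared errors generally shrinks as the number of clusters grows, so the largest cluster count tried will usually show the smallest error.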