Extraction of Knowledge from High Volumes of Data

1233 Words Aug 15th, 2015 5 Pages
Introduction:
Data mining is the extraction of knowledge from large volumes of data. In this data stream mining experiment, I used the “sorted.arff” dataset, which contains 540,888 instances and 22 attributes. I tried two single algorithms and two ensemble algorithms on records of road accidents from the last 15 years.
Weka: Data Mining Software
Weka (“Waikato Environment for Knowledge Analysis”) is a collection of algorithms and tools for data analysis. The algorithms can be applied to a dataset directly or called from Java, an object-oriented programming language. Weka provides tools for pre-processing, classification, regression, clustering, association, attribute selection and visualization of a given dataset. Its advantages are that it is freely available and platform independent. It is a simple tool that non-specialists in data mining can use, and testing through it requires no programming code at all.
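Weka's own Java API is not reproduced here. As a hedged, standard-library-only sketch of what calling a classification algorithm from Java involves, the hypothetical `SimpleClassifier` interface below separates training from prediction, and `MajorityClassifier` implements the simplest possible classifier: it always predicts the most frequent class seen during training, the same idea as Weka's ZeroR baseline. `SimpleClassifier`, `MajorityClassifier` and `Metrics` are illustrative names assumed for this example, not Weka classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface (not Weka's API): train on rows, then predict a class label.
interface SimpleClassifier {
    // rows: one String[] per instance; classIndex: position of the class attribute
    void build(List<String[]> rows, int classIndex);

    String predict(String[] row);
}

// Majority-class baseline, similar in spirit to Weka's ZeroR:
// it ignores every input attribute and always predicts the most common class.
// Ties are broken by HashMap iteration order, i.e. arbitrarily.
class MajorityClassifier implements SimpleClassifier {
    private String majority;

    @Override
    public void build(List<String[]> rows, int classIndex) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            counts.merge(row[classIndex], 1, Integer::sum);
        }
        int best = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                majority = e.getKey();
            }
        }
    }

    @Override
    public String predict(String[] row) {
        return majority;
    }
}

final class Metrics {
    // Accuracy = correctly classified instances / total instances.
    static double accuracy(SimpleClassifier c, List<String[]> rows, int classIndex) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (String[] row : rows) {
            // compare in this order so a null prediction counts as wrong instead of throwing
            if (row[classIndex].equals(c.predict(row))) {
                correct++;
            }
        }
        return (double) correct / rows.size();
    }
}
```

A baseline like this is useful when comparing algorithms by accuracy: any real classifier should beat the score obtained by always guessing the most common class.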
Weka reads data in the ARFF (.arff) file format and can classify the dataset stored in such a file. The procedure is: first, open sorted.arff; second, test several algorithms on it and compare their accuracy; finally, predict the value of the D1 factor. Screenshot 1 shows the pre-processing view of the 22 attributes in Weka; the last attribute, D1 factor, is the one analysed by the algorithms.
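Weka loads ARFF files with its own converters. To illustrate the format and the convention that the last attribute is the class, here is a deliberately minimal, hypothetical reader (`MiniArff` is an assumed name, not part of Weka). It assumes the dense ARFF subset only: `@relation`, `@attribute` and `@data` lines, `%` comments, attribute names without spaces, and comma-separated values without quoting. Sparse rows, quoted values and dates are not handled, and the real header of sorted.arff is not shown in the original, so any attribute names used with it are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical minimal reader for the dense subset of the ARFF format.
// Not Weka's loader: no sparse rows, no quoted names or values.
final class MiniArff {
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static MiniArff parse(String text) {
        MiniArff arff = new MiniArff();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (!inData && lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": keep only the name
                String[] parts = line.split("\\s+", 3);
                arff.attributeNames.add(parts[1]);
            } else if (!inData && lower.startsWith("@data")) {
                inData = true;
            } else if (inData) {
                // one instance per line, values separated by commas
                String[] values = line.split(",", -1);
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                arff.rows.add(values);
            }
            // @relation and any other header lines are ignored
        }
        return arff;
    }

    // Same convention as the experiment: the class is the last attribute.
    int classIndex() {
        return attributeNames.size() - 1;
    }
}
```

With the 22-attribute file described above, `classIndex()` would return 21, pointing at the D1 factor column.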

Screenshot 1: Graphs of pre-processed data

Algorithms Considered:

There are different types of machine learning algorithms available to solve classification problems. To carry out this experiment…