To predict the links in the dataset we have used the Fuzzy Link Based Classification algorithm, a subpart of the Neuro Fuzzy Link Based Classification algorithm, which combines the Feedforward Neural Network (FFNet) with the Backpropagation technique and fuzzy logic. The FFNet is inspired by the neural system of the human body. In this chapter we first explain the system design involved in setting up the network, then describe the FFNet and Backpropagation algorithms and the reasons for using them, and finally discuss how we worked on our dataset and the steps involved in obtaining the desired output.
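To make the idea concrete, the sketch below shows a minimal feed-forward network trained with backpropagation to classify candidate links. The layer sizes, the toy features and the NumPy implementation are illustrative assumptions only; they do not reproduce the exact network used in this project.

```python
# Minimal feed-forward network with backpropagation for binary link
# classification. Layer sizes and the toy data are illustrative only.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features, n_hidden = 4, 8          # assumed sizes, not from the report

# Toy training set: each row is a feature vector for a candidate link,
# y[i] = 1 if the link exists, 0 otherwise.
X = rng.random((100, n_features))
y = (X.sum(axis=1) > 2.0).astype(float).reshape(-1, 1)

W1 = rng.normal(0, 0.5, (n_features, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, (n_hidden, 1));          b2 = np.zeros(1)

lr = 0.5
for epoch in range(2000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)          # hidden-layer activations
    out = sigmoid(h @ W2 + b2)        # predicted link probability
    # Backward pass (gradients of mean squared error)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out / len(X);  b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);    b1 -= lr * d_h.mean(axis=0)

print("training accuracy:", ((out > 0.5) == y).mean())
```

In the full Neuro Fuzzy approach the network's output scores would additionally pass through a fuzzy-logic stage before a link class is assigned; that stage is not shown in this sketch.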
5.1 System Design
From selecting the dataset to be worked on to data classification and link prediction, many steps are involved, such as data clustering, data classification and data extraction. These steps are performed in a proper order, as shown in the System Architecture figure below.
Fig 5.1 System Architecture [9]
• Initially, the dataset must be selected so that the data can be retrieved and classification performed.
• In the user interface module, data is retrieved from the dataset and represented in a readable format, on which pattern recognition and analysis can be performed.
• In the clustering and classification phase, dissimilar data is separated from similar data. In our project this step can be omitted because the dataset is already provided as CSV files divided according to the respective link types; a minimal sketch of loading such files is given after this list.
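Because the dataset already arrives as CSV files split by link type, reading it in is straightforward. The snippet below is a small sketch of how such files could be loaded and tagged with their link type before classification; the file names and the pandas-based layout are hypothetical placeholders rather than the project's actual files.

```python
# Load the pre-divided CSV files (one per link type) and tag each row
# with its link type so a single labelled table can be classified.
# File and column names here are placeholders, not the project's files.
import pandas as pd

link_type_files = {
    "friendship": "friendship_links.csv",
    "coauthorship": "coauthorship_links.csv",
}

frames = []
for link_type, path in link_type_files.items():
    df = pd.read_csv(path)          # expects one row per candidate link
    df["link_type"] = link_type     # label used later as the class
    frames.append(df)

dataset = pd.concat(frames, ignore_index=True)
print(dataset.head())
```

From this point the combined table can be passed on to the classification step described above.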