Abstract— A Genetic Algorithm (GA) is a stochastic, randomized blind-search and optimization technique based on evolutionary computing that has proved robust and effective in solving problems from a variety of application domains. Clustering is a vital technique for extracting meaningful and hidden information from datasets, with a broad field of application including bioinformatics, image processing and data mining. By finding the close association between the densities of data points in a given dataset of image pixels, clustering provides easy analysis and proper validation. In this paper, we propose an evolutionary-computing-based approach to unsupervised image clustering using an elitist GA (EGA), an efficient variant of GA that segments an image into its constituent parts automatically. The aim of this algorithm is to produce precise segmentation of images using intensity information along with neighbourhood relationships. Experimental results from a simulation study reveal that the algorithm generates good-quality segmented images.

Keywords— Image Clustering, Evolutionary Computing (EC), Genetic Algorithm (GA), Elitism, Image Segmentation

I. INTRODUCTION

Clustering is practicable in various exploratory pattern-analysis, grouping, decision-making, and machine-learning circumstances, including data mining, document retrieval, image segmentation, and pattern classification [1]. Clustering a set of
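To make the elitism mechanism concrete, the following is a minimal sketch of an elitist GA over bitstrings. It is an illustrative toy, not the EGA of this paper: the one-max fitness (`sum`) stands in for the intensity-based clustering fitness, and all parameter values are assumptions chosen for the demonstration. The defining feature is that the `n_elite` best individuals are copied unchanged into the next generation, so the best fitness found can never decrease.

```python
import random

def elitist_ga(fitness, n_bits=20, pop_size=30, generations=60,
               p_cross=0.9, p_mut=0.02, n_elite=2, seed=1):
    """Minimal elitist GA over bitstrings (maximization)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # binary tournament selection
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = [ind[:] for ind in ranked[:n_elite]]      # elites survive unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:                  # one-point crossover
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Illustrative fitness (one-max); an image-clustering EGA would instead score
# a candidate set of cluster centres against pixel intensities.
best = elitist_ga(fitness=sum)
```

In an image-clustering setting the chromosome would encode candidate cluster centres and the fitness would measure within-cluster intensity homogeneity; the elitism step itself is unchanged.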
(2011) presented research on a clustering problem for office workplaces using a fuzzy bat algorithm. Khan and Sahari (2012a) also presented a comparison study of the bat algorithm with PSO, GA, and other algorithms from the perspective of e-learning, and concluded that the bat algorithm clearly has some advantages over the other algorithms. They (Khan and Sahari, 2012b) then presented a study of clustering problems using the bat algorithm and an extension of it, a bi-sonar optimization variant, with positive results. On the other hand, Mishra et al. (2012) applied the bat algorithm to classify microarray data, while Natarajan et al. (2012) presented a comparison study of cuckoo search and the bat algorithm for Bloom filter optimization. Damodaram and Valarmathi (2012) studied phishing website detection using a modified bat algorithm and also attained very good results.
Normally, the automatic segmentation problem is very challenging and is yet to be fully and satisfactorily solved. The aim of this tumor detection approach is to identify and segment the MRI tumor automatically. It takes into account the statistical features of the brain structure, representing it by significant feature points; this detailed representation of the medical image through extracted feature points also reduces the time spent segmenting the tumor. Most of the early methods presented for tumor detection and segmentation can be broadly divided into three categories: region-based, edge-based, and fused region- and edge-based methods. Well-known and widely used segmentation techniques include the K-means clustering algorithm and supervised methods based on neural-network classifiers [4]. Region-based techniques look for regions satisfying a given homogeneity criterion, whereas edge-based segmentation techniques look for edges between regions with different characteristics.
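The region-based idea mentioned above can be sketched as seeded region growing: starting from a seed pixel, the region absorbs neighbours that satisfy a homogeneity criterion. This is a generic textbook sketch under an assumed intensity-difference criterion, not the method of any paper surveyed here; the image, seed, and tolerance are illustrative.

```python
from collections import deque

def region_grow(img, seed, tol=10):
    """Grow a region from `seed`, adding 4-neighbours whose intensity
    differs from the seed pixel by at most `tol` (a homogeneity criterion)."""
    h, w = len(img), len(img[0])
    sy, sx = seed
    base = img[sy][sx]
    region = {(sy, sx)}
    q = deque([(sy, sx)])
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and abs(img[ny][nx] - base) <= tol):
                region.add((ny, nx))
                q.append((ny, nx))
    return region

# Toy 4x4 "image": a bright 2x2 patch (a stand-in tumor) on a dark background.
img = [[10, 10, 10, 10],
       [10, 200, 205, 10],
       [10, 198, 202, 10],
       [10, 10, 10, 10]]
print(len(region_grow(img, (1, 1))))  # the bright patch: 4 pixels
```

An edge-based method would instead look for the large intensity differences along the patch boundary; the two views are complementary, which is why fused region-and-edge methods exist.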
Text is probably the most natural form of storing information, and text mining is believed to have a commercial potential greater than that of data mining; a recent study indicated that 80% of a company's information is contained in text documents. Text mining, however, is also a more complex task than data mining, as it involves dealing with text data that are inherently unstructured and fuzzy. Text mining is a multidisciplinary area involving information retrieval, text analysis, information extraction, clustering, categorization, visualization, database technology, machine learning, and data mining [2]. In text mining, patterns are extracted from a natural-language textual database. There are many methods of text mining. In general, the major approaches, based on the kinds of data they take as input,
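As a concrete illustration of extracting a simple pattern from unstructured text, the sketch below counts word tokens across documents and keeps the frequent ones, the most basic preprocessing step before clustering or categorization. The tokenizer, threshold, and sample documents are illustrative assumptions, not part of any surveyed system.

```python
import re
from collections import Counter

def frequent_terms(docs, min_count=2):
    """Count lower-cased word tokens across documents and keep those
    appearing at least `min_count` times, the simplest 'pattern' a
    text-mining pipeline extracts before clustering or categorization."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return {t: c for t, c in counts.items() if c >= min_count}

docs = ["Text mining extracts patterns from text.",
        "Data mining and text mining overlap."]
print(frequent_terms(docs))  # {'text': 3, 'mining': 3}
```

Real systems would weight such counts (e.g., TF-IDF) and remove stop words, but the unstructured-to-structured step is the same in spirit.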
Trajectory clustering with probabilistic modeling of a set of trajectories is proposed in \cite{gaffney2007probabilistic}. Clustering whole trajectories, however, may miss interesting common paths among sub-trajectories. Lee et al. \cite{lee2007trajectory} proposed a partition-and-group framework that finds interesting common paths in sub-trajectories. Mao et al. \cite{mao2016mining} proposed mining spatiotemporal patterns of urban dwellers from taxi trajectory data. The method involves three critical steps: discovering potentially meaningful locations using spatial clustering of taxi origins and destinations, identifying suitable parameter values to extract jobs-housing urban infrastructure, and visualizing the spatial distribution and temporal trends of the revealed urban infrastructure. Another swarm-mining method on big spatiotemporal trajectories using MapReduce \cite{yu2016mr} has been proposed to mine swarm patterns. The authors propose a parallel model based on a timeset-independence property, in which the datasets are divided by time and local swarm patterns are created; global swarms are then obtained by combining these local swarms. Our observation is that partitioning datasets by time loses data consistency, a significant number of patterns are removed from the local swarms, and, for very large numbers of datasets, the map and reduce phases take extra time. Chen et al. \cite{chen2017mining}
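The time-partitioning step being criticized can be illustrated in miniature. The sketch below, a deliberately simplified stand-in for the per-partition "local swarm" phase (not the actual MapReduce algorithm of \cite{yu2016mr}), groups (object, time, cell) records into time windows and reports, per window, the object sets sharing a grid cell; a pattern spanning windows shows up only as fragments that a later merge phase must reassemble.

```python
from collections import defaultdict

def local_groups(records, window):
    """Partition (obj, t, cell) records into time windows of width `window`
    and report, per window, the sets of objects sharing a grid cell."""
    by_window = defaultdict(lambda: defaultdict(set))
    for obj, t, cell in records:
        by_window[t // window][cell].add(obj)
    return {w: [objs for objs in cells.values() if len(objs) > 1]
            for w, cells in by_window.items()}

# Objects a and b travel together over t = 0..3 through cells p,q,r,s;
# with window=2 their shared path is split into two local fragments.
recs = ([("a", t, c) for t, c in enumerate("pqrs")]
        + [("b", t, c) for t, c in enumerate("pqrs")])
print(local_groups(recs, window=2))
```

The single pattern "{a, b} move together for 4 timestamps" is visible only as per-window pieces, which is exactly the consistency loss noted above.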
J.M. Pena et al. proposed performing the optimization of the BN parameters using an alternative to the EM technique. They provide experimental results showing that their proposal yields a more effective and efficient version of the Bayesian Structural EM algorithm for learning BNs for clustering [2].
In the case of image recognition, the concept of clustering can be applied to identify the clusters in handwritten-character recognition systems. Many applications of clustering are also found in Web search, where clustering can be used to organize query results into groups and present the outcomes in a concise and easily accessible way. Automated clustering lets us distinguish dense and sparse regions in object space and, from that, find interesting correlations and overall distribution patterns among data attributes. Cluster analysis has been used broadly in applications such as market research, pattern recognition, data analysis, and image processing. In business, clustering can help marketers discover distinct groups in their client bases and characterize client groups based on purchasing patterns. In biology, it can be used to derive plant and animal taxonomies, categorize genes with similar functionality, and gain insight into structures inherent in populations. Clustering is used to identify areas of similar land use in Earth-observation databases, and to find groups of houses in a city based on house type, value, and geographic location. It is also helpful in identifying the groups of automobile-insurance policy holders with the highest average claim cost.
GP is an evolutionary computing method that uses the principle of Darwinian natural selection to create computer programs that solve a problem. Koza introduced GP (1992) as a branch of the genetic algorithm. GP achieves a high level of diversity by breeding a population of randomly differing computer programs. The programs are structured as trees containing functions and terminals, so a function set and a terminal set must be defined as the sources of functions and terminals; if they are rich enough, tree structures can reach any complexity. The function set, for example, can be built from arithmetic operations (+, −, ×, or ÷), Boolean logic functions (AND, OR, NOT, etc.), mathematical functions (sin, cos, log), conditional functions (IF, THEN, ELSE), or any other functions. The terminal set can include variables, numerical constants, functions with no arguments, etc. Random functions and terminals form a tree-like program: branches containing functions and terminals are connected to the root point. A typical example of a GP tree is shown in (FIG)
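The tree representation described above can be sketched directly: a program is a nested structure whose internal nodes come from the function set and whose leaves come from the terminal set. The sketch below is an illustrative evaluator, with an assumed small function set (including the protected division GP conventionally uses to avoid divide-by-zero) and nested tuples standing in for the tree.

```python
import math
import operator

# Function set (illustrative): arithmetic, protected division, and sin.
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": lambda a, b: a / b if b else 1.0,   # protected division
         "sin": math.sin}

def evaluate(tree, env):
    """Evaluate a GP tree given as nested tuples: ('op', child, ...).
    Leaves are terminals: variable names looked up in `env`, or constants."""
    if isinstance(tree, tuple):
        op, *children = tree
        return FUNCS[op](*(evaluate(c, env) for c in children))
    return env.get(tree, tree)   # variable lookup or numeric constant

# (x * x) + (2 / x): a root '+' joining two branches, as in a GP tree figure.
tree = ("+", ("*", "x", "x"), ("/", 2, "x"))
print(evaluate(tree, {"x": 2.0}))  # 4.0 + 1.0 = 5.0
```

A full GP system would additionally generate such trees at random, score them against training cases, and apply subtree crossover and mutation; evaluation is the building block all of those share.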
Abstract- This paper gives a brief description of the paper named in the title. Data clustering is one of the most widely used methods across applications, and parallelizing these time-consuming applications is of considerable importance. This paper contributes an additional feature: handling input data of various dimensions and processing it accordingly.
Multimedia data mining is a popular research domain that helps to extract interesting knowledge from multimedia data sets such as audio, video, images, graphics, speech, text, and combinations of several types of data. Normally, multimedia data are categorized into unstructured and semi-structured data. These data are stored in multimedia databases, and multimedia mining is used to find useful information in large multimedia database systems using various multimedia techniques and powerful tools. This paper provides the basic concepts of multimedia mining and its essential characteristics: multimedia mining architectures for structured and unstructured data, research issues in multimedia mining, data mining models used for
Segmentation and list management cover areas such as data mining and the segmentation of customer and other information.
In the clustering and classification phase, dissimilar data are separated from similar data. In our project this step can be omitted, because the dataset is already in the form of a CSV file, divided according to the respective link types.
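Loading such a pre-partitioned file reduces to grouping rows by their label column. The sketch below is a generic illustration of that step; the column name `link_type` and the sample rows are assumptions for the example, not taken from the project's actual schema.

```python
import csv
import io
from collections import defaultdict

def rows_by_type(csv_text, type_column="link_type"):
    """Group CSV rows by a label column, so the usual clustering or
    classification pass can be skipped when data arrive pre-partitioned."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[type_column]].append(row)
    return groups

sample = "url,link_type\nhttp://a,benign\nhttp://b,phishing\nhttp://c,benign\n"
groups = rows_by_type(sample)
print({k: len(v) for k, v in groups.items()})  # {'benign': 2, 'phishing': 1}
```

For a file on disk, the `io.StringIO` wrapper would simply be replaced by `open(path, newline="")`.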
In 2016, S.S. Pal et al. [38], in the paper titled “Multi-level Thresholding Segmentation Approach Based on Spider Monkey Optimization Algorithm,” introduced SMO for histogram-based bi-level and multi-level segmentation of grey-scale images. SMO has likewise been used to maximize Kapur's and Otsu's objective functions. The results show that the new segmentation method improves on other nature-inspired algorithms in terms of optimal threshold values and CPU time.
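For reference, the bi-level Otsu objective that such optimizers maximize can be computed exhaustively in a few lines. The sketch below is the standard textbook formulation, not code from [38]; the toy histogram is illustrative. Metaheuristics like SMO become attractive in the multi-level case, where exhaustive search over threshold combinations is expensive.

```python
def otsu_threshold(hist):
    """Bi-level Otsu thresholding on a grey-level histogram: choose the
    level t maximizing the between-class variance w0*w1*(m0-m1)^2,
    where w0, w1 are class sizes and m0, m1 are class mean levels."""
    total = sum(hist)
    total_sum = sum(g * h for g, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = cum = 0
    for t, h in enumerate(hist[:-1]):
        w0 += h                      # class 0: levels 0..t
        cum += t * h
        w1 = total - w0              # class 1: levels t+1..max
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = cum / w0, (total_sum - cum) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Toy 8-level histogram with a dark peak at level 1 and a bright peak at 6.
hist = [4, 8, 4, 0, 0, 4, 8, 4]
print(otsu_threshold(hist))  # 2: separates levels 0-2 from 3-7
```

An SMO- or GA-style optimizer would evaluate this same between-class-variance objective at candidate threshold vectors instead of scanning every level.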
Abstract- Interactive image segmentation has become more and more popular among researchers in recent years. Interactive segmentation, as opposed to fully automatic segmentation, gives the user the means to incorporate his knowledge into the segmentation process. However, in most existing techniques the suggested user interaction is not good enough: the user cannot intuitively impose his knowledge on the tool or edit results easily, so in ambiguous situations he has to revert to tedious manual drawing. The presented method is developed as a combined segmentation and editing tool. It incorporates a simple user interface and a fast, reliable segmentation based on 1D segment matching. The user is required to click just a few "control points" on the desired object border and let the algorithm complete the rest. The user can then edit the result by adding, removing, and moving control points, where each interaction is followed by an automatic, real-time segmentation by the algorithm.
The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different locations cause different results; the better choice is to place them as far from each other as possible. The next step is to take each point belonging to the data set and associate it with the nearest center. When no point is pending, the first step is completed and an early grouping is done. At this point we need to re-calculate k new centroids as the barycenters of the clusters resulting from the previous step. After we have these k new centroids, a new binding has to be done between the same data-set points and the nearest new center, and a loop has been generated. As a result of this loop we may notice that the k centers change their location step by step until no more changes are done, in other words, until the centers do not move any more. Finally, this algorithm aims at minimizing an objective function, in this case the squared-error function (the within-cluster sum of squared distances).
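The loop described above can be sketched in a few lines of plain Python. This is a minimal illustrative implementation of the standard procedure (Lloyd's algorithm) on 2D points, with a simple random initialization rather than the careful far-apart placement recommended above; the sample points are assumptions for the demonstration.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means as described: define k centres, assign each point
    to its nearest centre, recompute centres as cluster barycentres, and
    loop until the centres stop moving."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)          # initial centres (random points)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assignment step
            i = min(range(k), key=lambda j: (p[0] - centres[j][0]) ** 2
                                          + (p[1] - centres[j][1]) ** 2)
            clusters[i].append(p)
        # update step: barycentre of each cluster (keep empty clusters' centre)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centres[i]
               for i, cl in enumerate(clusters)]
        if new == centres:                   # centres no longer move
            break
        centres = new
    return centres, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, clusters = kmeans(pts, k=2)
```

On these two well-separated groups the loop converges to centres near (1/3, 1/3) and (31/3, 31/3), i.e., the barycentres of the two groups.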
Data mining is a collection of computational approaches. These approaches are used to develop