Advantages And Disadvantages Of Text Clustering

S. Murali Krishna et al., 2010. An Efficient Approach for Text Clustering Based on Frequent Itemsets. In the proposed research, we have devised an efficient approach for text clustering based on frequent itemsets. A renowned method, the Apriori algorithm, is used for mining the frequent itemsets. The mined frequent itemsets are then used to obtain the partition, in which the documents are initially clustered without overlapping.
Access to large quantities of textual documents has become feasible because of the growth of digital libraries, the web, technical documentation, medical data and more. The approach does not require a pre-specified number of clusters.
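The two steps described above (mining frequent itemsets with Apriori, then forming an initial non-overlapping partition) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `apriori` and `partition`, the toy documents, and in particular the assignment rule (each document goes to the largest frequent itemset it contains) are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent


def partition(transactions, frequent):
    """Assign each document to exactly one frequent itemset.

    Assumption (hypothetical rule): a document goes to the largest frequent
    itemset it contains, ties broken by higher support, then alphabetically.
    Documents containing no frequent itemset are grouped under the key None.
    """
    ranked = sorted(frequent, key=lambda s: (-len(s), -frequent[s], sorted(s)))
    clusters = {}
    for doc_id, t in enumerate(transactions):
        t = frozenset(t)
        label = next((s for s in ranked if s <= t), None)
        clusters.setdefault(label, []).append(doc_id)
    return clusters


# Toy corpus: each document is reduced to a set of terms.
docs = [
    {"text", "cluster", "itemset"},
    {"text", "cluster", "mining"},
    {"web", "library", "text"},
    {"web", "library", "data"},
]
frequent = apriori(docs, min_support=2)
clusters = partition(docs, frequent)
```

Because every document is assigned to exactly one itemset, the resulting clusters do not overlap, and the number of clusters falls out of the mined itemsets rather than being fixed in advance.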

G. Sailaja et al., 2014. A Novel …
This does not satisfy the second condition of a metric because, after all, the combination of two copies is a different object from the original document.
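The point about two copies can be checked numerically. The sketch below assumes the measure under discussion is cosine similarity over raw term-frequency vectors, which the text shown here does not confirm; the sample sentence and the helper `cosine_similarity` are made up for illustration. It shows that a document and its doubled version have a cosine distance of zero even though they are different objects, which is what violates the condition that zero distance implies identity.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


doc = "frequent term sets cluster text documents".split()
doubled = doc + doc  # two copies of the same document combined

tf_doc = Counter(doc)
tf_doubled = Counter(doubled)

similarity = cosine_similarity(tf_doc, tf_doubled)
distance = 1.0 - similarity  # approximately zero, yet tf_doc != tf_doubled
```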

Florian Beil et al., 2002. Frequent Term-Based Text Clustering. In this paper, we introduce a novel approach that uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents.
The approach can be used to structure large sets of text or hypertext documents. The data has very high dimensionality, which requires the ability to deal with sparse data spaces or a method of dimensionality reduction.
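To make "mutual overlap with respect to the sets of supporting documents" concrete, here is a rough sketch. It is not the paper's exact formulation: the brute-force miner `frequent_term_sets`, the simplified `overlap` score (average number of other frequent term sets each supporting document also supports), the selection of the lowest-overlap set, and the toy corpus are all assumptions chosen to keep the example short.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to its cover (ids of supporting documents).

    Brute-force enumeration for illustration only; a real system would use
    an association-rule miner such as Apriori.
    """
    vocab = sorted(set().union(*docs))
    covers = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            cover = frozenset(i for i, d in enumerate(docs) if set(terms) <= d)
            if len(cover) >= min_support:
                covers[frozenset(terms)] = cover
    return covers


def overlap(term_set, covers):
    """Simplified overlap score for one candidate cluster (lower is better).

    For each document in the cover of `term_set`, count how many *other*
    frequent term sets it also supports, then average over the cover.
    """
    cover = covers[term_set]
    total = 0
    for doc_id in cover:
        supported = sum(1 for c in covers.values() if doc_id in c)
        total += supported - 1  # do not count term_set itself
    return total / len(cover)


# Toy corpus: each document is reduced to a set of terms.
docs = [
    {"apple", "banana"},
    {"apple", "banana", "cherry"},
    {"cherry", "date"},
    {"cherry"},
]
covers = frequent_term_sets(docs, min_support=2)
scores = {ts: overlap(ts, covers) for ts in covers}
# Pick the candidate whose supporting documents overlap least with others.
best = min(scores, key=lambda ts: (scores[ts], sorted(ts)))
```

In this toy corpus the term set containing only "cherry" scores lowest, since two of its three supporting documents support no other frequent term set.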
Anton Bakalov et al., 2012. Topic Models for Taxonomies. This paper introduces two semi-supervised topic models that automatically augment a given taxonomy with many additional keywords by leveraging a corpus of multi-labeled documents. The models provide a better information rate compared to Labeled LDA. The concept node names often do not describe the concept in sufficient detail for unfamiliar users to fully understand the topics a node is intended to