Hierarchical Document Clustering Based On Cosine Similarity Measure

783 Words Nov 15th, 2016 4 Pages
Ms. Shraddha K. Popat*, Asst. Professor, Department of Computer Engineering, D.Y. Patil College of Engineering, Akurdi, Pune, India (shraddhakp21@gmail.com)
Ms. Vishakha A. Metre, Asst. Professor, Department of Computer Engineering, D.Y. Patil College of Engineering, Akurdi, Pune, India (vishakha.metre@gmail.com)

Abstract- Clustering is one of the prime topics in data mining. It partitions data into meaningful subgroups. Document clustering groups a set of documents into clusters such that documents within a cluster are more similar to each other than to documents in other clusters. This paper presents an experimental exploration of a similarity-based method, HSC, for measuring similarity between data objects, particularly text documents. It also presents an incremental algorithm that evaluates cluster cohesiveness by examining the pairwise similarity between documents, which yields improved results over traditional methods. Finally, it focuses on the selection of an appropriate similarity measure, since this choice plays a significant role in how similarity between documents is measured.
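The abstract names cosine similarity as the measure and hierarchical clustering as the strategy, but this excerpt gives no formulas or pseudocode. The following is a minimal Python sketch under stated assumptions: documents are raw term-frequency vectors (no TF-IDF weighting, stop-word removal, or stemming), similarity is cos(d_i, d_j) = (d_i · d_j) / (||d_i|| ||d_j||), clusters are merged bottom-up by average pairwise cosine similarity as a stand-in for cohesiveness, and merging stops at a hypothetical `threshold`. It is a generic average-link agglomerative procedure, not the paper's HSC algorithm, and the function names are illustrative only.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Term-frequency vector: lowercase, whitespace tokenization (assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is empty."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_link(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(sim[i][j] for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_cluster(docs, threshold=0.2):
    """Start from singleton clusters and repeatedly merge the most similar
    pair while their average-link similarity is at least `threshold`
    (a hypothetical stopping rule, not taken from the paper)."""
    vectors = [term_vector(d) for d in docs]
    n = len(vectors)
    # Precompute the full pairwise cosine similarity matrix.
    sim = [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = average_link(clusters[x], clusters[y], sim)
                if s > best:
                    best, best_pair = s, (x, y)
        if best < threshold:
            break  # no remaining pair is cohesive enough to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


if __name__ == "__main__":
    docs = [
        "data mining finds patterns in data",
        "mining data for patterns",
        "cosine similarity compares document vectors",
        "document vectors and cosine similarity",
    ]
    print(agglomerative_cluster(docs))  # [[0, 1], [2, 3]]
```

Average linkage scores a candidate merge by the mean of all pairwise document similarities, which fits the abstract's emphasis on pairwise similarity; single or complete linkage would be equally valid choices. The nested loops recompute every cluster pair after each merge, which is fine for small examples but would need a priority queue or cached similarities for large corpora.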

Keywords- Clustering, document clustering, hierarchical clustering, similarity measures.
Rich sources of data have become available in recent years. Data mining is the practice of automatically searching enormous amounts of data to discover patterns and trends beyond…