Hierarchical Document Clustering Based On Cosine Similarity Measure

Ms. Shraddha K. Popat*, Asst. Professor, Department of Computer Engineering, D.Y. Patil College of Engineering, Akurdi, Pune, India, shraddhakp21@gmail.com
Ms. Vishakha A. Metre, Asst. Professor, Department of Computer Engineering, D.Y. Patil College of Engineering, Akurdi, Pune, India, vishakha.metre@gmail.com

Abstract- Clustering is one of the prime topics in data mining. Clustering partitions the data into meaningful subgroups. Document clustering is the grouping of a document set into clusters such that the documents within each cluster are more alike to each other than to those in different clusters. In this paper, an experimental exploration of a similarity-based method, HSC, for measuring similarity between data objects, particularly text documents, is introduced. The paper also provides an algorithm that works incrementally and evaluates cluster cohesiveness by closely tracking the pair-wise similarity between documents, which leads to much improved results over other traditional methods. It also focuses on the selection of an appropriate similarity measure, which plays a significant role in measuring the similarity between documents.

Keywords- Clustering, Document clustering, Hierarchical, Similarity measures.

I. INTRODUCTION

Rich sources of datasets have become available in recent years. Data mining is the practice of automatically searching enormous amounts of data to discover patterns and trends beyond
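
For reference, the cosine similarity measure named in the title can be illustrated with a minimal sketch. This is not the paper's HSC method, whose details are not given in this excerpt. It assumes each document is represented by raw term frequencies over whitespace-separated, lower-cased tokens, and the function name cosine_similarity is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between two term-frequency vectors.

    Assumption: tokens are lower-cased, whitespace-separated words and
    weights are raw counts (no TF-IDF, stemming, or stop-word removal).
    """
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())

    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))

    # An empty document has no direction, so treat its similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With non-negative term counts the value ranges from 0 (no shared terms) to 1 (identical term distributions), so a clustering procedure can compare pair-wise scores directly when judging how cohesive a group of documents is.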