
News Aggregation in Python Using Hierarchical Clustering


Rahul S Verma
CSE Department, IMSEC Ghaziabad
rahul.1a94@gmail.com

Satyam Gupta
CSE Department, IMSEC Ghaziabad
satyam905@gmail.com

Shivangi
CSE Department, IMSEC Ghaziabad
bitts.beans@gmail.com

ABSTRACT
In this paper we illustrate a way to cluster similar news articles based on their term frequency. We use Python and nltk to recognize keywords and then apply a hierarchical clustering algorithm to group the articles. This method can be used to build news aggregation backends. Aggregation here means clustering similar documents from different sources. News aggregation scenarios involve fast-moving data and heterogeneous sources, so the same story often arrives from several outlets and the resulting duplicates need to be removed.

General Terms
Hierarchical clustering, algorithms, aggregation, news, text mining.

Keywords
Python, nltk, feedparser, news aggregation.

1. INTRODUCTION
News aggregators can be considered a multilateral platform of interconnection [1]. In principle, news aggregators can be a substitute for, or a complement to, the news outlets that invest in the creation of news stories. A policy debate centers on the decrease in the incentives for news creation that results if readers choose to consume their news through aggregators without clicking through to the news websites or generating any revenue for the outlets [2]. With these two ideas in perspective, our idea is to provide a small script in Python which anyone can run on their own
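
The approach outlined in the abstract (term-frequency vectors followed by hierarchical clustering) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' script: to stay dependency-free it swaps nltk for a regular-expression tokenizer and a tiny hand-written stopword list, and it uses cosine distance with average linkage and a cut-off of 0.7, none of which are specified in the text above. The function names and the sample headlines are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a deliberately tiny stopword list for illustration only.
# The paper relies on nltk for keyword recognition instead.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "and",
             "for", "is", "at", "as", "by", "with", "after"}


def term_frequencies(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster_headlines(texts, threshold=0.7):
    """Agglomerative (bottom-up) hierarchical clustering with average linkage.

    Starts with one cluster per text and repeatedly merges the closest pair
    of clusters until the closest pair is farther apart than `threshold`.
    Returns a list of clusters, each a sorted list of indices into `texts`.
    """
    vectors = [term_frequencies(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical headlines standing in for items pulled from several feeds.
headlines = [
    "Central bank raises interest rates again",
    "Interest rates raised by central bank",
    "Local team wins football championship",
    "Football championship won by local team",
]
print(cluster_headlines(headlines))  # [[0, 1], [2, 3]]
```

Each resulting cluster groups headlines that likely report the same story, so keeping one representative per cluster is one simple way to drop duplicates across sources. In practice the threshold would need tuning on real feed data.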
