Today, with the ever-growing use of computers, information is constantly moving from one place to another. What this information is, whom it concerns, and who uses it are discussed in the following paper. The collection, interpretation, and determination of use of this information has come to be known as data mining. The term itself has been around only a short time, but the actual collection of data has been happening for centuries. The following paragraph gives a brief description of this history of data collection.
In today’s world, it is essential for almost every business to have a website, which serves as an information gateway through which customers gain insight into the business. In return, a website is also a tool through which business analysts can better understand customer behavior. Tracking and analyzing website data is therefore crucial to making sound business decisions, hence the growing adoption of web analytics.
The method employs data mining techniques such as frequent pattern and preference mining (Holland et al., 2003; Kießling & Köstler, 2002; Iváncsy & Vajk, 2006). Frequent pattern and preference mining is a heavily researched area of data mining with a wide range of applications, including discovering patterns in Web log data to obtain information about the navigational behavior of users.
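As a minimal illustration of frequent pattern mining over web log data, the sketch below counts how often pairs of pages co-occur in user sessions and keeps those that meet a minimum support threshold. The sessions and the threshold are illustrative assumptions, not data from the cited studies.

```python
from itertools import combinations
from collections import Counter

# Hypothetical user sessions, each a set of pages visited together.
sessions = [
    {"/home", "/products", "/checkout"},
    {"/home", "/products"},
    {"/home", "/blog"},
    {"/products", "/checkout"},
]
min_support = 2  # absolute co-occurrence count (illustrative threshold)

# Count co-occurring page pairs across all sessions.
counts = Counter()
for s in sessions:
    for pair in combinations(sorted(s), 2):
        counts[pair] += 1

# Keep only the pairs that reach minimum support.
frequent = {pair: n for pair, n in counts.items() if n >= min_support}
print(frequent)
```

In a full Apriori-style miner, frequent pairs would seed the search for larger itemsets; counting pairs alone already reveals the most common navigational co-occurrences.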
The functioning of a Web crawler [10] begins with a set of URLs called seed URLs. The crawler downloads the web pages behind the seed URLs and extracts the new links present in the downloaded pages. The retrieved web pages are stored and indexed so that, with the help of these indexes, they can later be retrieved as and when required. The URLs extracted from a downloaded web page are checked to determine whether their associated documents have already been downloaded. If a document has not been downloaded, its URL is allocated back to the web crawler for downloading. The process repeats until no URLs remain to be downloaded. A crawler may download millions of web pages daily to meet its target. Fig. 1 illustrates the proposed crawling process.
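The steps above can be sketched as a breadth-first crawl. To keep the sketch self-contained and offline, a small in-memory dictionary stands in for real HTTP fetches; the URLs and page contents are invented for illustration.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML, standing in for HTTP downloads.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">b</a>',
    "http://b.example/": '<a href="http://a.example/">a</a><a href="http://c.example/">c</a>',
    "http://c.example/": "no links here",
}

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags from an HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds):
    """Start from seed URLs; download, index, extract links, skip seen pages."""
    frontier = deque(seeds)
    index = {}                      # URL -> stored page content
    while frontier:
        url = frontier.popleft()
        if url in index:            # already downloaded: skip
            continue
        page = PAGES.get(url)       # stands in for an HTTP GET
        if page is None:
            continue
        index[url] = page           # store and "index" the retrieved page
        parser = LinkExtractor()
        parser.feed(page)
        for link in parser.links:   # allocate unseen URLs for downloading
            if link not in index:
                frontier.append(link)
    return index

index = crawl(["http://a.example/"])
print(sorted(index))
```

A production crawler would add politeness delays, robots.txt handling, and URL normalization; the loop structure, however, matches the seed/extract/check/repeat cycle described above.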
Web analytics is the study of how users interact with websites by recording user-related data, which helps in identifying different aspects of users’ behaviour. It is the measurement, collection, analysis, and reporting of web data for the purpose of understanding and optimizing web usage. Web analytics can be used as a tool for business and market research and to assess and improve the effectiveness of a website.
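To make the measurement-and-analysis loop concrete, the sketch below computes three common web-analytics metrics (pageviews, unique visitors, and bounce rate) from a toy clickstream. The events and the single-page definition of a bounce are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical clickstream: (visitor_id, page) events from a web log.
events = [
    ("v1", "/home"), ("v1", "/products"), ("v1", "/checkout"),
    ("v2", "/home"),
    ("v3", "/home"), ("v3", "/products"),
]

pageviews = len(events)                      # every recorded request counts

visitors = defaultdict(list)                 # group events by visitor
for visitor, page in events:
    visitors[visitor].append(page)
unique_visitors = len(visitors)

# Bounce rate: share of visitors who viewed only a single page.
bounces = sum(1 for pages in visitors.values() if len(pages) == 1)
bounce_rate = bounces / unique_visitors

print(pageviews, unique_visitors, round(bounce_rate, 2))  # 6 3 0.33
```

Real analytics pipelines compute these per session rather than per visitor, but the reporting step is the same aggregation over recorded user data.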
The Internet is a vast resource of data of different sorts: text, pictures, sound, and video. With the continuous growth and abundance of data available on the Internet, the World Wide Web has become a huge repository of information. Web mining is the application of data mining to large web data repositories: the use of data mining techniques to automatically discover and extract information from web documents and services.
The World Wide Web (WWW) has become a great medium for the exchange of information between users. The World Wide Web is an information space in which documents and other web resources are identified by URLs, interlinked by hypertext links, and accessed via the Internet. It was invented by English scientist Tim Berners-Lee in 1989 and is now known simply as the “Web”. In this new millennium, with the Internet available in every household, having a website has become mandatory for a business in order to reach its customers. This requires not just knowledge of Hypertext Markup Language but the whole process of building a website: hosting, selecting the best domain name, and sufficient knowledge of copyright, among other things.
The World Wide Web (WWW), created by Tim Berners-Lee, is the building block of what we know as the Internet today. Growth has been continuous, and even at present versions of the Web keep adapting to improve user satisfaction. It is a surface for virtual communication where data is transferred as hypertext documents. Broadly, the formation of the WWW has three stages: “Web of documents (Web 1.0), Web of people (Web 2.0) and Web of data (Web 3.0)” (Choudhury, 2014).
Web mining techniques can be divided into three main categories: web structure mining, web content mining, and web usage mining. Web structure mining discovers structure from data available on the web, such as hyperlinks and documents; it can help users navigate, since mining can retrieve intra- and inter-document hyperlinks and the DOM structure of documents. Web content mining extracts information from the data available on the web, such as text, video, images, and audio files. Web usage mining is the application of data mining techniques to discover interesting usage patterns from web usage data, in order to understand and better serve the needs of web-based applications (Srivastava, Cooley, Deshpande, and Tan 2000). It takes a user’s available information, browsing history, location, and so on as input. Depending on the type of data used, web usage mining can be further divided into three categories: web server logs, application server logs, and application-level logs. It can be highly helpful in mining data for web applications, thus supporting development in fields such as e-commerce. Web usage mining proceeds in three phases: preprocessing, pattern discovery, and pattern analysis.
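The preprocessing phase named above typically turns raw server log lines into per-user sessions before any patterns are discovered. The sketch below parses Common Log Format lines and splits one visitor’s requests into sessions; the log lines and the 30-minute session timeout are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta

# Matches Common Log Format entries and captures client IP, timestamp, and path.
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+) [^"]*"')

raw_log = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 512',
    '1.2.3.4 - - [10/Oct/2023:13:57:02 +0000] "GET /products HTTP/1.1" 200 256',
    '1.2.3.4 - - [10/Oct/2023:15:10:00 +0000] "GET /home HTTP/1.1" 200 512',
]

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group requests by IP; a gap longer than `timeout` starts a new session."""
    sessions = {}   # ip -> list of sessions, each a list of visited pages
    last_seen = {}  # ip -> timestamp of that visitor's previous request
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # discard malformed or non-page entries
        ip, ts_text, page = m.groups()
        ts = datetime.strptime(ts_text.split()[0], "%d/%b/%Y:%H:%M:%S")
        if ip not in sessions or ts - last_seen[ip] > timeout:
            sessions.setdefault(ip, []).append([])  # start a new session
        sessions[ip][-1].append(page)
        last_seen[ip] = ts
    return sessions

result = sessionize(raw_log)
print(result)  # {'1.2.3.4': [['/home', '/products'], ['/home']]}
```

The sessions produced here are exactly the transactions that the later pattern-discovery phase (frequent patterns, association rules, clustering) consumes.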
Abstract— Data mining is a logical process used to search through large amounts of data in order to find useful information [2]. There are many different types of analysis that can be performed to retrieve information from big data, and each type has a different impact or result. Which data mining technique you should use depends on the type of business problem you are trying to solve.
The web continues to grow with each passing day, with large numbers of people going online to perform all kinds of actions, such as purchasing goods and services or finding the information they need. With the availability of free Wi-Fi and low-priced Internet connections, the use of websites, blogs, and web applications has increased considerably.
Although association rule methods have advantages, they also have limitations that may cause information loss. Classical association rules concentrate on the co-occurrence of items, such as purchased products or visited web pages, within a transaction set. A single transaction can be a payment for purchased products or services, an order containing a set of items, or a historical session in a web portal. The assumed independence of items (products, web pages) is one of the most significant assumptions of the technique, but it is not fulfilled in the web domain. Web pages are linked to each other by hyperlinks, which largely determine all potential navigational paths. A user can enter the required web page address (URL) directly into a browser, but most navigation is completed via hyperlinks created by site administrators. Hence, the web structure strongly constrains the lists of visited pages (user sessions), which are not independent of one another in the way products in an ideal store are. To access a page, the user is usually forced to follow the hyperlinks leading to it.
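The co-occurrence view criticized above can be made explicit with a small support/confidence computation that treats each session as an unordered set of pages, ignoring the link structure entirely. The sessions and the rule are illustrative assumptions.

```python
# Hypothetical page-visit transactions (sessions treated as unordered sets,
# which is exactly the independence assumption discussed above).
sessions = [
    {"/index", "/cart", "/pay"},
    {"/index", "/cart"},
    {"/index", "/news"},
    {"/cart", "/pay"},
]

def support(itemset):
    """Fraction of sessions containing every page in the itemset."""
    return sum(1 for s in sessions if itemset <= s) / len(sessions)

# Rule X -> Y: confidence = support(X ∪ Y) / support(X)
x, y = {"/cart"}, {"/pay"}
conf = support(x | y) / support(x)
print(round(support(x | y), 2), round(conf, 2))  # 0.5 0.67
```

Note that the computation never consults which hyperlinks exist between the pages; a rule like /cart → /pay may simply restate a forced navigational path rather than a genuine behavioral choice, which is the information loss the paragraph describes.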
This research paper highlights the importance of and need for data mining in the age of electronic media, where large amounts of information and consolidated databases are readily available. This seemingly useless information can unearth striking statistics and predict future trends with relative ease through data mining techniques, benefiting businesses, start-ups, countries, and individuals alike. Moreover, since data mining is effective at bringing out patterns, correlations, and associations through complex algorithms and analysis, it has over the past few decades proved to be a useful tool in cyber or internet security.
This section discusses text mining and Web mining, which are taking on significance as more data and information are stored in text documents and on the Web. Web mining is divided into three categories: content mining, structure mining, and usage mining. Each one provides specific information on patterns in Web data.
Due to the huge growth and expansion of the World Wide Web, a large amount of information is available online. Search engines let us access this information easily with the help of search engine indexing, which collects, parses, and stores data to facilitate fast and accurate information retrieval. This paper explains a partitioning clustering technique for implementing the indexing phase of a search engine. In web usage mining, clustering techniques are widely used for grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Clustering methods are largely divided into two groups: hierarchical and partitioning methods. This paper proposes the k-means partitioning method of clustering and also provides a comparison of k-means clustering and single-link HAC. The performance of these clustering techniques is compared by execution time as a function of the number of clusters and the number of data items.
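A minimal sketch of the k-means partitioning method discussed above is given below (Lloyd's algorithm on 2-D points). The data set and k are illustrative, and centroids are initialised from the first k points for reproducibility; real implementations use smarter initialisation such as k-means++.

```python
# Minimal k-means (Lloyd's algorithm) on 2-D points.
def kmeans(points, k, iters=20):
    centroids = list(points[:k])        # deterministic init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                # assignment: nearest centroid (squared distance)
            j = min(range(k),
                    key=lambda j: (p[0] - centroids[j][0]) ** 2
                                + (p[1] - centroids[j][1]) ** 2)
            clusters[j].append(p)
        for j, c in enumerate(clusters):  # update: move centroid to cluster mean
            if c:
                centroids[j] = (sum(p[0] for p in c) / len(c),
                                sum(p[1] for p in c) / len(c))
    return centroids, clusters

# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Single-link HAC, by contrast, would build a full pairwise-distance merge tree, which is why its execution time grows much faster with the number of data items than the per-iteration assignment loop shown here.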