RESEARCH ISSUES IN WEB MINING
Dr.S. Vijayarani1 and Ms. E. Suganya2
1Assistant Professor, School of Computer Science and Engineering, Bharathiar University, Coimbatore, Tamilnadu, India
2 M.Phil Research Scholar, Computer science and Engineering, Bharathiar University, Coimbatore, Tamilnadu, India
1vijimohan_2000@yahoo.com
2elasugan1992@gmail.com
Abstract- The web is a collection of inter-related files on one or more web servers, while web mining means extracting valuable information from web databases. Web mining is one of the data mining domains, in which data mining techniques are used to extract information from web servers. Web data includes web pages, web links, objects on the web and web logs. Web mining is used to understand customer behaviour and to evaluate a particular website based on the information stored in web log files. Web mining is carried out using data mining techniques, namely classification, clustering, and association rules. Its application areas include electronic commerce, e-learning, e-government, e-policies, e-democracy, electronic business, security and crime investigation, and digital libraries. Retrieving the required web page from the web efficiently and effectively is a challenging task because the web is made up of unstructured data, which delivers the
This section discusses the common traits and ideas observed in the three research topics. Although each of the three articles discusses a unique idea, all of them aim to use web data to produce better results. Web data mining is a hot research topic in the current realm of big data. These papers discuss the use of valuable user-generated data from social media or browser cookies to provide the best user experience, either to maintain user interest in a company's product or to help an individual make effective decisions. All three articles propose a solution to the stated problem, compare their results with existing models, and show significant improvement.
Here we discuss the common traits and ideas observed in the three research topics. Although these three papers discuss different ideas, they all fall under the web data mining domain. Web data mining is a hot research topic in the current realm of big data. These papers discuss the utilisation of valuable user-generated data from social media or browser cookies to provide the best user experience, either to maintain user interest in a company's product or to help an individual make effective decisions.
There are over 1 billion active websites today, and their number is growing exponentially. Not all of them are optimized and used effectively by their users. Every website owner should continuously assess and improve the effectiveness of their website if they want to address the above question. How can they do that? That is where Web Analytics comes into the picture. Web Analytics is the process of tracking, collecting, measuring, reporting and analyzing data collected from the web in order to understand and optimize web usage. It can track every click of every person on a website, and it provides information about the number of visitors to a website and the number of page views. In other words, Web Analytics is tracking the visitor behavior of
Nowadays, data mining and machine learning have become rapidly growing topics in both industry and academia. Companies, government laboratories and top universities are all contributing to knowledge discovery in pattern recognition, text categorization, data clustering, classification, prediction and more. In general, data mining is the technique used to analyze data from multiple perspectives and reveal the hidden gems behind enormous amounts of data. With the explosive growth of data collections, extracting valuable information from massive databases with traditional data analysis methods has become time-consuming and less effective. An alternative way to solve this problem is to apply data mining, given considerations
However, targeted advertising has raised new questions about privacy, since it must collect users’ information in order to publish advertisements. When a consumer visits a website, every page they view, the time spent on each page, the new pages they click on and how they interact with the server are all data that browsers can collect. With the technology used in behavioral targeted advertising, web browsing history is tracked and sent to a web server. To select the best advertisements to display, data mining and machine learning techniques are applied to analyze users’ behavior (Korolova 2010).
Due to the huge growth and expansion of the World Wide Web, a large amount of information is available online. Search engines give easy access to this information with the help of search engine indexing. To facilitate fast and accurate information retrieval, search engine indexing collects, parses, and stores data. This paper explains a partitioning clustering technique for implementing the indexing phase of a search engine. Clustering techniques are widely used in “Web Usage Mining” for grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Clustering methods are largely divided into two groups: hierarchical and partitioning methods. This paper proposes the k-means partitioning method of clustering and also provides a comparison of k-means clustering and single-link HAC (hierarchical agglomerative clustering). The performance of these clustering techniques is compared according to execution time, based on the number of clusters and the number of data items entered.
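The k-means partitioning method mentioned above can be sketched as follows. This is a minimal illustration only, assuming 2-D numeric points and squared Euclidean distance; the function names `kmeans` and `sq_dist`, the sample points and the fixed random seed are assumptions made for this example and are not taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters (Lloyd's algorithm).

    Returns (centroids, assignment), where assignment[i] is the cluster
    index of points[i]. Initial centroids are k points chosen at random.
    """
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, assignment


# Illustrative data: two visibly separate groups of points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(sample, k=2)
print(labels)
```

Each pass costs roughly one distance computation per point per cluster, which is why execution time is naturally reported against the number of clusters and the number of data items.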
In this study, the K-Means algorithm is used to group the web data into clusters based on the location of the web users, which is obtained from their IP addresses. The work aims to separate users according to the location from which each request originates. After the clusters have been obtained, an algorithm for generating association rules is applied to them.
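The second stage of the pipeline described above, generating association rules within each location group, might look roughly like the sketch below. It is not the study's implementation: the K-Means step is replaced here by a plain lookup on an already-resolved region label, only rules between pairs of pages are considered, and the names `pair_rules` and `rules_per_region`, the thresholds and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Find rules 'A -> B' between pairs of pages.

    sessions is a list of sets of pages. Each returned tuple is
    (A, B, support, confidence), where support is the fraction of
    sessions containing both pages and confidence is that count
    divided by the number of sessions containing A.
    """
    n = len(sessions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for pages in sessions:
        for page in pages:
            item_count[page] += 1
        # Count both orderings, since A -> B and B -> A are distinct rules.
        for a, b in permutations(sorted(pages), 2):
            pair_count[(a, b)] += 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


def rules_per_region(requests, **thresholds):
    """Group (region, pages) records by region, then mine each group.

    The region label is assumed to have been derived from the IP
    address beforehand; that lookup is outside this sketch.
    """
    by_region = defaultdict(list)
    for region, pages in requests:
        by_region[region].append(set(pages))
    return {
        region: pair_rules(sessions, **thresholds)
        for region, sessions in by_region.items()
    }


# Hypothetical sessions, already labelled with a region.
requests = [
    ("north", {"/home", "/cart"}),
    ("north", {"/home", "/cart"}),
    ("north", {"/home"}),
    ("south", {"/blog"}),
]
for region, rules in rules_per_region(requests).items():
    print(region, rules)
```

Mining each group separately means support and confidence are measured against that location's sessions only, so a rule that is frequent in one region does not have to be frequent overall.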
Web analytics is the practice of measuring, collecting, analyzing and reporting on Internet data for the purposes of understanding how a web site is used by its audience and how to optimize its usage. Web analytics helps a business owner break down the volume of information that originates from the web and aids in extracting information in a simplified manner. In addition, web analytics helps the organization externalize and standardize information, including variables that can be compared with each other.
Web analytics is the collection of web data, the measurement and analysis of that data, and the creation of reports on web usage. Web analytics is used to measure web traffic and also serves as a tool for business and market research to improve the effectiveness of a web site. Web analytics can also be used to measure
Although association rule methods have advantages, they also have some limitations that may cause loss of information. Typical association rules concentrate on the co-occurrence of items, such as purchased products or visited web pages, within the transaction set. A single transaction can be a payment for purchased products or services, an order with a set of items, or a historical session in a web portal. Mutual independence of items (products, web pages) is one of the most significant assumptions of the technique, but it is not fulfilled in the web domain. Web pages are linked with each other by hyperlinks, and these links largely determine all potential navigational paths. A user can enter the URL of the required web page directly into a browser. However, most navigation is completed with the help of hyperlinks created by site administrators. Hence, the web structure severely restricts the lists of visited pages (user sessions), which are not independent of one another in the way that products in an ideal store are. To access a page, the user is usually forced to
The Internet is a vast resource of data of different sorts: text, pictures, sound and video. With the continuous growth and abundance of data available on the Internet, the World Wide Web has become a huge repository of information. Web mining is the application of data mining to large web data repositories. It is the use of data mining techniques to automatically discover and extract information from web documents and
The web continues to grow with each passing day, as large numbers of people go online every day to perform actions such as purchasing goods and services or finding the information they need. With the availability of free Wi-Fi and low-priced Internet connections, the use of websites, blogs, and web applications has increased considerably.
Web analytics is the study of how users interact with websites by recording user-related data, which helps in identifying different aspects of user behaviour. It is the measurement, collection, analysis and reporting of web data for the purpose of understanding and optimizing web usage. Web analytics can be used as a tool for business and market research and to assess and improve the effectiveness of a website.
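As a small illustration of the measurement and reporting steps described above, the sketch below counts page views and unique visitors per page from recorded hits. It assumes the hits are already available as (visitor id, page) pairs; the function name `summarize_hits`, the record layout and the sample data are assumptions made for this example, not part of any particular analytics tool.

```python
from collections import Counter, defaultdict


def summarize_hits(hits):
    """Summarize recorded hits per page.

    hits is an iterable of (visitor_id, page) pairs. Returns a dict that
    maps each page to its total page views and its unique visitor count.
    """
    page_views = Counter()
    visitors = defaultdict(set)
    for visitor_id, page in hits:
        page_views[page] += 1            # every hit counts as one view
        visitors[page].add(visitor_id)   # a set keeps each visitor once
    return {
        page: {
            "page_views": page_views[page],
            "unique_visitors": len(visitors[page]),
        }
        for page in page_views
    }


# Hypothetical recorded hits.
hits = [
    ("visitor-a", "/home"),
    ("visitor-a", "/home"),
    ("visitor-b", "/home"),
    ("visitor-b", "/pricing"),
]
for page, stats in summarize_hits(hits).items():
    print(page, stats)
```

Keeping page views and unique visitors as separate figures matters because repeated visits by one person raise the first but not the second.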
“Torture data long enough and it will confess. . . but may not tell the truth” (Turbin, Volonino, & Woods, 2015, p. 88). In the world of Big Data Analytics (BDA), companies that successfully harness the potential of big data are rewarded with valuable insight that could lead to a competitive advantage in their market sector. Consequently, it is imperative to extract data successfully from all relevant sources that can help answer the questions companies set out to address with BDA. One such source is data generated by social media (Schatten, Ševa, & Đurić, 2015). As such, this paper will review the findings of Schatten, Ševa, & Đurić’s (2015) article on how social web mining and big data can be utilized within the social
Nowadays, organizations and their websites are “like lips and teeth”. It is hard for an organization to exist without a customized and optimized website. A website is a bridge between customers, viewers and organizations; it opens the door for outsiders to interact with insiders and vice versa. Over the past few years, the amount of data on the World Wide Web has grown larger and larger. The number of websites is mushrooming all the time; it has surpassed 1 billion and continues to climb. Organizations that do not want to be left behind in this ever-changing world of data must know how to manipulate it and how to analyze it to gain significant insights. However, there is a paradox of data: “a lack of it means you cannot make complete decisions, but even with a lot of data, you still get an infinitesimally small number of insights”. This paradox is also true for the Web; there is a lot of data, but there are also critical barriers to making intelligent decisions. To break down such barriers, people developed web analytics. Web analytics is a way to find out what is happening with our websites, blogs, or social network pages by using the data collected from those online resources to achieve our goals. There are many web analytics tools on the market that can help us achieve this. At the dawn of web analytics, most businesses that focused on web analytics thought of it simply as the art of collecting and analyzing clickstream data. It was a good start; however,