Content-Based Approach: A Slightly Different Implementation

2.2. Content-Based Approach

The content-based approach uses a slightly different implementation: the main idea is to compare the contents of the site rather than its URL. It is also known as the visual similarity based approach, in which contents such as text, images and styles are compared with those of the original site and the similarity is evaluated. This process is slow, because the entire content must be compared before a decision can be made.

Visual similarity in phishing web pages is itself hard to detect [2], which motivates a detection method that relies mainly on the Google page rank. The core idea is to maintain a blacklist of URLs; whenever the user enters a web page, its URL is checked against the URLs in the database. If the URL is not present, the system determines the Google page rank of the page and calculates a heuristic value based on that rank. Finally, it compares the value with a threshold and alerts the user if the value exceeds the threshold. The major drawback of the system is the detection time added by the rank calculation. The system is nevertheless effective, because the Google page rank is updated very frequently, which helps the detection process.

Another visual similarity implementation is based on EMD (Earth Mover's Distance) [3]. EMD is a method used to measure the closeness between two signatures. To check the visual
