The Importance Of Software Testing Research Paper

Web pages play an important role today, especially as technology continues to spread. This paper describes the PageRank method, which rates web pages by measuring, in effect, how relevant a page's content is to human readers. We use PageRank to rank the 10 most important citations found in 20 Software Testing research papers. We show how to extract the citation text and use PageRank to rank the citation titles by occurrence.
The Internet poses many challenges for retrieving information from web pages. It is huge, containing over 200 million web pages, with many more appearing every day. Compounding the matter, web pages vary widely in content, ranging from puppies to scientific papers
From there, I will sort the HashMap entries by count in decreasing order and print only the 10 most common titles with their counts.
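The sorting step could be sketched roughly as follows. This is a minimal illustration, not the program described here: the class name `TopTitles`, the method `topN`, and the alphabetical tie-break between equal counts are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: picks the n most frequent titles from a title -> count map.
class TopTitles {
    // Returns up to n entries, highest count first.
    // Ties are broken alphabetically by title (an assumption, so the order is deterministic).
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        // Copy the sublist so the result does not depend on the backing list.
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

Calling `TopTitles.topN(counts, 10)` and printing each entry's key and value would give the 10 most common titles with their counts.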
To create the program, written in Java, I followed a series of steps to read the citations from all 20 research papers. First, I downloaded the 20 Software Testing research papers as PDFs and saved them in my program’s local directory. Then, using the Apache PDFBox library, I read each paper into a string. Because only the citations matter here, I took a substring of the paper text containing only what follows the “References” keyword.
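A minimal sketch of the substring step is shown below. It assumes the paper text has already been extracted into a `String` (for example with PDFBox's `PDFTextStripper`), and it assumes the last occurrence of "References" marks the start of the citation list, since the word may also appear earlier in the body. The class and method names are hypothetical.

```java
// Hypothetical helper: isolates the citation text of one paper.
class ReferenceSection {
    private static final String KEYWORD = "References";

    // Returns the text after the last "References" keyword,
    // or an empty string if the keyword never appears.
    static String afterReferences(String paperText) {
        int idx = paperText.lastIndexOf(KEYWORD);
        if (idx < 0) {
            return "";
        }
        return paperText.substring(idx + KEYWORD.length()).trim();
    }
}
```

Papers that title the section differently (for instance in all capitals) would need an extra check; this sketch does not handle them.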
To get each citation title, I had to write a regular expression tailored to the paper’s citation format that captured only the text between two delimiter characters. This was hard to do for all 20 research papers because citation formats varied from paper to paper. In some papers the citation titles were enclosed in double quotes; the rest used other formats, which made capturing the title text very difficult. For the research papers where I was able to extract the citation titles, I counted the occurrences of each title string and saved the titles and their counts in a HashMap. From there, I sorted the HashMap by occurrence count in decreasing order, printing only the 10 most frequent. Due to not
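For the papers whose titles are enclosed in double quotes, the extraction and counting might look something like the sketch below. The pattern, the class name `CitationTitles`, and the normalization (collapsing whitespace, dropping a trailing comma or period, lowercasing) are assumptions made for illustration; the regular expressions actually used for each paper are not shown in the text.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts quoted citation titles in a references section.
class CitationTitles {
    // Captures the text between an opening and a closing double quote,
    // accepting both straight quotes and curly quotes.
    private static final Pattern QUOTED =
            Pattern.compile("[\"\u201C]([^\"\u201C\u201D]+)[\"\u201D]");

    // Adds one occurrence per quoted title found in referencesText to counts.
    static void countTitles(String referencesText, Map<String, Integer> counts) {
        Matcher m = QUOTED.matcher(referencesText);
        while (m.find()) {
            String title = normalize(m.group(1));
            if (!title.isEmpty()) {
                counts.merge(title, 1, Integer::sum);
            }
        }
    }

    // Assumed normalization so the same title matches across papers:
    // collapse whitespace, strip a trailing comma or period, lowercase.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ")
                .trim()
                .replaceAll("[,.]+$", "")
                .toLowerCase(Locale.ROOT);
    }
}
```

Calling `countTitles` once per paper with the same `HashMap` accumulates counts across all papers. Citation styles that do not quote the title would each need their own pattern.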