Web-based document (WBD), commonly known as Latent Semantic Indexing in the context of information retrieval, is a fully automatic mathematical/statistical technique for extracting and inferring relations of expected contextual usage of words in passages of discourse. It is based on applying a particular mathematical technique, Singular Value Decomposition (SVD), to a word-by-document matrix [4]. The word-by-document matrix is formed from WBD inputs consisting of raw text parsed into words, defined as unique character strings, and separated into meaningful passages or samples such as sentences or paragraphs. Applying SVD to this matrix gives a view of the global relationships between terms across the whole document collection, enabling the semantic structures within the collection to be uncovered. The use of WBD in information retrieval is motivated by challenges encountered in natural language processing: a word may have several meanings (polysemy), and several words may mean the same thing (synonymy), which introduces ambiguity in how users express their concepts. For example, several empirical studies show that the likelihood of two people choosing the same keyword for a familiar object is less than 15%. Because of these challenges, keyword-only search techniques are inadequate for addressing user queries. WBD enables retrieval on the basis of conceptual content, instead of merely matching words between queries and
