The General Search Engine Architecture

Large-scale search engines such as Google are complex, sophisticated, distributed systems. Below we reproduce the general search engine architecture discussed in “Searching the Web” by Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan (Stanford University), ACM Transactions on Internet Technology (TOIT), Volume 1, Issue 1 (August 2001).
The main components are parallel crawlers and crawler control (deciding when and where to crawl), a page repository, an indexer, analysis modules, a collection of data structures (index tables plus structure and utility indexes), and a query engine with a ranking module. Such a general architecture would take a significant amount of time to code. In this course, we implement stripped-down versions of the main components, which we call TinySearch. The general architecture is shown in Figure 1.

Figure 1: General search engine architecture [Arvind, 2001].
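To make the division of labor concrete, here is a minimal Python sketch of a TinySearch-style pipeline. It assumes a hypothetical in-memory `WEB` dictionary in place of real HTTP fetching, and the names `crawl`, `build_index`, and `query`, the depth limit, and the count-based ranking are all illustrative assumptions rather than the course's actual interfaces. The breadth-first frontier stands in for crawler control, the returned dictionary is the page repository, the inverted index is the main index table, and `query` combines a query engine with a very simple ranking module.

```python
import re
from collections import Counter, defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this dict stands in for the network.
WEB = {
    "http://a.example/": ("search engines crawl the web", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("an indexer builds an inverted index", ["http://a.example/"]),
    "http://c.example/": ("the query engine ranks pages from the index", ["http://d.example/"]),
    "http://d.example/": ("pages beyond the depth limit are not crawled", []),
}


def crawl(seed, max_depth):
    """Breadth-first crawl from seed; returns a page repository (URL -> text)."""
    repository = {}
    frontier = deque([(seed, 0)])  # crawler control: which URL to visit next, and its depth
    seen = {seed}
    while frontier:
        url, depth = frontier.popleft()
        text, links = WEB.get(url, ("", []))
        repository[url] = text
        if depth < max_depth:  # only follow links from pages above the depth limit
            for link in links:
                if link not in seen:
                    seen.add(link)
                    frontier.append((link, depth + 1))
    return repository


def tokenize(text):
    """Lower-case runs of ASCII letters; adequate only for English pages."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(repository):
    """Inverted index: word -> {URL: number of occurrences in that page}."""
    index = defaultdict(Counter)
    for url, text in repository.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def query(index, terms):
    """AND query: pages containing every term, ranked by total term count, ties by URL."""
    words = tokenize(" ".join(terms))
    if not words:
        return []
    matching = set(index.get(words[0], {}))
    for word in words[1:]:
        matching &= set(index.get(word, {}))
    scores = {url: sum(index[w][url] for w in words) for url in matching}
    return sorted(scores, key=lambda u: (-scores[u], u))
```

For example, `query(build_index(crawl("http://a.example/", max_depth=1)), ["index"])` returns the B and C pages; the D page is never crawled because it lies beyond the depth limit. Keeping the crawler, repository, indexer, and query engine as separate functions mirrors how the general architecture separates them, even though a real system would run them as distributed services.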

Major search engines are based in the United States and are specialized for English; users search for documents using keywords. Typical examples are AltaVista, Excite, and Northern Light. However, there are also search engines specialized for other languages, such as Chinese, Korean, and Japanese (written partly in Kanji). Examples are Chinese Yahoo!, Yahoo! Japan, and Yahoo! India. Since these languages are not written in the Latin alphabet (and so call for different data structures), they might need to have…
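The source does not say how such engines process non-Latin text, so the following Python sketch only illustrates the problem under stated assumptions. Japanese is written without spaces between words, so a whitespace tokenizer that works for English returns a whole phrase as one opaque token, and each character also takes several bytes in UTF-8. Overlapping character bigrams are one commonly used workaround; the helper names `whitespace_tokens` and `char_bigrams` are hypothetical.

```python
def whitespace_tokens(text):
    """Split on whitespace, as a simple English-oriented tokenizer might."""
    return text.split()


def char_bigrams(text):
    """Overlapping two-character tokens, with whitespace characters dropped first."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


english = "search engine"
japanese = "検索エンジン"  # "search engine" in Japanese, written without spaces

print(whitespace_tokens(english))   # ['search', 'engine']
print(whitespace_tokens(japanese))  # ['検索エンジン'] -- the whole phrase is one token
print(char_bigrams(japanese))       # ['検索', '索エ', 'エン', 'ンジ', 'ジン']
print(len(japanese), len(japanese.encode("utf-8")))  # 6 18 (characters vs. UTF-8 bytes)
```

Indexing bigrams lets a query such as "検索" match inside the longer phrase without a dictionary-based word segmenter, at the cost of a larger index with many meaningless pairs such as "索エ".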
