Google, which needs no introduction, holds first place in search, roughly 45 percentage points ahead of second-place Bing. That is a large gap. According to statistics, by 2012, 69.5% of searches were performed on Google while only 25% were performed on Bing. It is also the leading search engine on mobile and tablet devices, with a market share of 89%. Isn’t it amazing?
Google is a search engine that makes heavy use of the structure present in hypertext. It is designed to crawl and index the Web efficiently and to produce more satisfying search results than existing systems.
Designing a search engine that scales to the web poses many challenges, even today. Fast crawling technology is needed to collect web documents and keep them up to date. Storage space must be used efficiently to hold the indices and, optionally, the documents themselves. Google's indexing system must process hundreds of gigabytes of data efficiently. Queries must be handled quickly, at a rate of hundreds to thousands per second.
Google was designed with several goals in mind. The first, and the main one, is to improve the quality of web search results.
The web has also become increasingly commercial over time as it has grown tremendously. In 1993, 1.5% of web servers were on .com domains. By 1997, this had grown to 60%. Search engines likewise migrated from the academic domain to the commercial one.
Another design goal of Google is to build systems that a reasonable number of people can actually use. The last design goal is to build an architecture that can support novel research activities on large-scale data.
Two main features help Google produce high-precision results. First, it uses the link structure of the web to calculate a quality ranking for every web page. This ranking is called PageRank. Second, it uses links to improve search results.
PageRank can be used to model user behavior. Assume there is a random surfer who is given a web page at random and keeps clicking on links, never hitting the back button, but who eventually gets bored and starts again on another random page. The probability that the random surfer visits a page is its PageRank.
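The random surfer idea can be approximated by repeatedly passing each page's rank along its outgoing links. The sketch below is a minimal illustration, not Google's actual implementation. The function name `pagerank`, the damping value of 0.85 (the chance that the surfer keeps clicking instead of jumping to a random page), the fixed iteration count, and the even redistribution of rank from pages with no outgoing links are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping is the assumed probability that the surfer follows a link
    rather than getting bored and jumping to a random page.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the surfer equally likely to be on any page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank sitting on pages without outgoing links is spread evenly
        # (an assumption of this sketch, so that the ranks still sum to 1).
        dangling = sum(rank[page] for page in pages if not links.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}

        # Each page shares its rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


if __name__ == "__main__":
    # A hypothetical three-page web: A links to B and C, B to C, C to A.
    example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 4))
```

In this toy graph, page C receives links from both A and B, so it ends up with the highest score, while B, which is linked only from A, ends up with the lowest.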
Anchor text is another system feature of Google. The text of links is treated in a special way. Many search engines associate the text of a link with the page on which the link appears. Google additionally associates it with the page the link points to.
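One way to picture anchor text indexing is a small inverted index that credits the words of a link to the page the link points to, so that a page can be found by what others say about it. This is only a sketch under stated assumptions: the helper names `build_anchor_index` and `search`, the `(source, target, anchor_text)` tuple layout, and the example URLs are all hypothetical and are not taken from any real search engine.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each anchor-text word to the set of pages that links point to.

    links is an iterable of (source_url, target_url, anchor_text) tuples.
    """
    index = defaultdict(set)
    for _source, target, text in links:
        for word in text.lower().split():
            # Credit the word to the target page, not the page holding the link.
            index[word].add(target)
    return index


def search(index, query):
    """Return the pages whose incoming anchor text contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    # Hypothetical links pointing at an image, which has no text of its own.
    links = [
        ("a.example/home", "b.example/logo.png", "company logo"),
        ("c.example/blog", "b.example/logo.png", "logo image"),
    ]
    index = build_anchor_index(links)
    print(search(index, "logo"))
```

Because the words are stored against the target, even a page with no indexable text of its own, such as an image, can be returned for a query.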
Information retrieval systems have been studied for many years and are well developed. However, most research on them has been carried out on small, well-controlled, homogeneous collections, such as collections of scientific papers or news stories on a related topic. So this is a simple anatomy of Google, the large-scale hypertextual web search engine.