How Does a Search Engine Work, and Which Search Engine Is Best for You?
I researched this topic for about three to four months during my college days. Below are the basic steps in how a typical search engine works.
A search engine works in three basic stages: crawling, where content is discovered; indexing, where it is analyzed and stored in huge databases; and retrieval (searching), where a user query fetches a list of relevant pages.
Crawling
This involves scanning a site and building a complete list of everything on it: at a bare minimum, the page title, images, keywords it contains, and any other pages it links to. Modern crawlers may also cache a copy of the whole page and record additional information such as the page layout, where the advertising units are, and where the links sit on the page.
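To make the scanning step concrete, here is a minimal sketch using Python's built-in `html.parser` module. It pulls out the title, image sources and outgoing links from one page of HTML. The names `PageScanner` and `scan_page` are hypothetical, keyword extraction is left out for brevity, and a real crawler would also have to download the page, respect `robots.txt`, and cope with malformed markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Hypothetical helper: collects the title, image URLs and link URLs of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.images = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None, hence .get() below
        if tag == "title":
            self._in_title = True
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_page(url, html):
    """Return the basic facts a crawler records about a single page."""
    scanner = PageScanner(url)
    scanner.feed(html)
    scanner.close()
    return {"title": scanner.title.strip(), "images": scanner.images, "links": scanner.links}
```

For example, `scan_page("https://example.com/index.html", html)` resolves a relative link such as `about.html` to `https://example.com/about.html`, so the crawler ends up with absolute URLs it can visit next.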
Web pages are crawled by a web spider, which is an automated bot. It visits each page just as you or I would, only much faster, and adds the page to its list of pages to be indexed.
When a page contains a hyperlink (a link to another page), the spider automatically adds the linked page to its list of discovered pages.
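The discover-and-visit loop described above can be sketched as a breadth-first traversal of the link graph. This is a simplified illustration, not how any particular search engine schedules its crawl: `get_links` is a hypothetical function standing in for "download this page and return the URLs it links to", and `max_pages` is an arbitrary cap.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=100):
    """Breadth-first crawl: visit pages in order, queueing newly discovered links."""
    discovered = {start_url}       # every URL the spider has seen so far
    frontier = deque([start_url])  # discovered but not yet visited
    visited = []                   # visit order, i.e. pages handed to the indexer
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in discovered:  # skip pages we already know about
                discovered.add(link)
                frontier.append(link)
    return visited


# A tiny made-up "web": each page maps to the pages it links to.
web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
print(crawl("a", lambda url: web.get(url, [])))  # ['a', 'b', 'c', 'd']
```

The `discovered` set is what stops the spider from looping forever when pages link back to each other, as `c` does to `a` here.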
Indexing
Indexing is the process of taking all of the data gathered during a crawl and placing it in a big database. Imagine trying to make a list of all the books you own, along with their authors and page counts. Going through each book is the crawl, and writing the list is the index. Now imagine it's not just a room full of books, but every library in the world. Even that is only a small-scale version of what Google does. All of this data is stored in vast data centres holding thousands of petabytes' worth of drives.
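A common textbook structure for this kind of database is an inverted index, which maps each word to the pages that contain it, much like the index at the back of a book. The sketch below is a toy version under simple assumptions (lowercase alphanumeric words, pages identified by made-up numeric IDs); it is not a description of Google's storage format.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: map each word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return dict(index)


pages = {
    1: "Search engines crawl the web",
    2: "The web is big",
    3: "Engines of search",
}
index = build_index(pages)
print(index["web"])  # {1, 2}
```

Looking a word up is now a single dictionary access instead of a scan over every page, which is the whole point of building the index ahead of time.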
Retrieval/Searching
The last step is what you see: you type in a search query, and the search engine tries to display the documents that best match it. This is the most complicated step, but also the most relevant to you and me as web developers and users. It is also where search engines differentiate themselves: some simply match keywords, while others add advanced features such as keyword proximity (favouring pages where the query words appear close together) or filtering by the age of the content.
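To illustrate the difference between plain keyword matching and keyword proximity, here is a small sketch built on a positional index, which records where each word occurs on each page. The function names and the `max_gap` threshold are illustrative assumptions; real engines combine proximity with many other signals rather than using it as a hard filter.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(pages):
    """word -> {page_id: [word positions on that page]}"""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][page_id].append(pos)
    return index


def keyword_search(index, query):
    """Plain keyword matching: pages that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index[words[0]])
    for word in words[1:]:
        hits &= set(index[word])
    return hits


def proximity(index, page_id, a, b):
    """Smallest distance, in words, between an occurrence of a and one of b."""
    return min(abs(i - j) for i in index[a][page_id] for j in index[b][page_id])


def proximity_search(index, a, b, max_gap=3):
    """Keyword match plus a proximity filter, closest pages first."""
    a, b = a.lower(), b.lower()
    hits = [p for p in keyword_search(index, f"{a} {b}")
            if proximity(index, p, a, b) <= max_gap]
    return sorted(hits, key=lambda p: proximity(index, p, a, b))


pages = {
    1: "fast search engine design",
    2: "search is hard but an engine helps",
    3: "engine only",
}
index = build_positional_index(pages)
print(keyword_search(index, "search engine"))        # {1, 2}
print(proximity_search(index, "search", "engine"))   # [1]
```

Both pages 1 and 2 contain the two keywords, but only page 1 has them side by side, so a proximity-aware engine would treat it as the better match.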
The ranking algorithm checks your search query against billions of pages to determine how relevant each one is. This operation is so complex that companies guard their ranking algorithms as closely held trade secrets. Why? First, competitive advantage: as long as they give you the best search results, they can stay on top of the market. Second, secrecy makes it harder for site owners to game the system and gain an unfair advantage over other sites.
Google has a web page explaining how its search engine works, but don't get your hopes up: it won't tell you exactly how ranking works.
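Since the real ranking algorithms are secret, the sketch below uses TF-IDF instead, a well-known textbook scoring method that is not what Google or any other particular engine uses. It captures one basic idea of relevance: a page scores higher when it uses the query words often (term frequency), and rare words count for more than common ones (inverse document frequency). The sample pages and query are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(pages, query):
    """Score pages with TF-IDF; return (page_id, score) pairs, best first."""
    docs = {pid: Counter(tokenize(text)) for pid, text in pages.items()}
    n = len(docs)
    scores = {}
    for pid, counts in docs.items():
        length = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in docs.values() if term in c)  # pages containing term
            if df == 0 or term not in counts:
                continue
            tf = counts[term] / length  # share of this page made up of the term
            idf = math.log(n / df)      # rarer terms get a larger weight
            score += tf * idf
        if score > 0:                   # drop pages with no useful match
            scores[pid] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


pages = {
    "a": "python search engine tutorial",
    "b": "search engine news",
    "c": "python snakes in the wild",
}
print([pid for pid, _ in rank(pages, "python tutorial")])  # ['a', 'c']
```

Page `a` wins because it contains both query words, including the rare word "tutorial"; page `b` contains neither and is left out. Production ranking adds many more signals, such as link analysis and freshness, on top of text matching like this.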