Crawling is the process by which an automated computer program (called a crawler, spider, or robot) browses the Internet, typically to index its content.

How do crawlers work?

Crawlers start with a URL or set of URLs, download each one, and examine its content, which they may or may not add to their index. They then visit every URL linked from those pages and do the same with each of them. By recursively following the links on each newly discovered page, they rapidly grow the number of pages they need to visit.
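The traversal described above can be sketched as a simple breadth-first loop over a queue of URLs. The example below runs against a small hypothetical in-memory "web" (the `PAGES` dictionary is an assumption standing in for real HTTP fetching and link extraction), but the visit/queue logic is the same shape a real crawler uses:

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# A real crawler would download each page over HTTP and parse out links.
PAGES = {
    "https://example.com/":  ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}

def crawl(seed_urls):
    """Visit every URL reachable from the seeds, each exactly once."""
    visited = set()
    frontier = deque(seed_urls)   # URLs waiting to be crawled
    order = []                    # the order pages were actually visited
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue                        # skip pages already seen
        visited.add(url)
        order.append(url)
        for link in PAGES.get(url, []):     # "fetch" the page, extract links
            if link not in visited:
                frontier.append(link)
    return order

print(crawl(["https://example.com/"]))
```

The `visited` set is what keeps the recursion from looping forever on pages that link back to each other, such as `/` and `/b` above.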

The word crawl comes from the metaphor of spiders (computer programs) crawling (browsing) the World Wide Web.

Crawling and indexing

While crawling and indexing are related terms, there is an important distinction between them.

  • Crawling a page means that a robot fetches and reads the page's content; it does not imply the robot will do anything with it.

  • Indexing a page means that a robot saves the page's content into its index.

While the purpose of most robots is to index the content they crawl, a robot may decide not to index a particular page for many reasons: the page may contain duplicate content, the robot may judge the page to be of low importance on the site, the page may be webspam, among other factors.
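To make the crawl/index distinction concrete, here is a minimal sketch of an indexing decision that rejects only one of the reasons listed above, exact-duplicate content. The function name `should_index` and the hashing approach are illustrative assumptions; real search engines weigh many more signals:

```python
import hashlib

def should_index(url, content, seen_hashes):
    """Toy decision: index a crawled page only if its content is new.

    Rejects exact duplicates by hashing the page body. Real engines
    also consider page importance, spam signals, and other factors.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False            # duplicate of a page already indexed
    seen_hashes.add(digest)
    return True                 # new content: add it to the index

seen = set()
print(should_index("https://example.com/", "hello world", seen))      # True
print(should_index("https://example.com/copy", "hello world", seen))  # False
```

Every page passed to `should_index` has been crawled, but only those returning `True` end up indexed, which is exactly the distinction the bullets above draw.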
