Because search engines have finite resources, they cannot crawl every page on the Internet instantly. They limit both the number of pages they are willing to crawl on a site and how often they crawl it. These limits are ambiguous and unofficial, and they are not disclosed to webmasters.
Many factors can affect a site's crawl budget, including:
- Internal linking strategy
- Site structure and depth
- Number of external links to the domain
- Number of external links to each page
- Deep links into the site
- Authority or trust of domain
- Size of the site
- Presence or absence of redirect problems and 4xx or 5xx errors
- Use of the meta robots tag, the X-Robots-Tag HTTP header, or a robots.txt file
- Presence or absence of duplicate content
- Frequency of site updates
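As an illustration of one factor in the list, the sketch below shows how robots.txt rules can keep crawlers away from low-value URLs. It uses Python's standard `urllib.robotparser` module. The rules, the `example.com` URLs, and the helper name `crawlable_urls` are hypothetical, and real search engines may interpret robots.txt differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example site (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""


def crawlable_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given rules allow user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


urls = [
    "https://example.com/products/widget",
    "https://example.com/search?q=widget",
    "https://example.com/cart/checkout",
]

# Only the product page passes; the search and cart URLs are disallowed.
allowed = crawlable_urls(ROBOTS_TXT, urls)
```

Blocking internal search results and cart pages in this way is one common approach to keeping crawlers focused on content pages, though the right rules depend on the site.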
Because of these limits, search engines will likely not crawl every page on a site. Webmasters should therefore prioritize the site's URLs and structure the site so that search engines are encouraged to crawl the most important pages first. This is often described as "spending the crawl budget wisely". If search engines crawl many duplicate pages, error pages, or pages with unimportant content, they may not have enough "budget" left to crawl the site's important content pages, and the crawl budget will have been wasted.
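One rough way to see how a crawl budget is being spent is to classify the requests that crawlers make, for example as recorded in a server log. The following is a minimal sketch, not an official method: the category names, the functions `classify_hit` and `crawl_budget_report`, and the sample data are all hypothetical. Treating every URL with a query string as a possible duplicate is a simplifying assumption.

```python
from collections import Counter
from urllib.parse import urlsplit

# Status codes treated as redirects (304 Not Modified is deliberately excluded).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def classify_hit(url, status):
    """Label one crawler request as useful or as a kind of wasted budget."""
    if status >= 500:
        return "server_error"
    if status >= 400:
        return "client_error"
    if status in REDIRECT_STATUSES:
        return "redirect"
    if urlsplit(url).query:
        # Assumption: parameterized URLs duplicate a canonical page.
        return "possible_duplicate"
    return "useful"


def crawl_budget_report(hits):
    """Return the share of crawler requests that falls into each category."""
    counts = Counter(classify_hit(url, status) for url, status in hits)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical (url, status) pairs for requests made by a search engine crawler.
hits = [
    ("/products/widget", 200),
    ("/products/widget?sort=price", 200),
    ("/old-page", 301),
    ("/missing", 404),
]

report = crawl_budget_report(hits)
```

In this sample only one request in four reaches a unique content page. A low "useful" share on a real site would suggest that redirects, errors, or duplicate URLs are worth fixing first.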