The crawl budget is something you should optimize for your SEO if you operate a large site with many pages. In this article, we focus on the basics of the crawl budget: why it matters and how to optimize it to boost your SEO strategy. The crawl budget is a concept that lived for a decade in the closed circles of SEO consultants but which, fortunately, has become more widely known in recent years. Even so, it remains an aspect that is still too often underestimated in SEO strategies. While most of you have heard of the term, it can sometimes be difficult to identify the benefits for your site’s visibility.

So yes, it is true that some SEO consultants will tell you to ignore the crawl budget! But if your site is made up of several thousand pages (or even many more), optimizing your crawl budget will represent a real turning point for your organic visibility.

Summary

What is the crawl budget?
Why do search engines allocate a crawl budget to websites?
How is the crawl budget allocated to websites?
Why is the crawl budget essential for your SEO?
How to optimize your crawl budget?
Simplify your site architecture
Watch for duplicate content
Manage your URL parameters
Limit your low-quality content
Broken and incorrectly redirected links
Incorrect URLs in XML sitemaps
Pages that load too slowly
A high number of non-indexable pages
A bad internal mesh
Don’t forget the PageRank!

What is the crawl budget?
Crawl budget can be described as the level of attention search engines pay to your site. This level of attention is based on the resources that engine robots allocate to crawling the pages of your website and on the frequency of those crawls. Basically, the size of your site is analyzed to determine the level of resources dedicated to it. If you waste your crawl budget, search engines won’t be able to crawl your website effectively, which will ultimately hurt your SEO performance. Your goal is therefore to ensure that Google spends its crawl budget crawling the pages you want indexed in organic results. To do this, prevent the budget from being wasted on pages that are irrelevant to your SEO.


Why do search engines allocate a crawl budget to websites?
Search engines do not have unlimited resources and must distribute their attention across millions of websites. So they need a way to prioritize their crawling efforts. Allocating a crawl budget to each website helps them achieve this.

How is the crawl budget allocated to websites?
It depends on two factors: the crawl limit and the crawl demand.

The crawl limit rate
For each site, this rate sets a limit on the number of pages the search engine will crawl at the same time. If the search engine crawler had no crawl limit, it would crawl all the pages of a website simultaneously, which could overload the server and hurt the user experience. Search engine crawlers are designed to avoid overloading a web server with requests,


which is why they pay attention to this aspect. But how do search engines determine a website’s crawl limit? Several factors come into play:

A poor-quality platform or server: how often the crawled pages return 500 (server) errors or take too long to load.
The number of sites running on the same hosting: if your website runs on a hosting platform shared with hundreds of other websites, and you have a fairly large website, the crawl limit for your site is very restricted because it is determined at the server level. You must therefore share the hosting’s crawl limit with all the other sites running on it. In this case, it is better to use a dedicated server, which will also reduce loading times for your visitors.

The crawl demand

The crawl demand consists of determining the interest in re-crawling a URL. Basically, the search engine identifies whether it should regularly revisit certain pages of your site. Again, many factors influence crawl demand, including:

Popularity: the number of internal links and backlinks pointing to a URL, but also the number of keywords for which it ranks.
Freshness: how often the content of the page is updated.
The type of page: is it a type of page subject to change? Take, for example, a product category page and a terms-and-conditions page. Which one do you think changes most often and deserves to be crawled more frequently?

👍 Why is the crawl budget essential for your SEO?
The goal is to make sure that search engines find and understand as many of your indexable pages as possible, and that they do so as quickly as possible. When you add new pages and update existing pages, you probably want search engines to find them right away… Indeed, the faster they index the pages, the faster you can benefit in terms of SEO visibility!

⚠️ If you waste your crawl budget, search engines will not be able to crawl your website effectively. They will spend time on parts of your site that don’t matter, which can leave important parts of your site undiscovered. If they don’t know the pages, they won’t crawl and index them, and you won’t be able to attract visitors to them through search engines.

In short, wasting crawl budget hurts your SEO performance!

Remember: crawl budget is usually only a concern if you have a large website, say 10,000+ pages.

Now that we have covered the definition and the issues related to the crawl budget, let’s see how you can easily optimize it for your site.

✅ How to optimize your crawl budget?
Through this checklist, you should be able to have the right foundations for search engines to crawl your priority pages.

Simplify your site architecture
We recommend that you adopt a structure that is simple, hierarchical and understandable for both your visitors and search engines. Organize your site by page level and type, prioritizing pages by importance:

Your home page as a level 1 page.
Category pages as level-2 depth pages (which tag-generated pages can complement).
Content pages or product sheets (for e-commerce) as level 3 pages.
Of course, subcategories can be inserted between categories and content pages / product sheets through another level. But you understand the principle… the goal is to provide a clear, hierarchical structure for search engines, so that they understand which pages are to be crawled first.

Once you have made sure that you have established your downward hierarchy on your site through these page templates, you can organize your pages around common themes and connect them via internal links .
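To check that no important page sinks too deep, the level of each page can be computed from the internal link graph. Here is a minimal sketch in Python, using a hypothetical site graph (the URLs are illustrative):

```python
# Minimal sketch: compute the click depth (level) of each page from
# the home page using a breadth-first traversal of internal links.
from collections import deque

def page_depths(links, home="/"):
    """links: dict mapping each page to the pages it links to."""
    depths = {home: 1}          # the home page is the level-1 page
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:    # first time we reach this page
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site: home -> categories -> product pages
site = {
    "/": ["/make-up", "/skincare"],
    "/make-up": ["/make-up/mascara-x"],
    "/skincare": ["/skincare/cream-y"],
}
print(page_depths(site))
# → {'/': 1, '/make-up': 2, '/skincare': 2,
#    '/make-up/mascara-x': 3, '/skincare/cream-y': 3}
```

Any page that comes out at level 4 or deeper is a candidate for more internal links or a flatter hierarchy.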

Watch for duplicate content
We consider as duplicates pages that are very similar or completely identical in content. Duplicate content can be generated by copied/pasted pages, results pages from the internal search engine, or pages created by tags.

Coming back to the crawl budget, you don’t want search engines spending their time on duplicate content pages , so it’s important to avoid, or at least minimize, duplicate content on your site.

Here’s how to do it:

Set up 301 redirects for all variations of your domain name (HTTP, HTTPS, non-WWW, and WWW).
Make internal search results pages inaccessible to search engines by using your robots.txt file .
Use taxonomies like categories and tags with caution! Still too many sites use tags excessively to mark the subject of their articles, which generates a multitude of tag pages offering the same content.
Disable the pages dedicated to images. You know… the famous attachment pages that WordPress creates for every uploaded file.
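As an illustration of the first two points, here is what this can look like, assuming an Apache server and WordPress’s default ?s= search parameter (example.com is a placeholder):

```apache
# .htaccess — 301-redirect HTTP and non-WWW variations to https://www.example.com
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
```

```
# robots.txt — keep crawlers out of internal search results
User-agent: *
Disallow: /?s=
Disallow: /*?s=
```

Adapt the search parameter and domain to your own setup before deploying anything.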
Manage your URL parameters
In most cases, URLs with parameters should not be accessible to search engines, as they can generate a virtually endless amount of URLs. URLs with parameters are commonly used when setting up product filters on e-commerce sites. It’s fine to use them, but make sure they’re not accessible to search engines!

As a reminder, this is often what a URL with a parameter looks like:

In this example, the page refers to the mascara category on the Lancôme site, filtered by best sellers (indicated by ?srule=bestsellers).

How to make URLs inaccessible with parameters for search engines?

Use your robots.txt file to tell search engines not to access these URLs.
Add the nofollow attribute to the links corresponding to your filters. Note, however, that since March 2020, Google can choose to ignore nofollow, so the first recommendation is preferred.
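In practice, blocking parameterized URLs can look like this in robots.txt (the parameter names below are hypothetical examples; srule comes from the Lancôme example above):

```
User-agent: *
# Block faceted-navigation / sorting parameters (hypothetical names)
Disallow: /*?srule=
Disallow: /*?color=
Disallow: /*?size=
```

Check which parameters your own site actually generates before adding rules, as a Disallow also prevents crawlers from discovering links on those pages.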
Limit your low-quality content
Pages with very little content are of little interest to search engines. Keep them to a minimum, or avoid them altogether if possible. An example of poor-quality content is an FAQ section with links to show questions and answers, where each question and answer is served through a separate URL.

Broken and incorrectly redirected links
Broken links and long redirect chains are dead ends for search engines. Much like browsers, Google seems to follow a maximum of five chained redirects in a single crawl (it can resume the crawl later). It is not clear how other search engines deal with redirect chains, but we recommend that you avoid chaining redirects altogether and limit the use of redirects in general.

Of course, by fixing broken links and redirecting them with 301 redirects, you can quickly recover wasted crawl budget. Beyond recovering crawl budget, you also significantly improve the visitor’s user experience. But only redirect the pages that really matter to your business: redirects, and chains of redirects in particular, lengthen page load times and thus hurt the user experience.

👉 To easily identify your pages returning 410 or 404 errors or, worse, soft 404s, go to your Search Console through the Index -> Coverage section, then filter on Excluded.


Note that an SEO tool like Screaming Frog will also allow you to detect your pages in error.
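If you already have a crawl export (URL and HTTP status), sorting it into actionable buckets is straightforward. A minimal Python sketch, with hypothetical URLs and statuses:

```python
# Minimal sketch: classify crawled URLs by HTTP status so broken
# links (4xx/5xx) and redirects (3xx) can be reviewed separately.

def classify_urls(results):
    """Bucket (url, status_code) pairs into ok / redirect / broken."""
    buckets = {"ok": [], "redirect": [], "broken": []}
    for url, status in results:
        if 200 <= status < 300:
            buckets["ok"].append(url)
        elif 300 <= status < 400:
            buckets["redirect"].append(url)
        else:  # 4xx client errors and 5xx server errors waste crawl budget
            buckets["broken"].append(url)
    return buckets

crawl = [
    ("https://example.com/", 200),
    ("https://example.com/old-page", 301),
    ("https://example.com/missing", 404),
    ("https://example.com/api", 500),
]
print(classify_urls(crawl)["broken"])
# → ['https://example.com/missing', 'https://example.com/api']
```

Every URL in the "broken" bucket is a candidate for a fix or a 301 redirect; every URL in "redirect" should ideally be linked to directly instead.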

Incorrect URLs in XML Sitemaps
All URLs included in XML sitemaps must be indexable pages. Search engines rely heavily on XML sitemaps to find all of your pages, especially on large websites. If your XML sitemaps are cluttered with pages that, for example, no longer exist or are redirected, you’re wasting your crawl budget. Regularly check your XML sitemap for non-indexable URLs that don’t belong there. Also do the opposite: look for pages that are wrongly excluded from the XML sitemap.

💡 The XML sitemap is a great way to help search engines spend their crawl budget wisely.

Our advice to optimize the use of your XML sitemaps

A best practice for crawl budget optimization is to split your XML sitemaps into several smaller sitemaps. For example, you can create XML sitemaps for each category of your website. This allows you to quickly determine if there are any sections of your website that are having problems.

Suppose your XML sitemap for Category A has 500 links and 480 are indexed: you are doing pretty well. But if your XML sitemap for Category B has 500 links and only 120 are indexed, that’s a problem you need to look into. You may have included a lot of non-indexable URLs in Section B’s sitemap.
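Splitting sitemaps per category is typically done with a sitemap index file that references the smaller sitemaps (example.com and the file names are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-category-a.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-category-b.xml</loc>
  </sitemap>
</sitemapindex>
```

Submit the index file in Search Console and it will report indexing figures per child sitemap, which is exactly what makes the per-category diagnosis above possible.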

Pages that load too slowly
When pages have a high load time or return an HTTP 504 response (a timeout expired while processing the request), search engines may visit fewer pages within your site’s crawl budget. Beyond this inconvenience, high load and wait times significantly affect your visitors’ user experience, resulting in a lower conversion rate.

Page load times longer than two seconds are a problem. Ideally, your page will load in less than a second. Regularly check your page load time using tools such as Pingdom , WebPagetest or GTmetrix .
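The thresholds above (two seconds is a problem, under one second is ideal) can be turned into a simple report once you have measured timings, whatever tool produced them. A sketch with hypothetical timings:

```python
# Minimal sketch: flag pages against the load-time thresholds
# discussed above (ideal <= 1s, problem > 2s).

def speed_report(timings, ideal=1.0, slow=2.0):
    """timings: dict url -> measured load time in seconds."""
    report = {}
    for url, seconds in timings.items():
        if seconds <= ideal:
            report[url] = "good"
        elif seconds <= slow:
            report[url] = "needs work"
        else:
            report[url] = "problem"  # over two seconds hurts crawl and UX
    return report

pages = {"/": 0.6, "/category": 1.4, "/blog/heavy-post": 3.2}
print(speed_report(pages))
# → {'/': 'good', '/category': 'needs work', '/blog/heavy-post': 'problem'}
```
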

💡 Note that you can also check your page speed through Analytics under the Behavior -> Site Speed section, and in the Search Console through the Core Web Vitals section, a new SEO ranking factor from 2021.

In general, check regularly to see if your pages are loading fast enough, and if not, take action immediately. Fast page loading is essential to your success.

A high number of non-indexable pages
If your website has a large number of non-indexable pages that are accessible to search engines, you are just occupying the search engines by making them crawl irrelevant pages.

We consider these types of pages to be non-indexable:

Redirects (3xx)
Pages not found (4xx)
Pages with server errors (5xx)
Non-indexable pages (pages containing the <meta name="robots" content="noindex"> tag or a canonical URL pointing to another page)
To easily identify these pages you can use Screaming Frog or, again, consult your Search Console in the Index -> Cover section and filter on Excluded.
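On smaller batches of pages, a noindex check can also be scripted. A minimal sketch that looks for the robots meta tag in a page’s HTML (it assumes the name attribute appears before content, a simplification):

```python
# Minimal sketch: detect pages marked non-indexable via a robots meta
# tag, so they can be filtered out of crawl reports and XML sitemaps.
import re

# Assumes name="robots" appears before content="..." in the tag.
NOINDEX_RE = re.compile(
    r'<meta[^>]+name\s*=\s*["\']robots["\'][^>]+'
    r'content\s*=\s*["\'][^"\']*noindex',
    re.IGNORECASE,
)

def is_noindex(html):
    """Return True if the HTML contains a robots noindex meta tag."""
    return bool(NOINDEX_RE.search(html))

print(is_noindex('<meta name="robots" content="noindex, follow">'))  # → True
print(is_noindex('<meta name="robots" content="index, follow">'))    # → False
```

For production use, an HTML parser is more robust than a regex, but this is enough to flag obvious cases.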

A bad internal mesh
How the pages of your website link to one another plays an important role in crawl budget optimization: this is what we call the internal mesh, or internal linking. Backlinks aside, pages with few internal links attract much less attention from search engines than pages linked from a large number of other pages.

Despite our first tip, avoid an overly hierarchical link structure in which deep pages receive few links. In many cases, these pages will not be crawled frequently by search engines. Therefore, make sure that your most important pages get lots of internal links. Pages that have been recently crawled tend to rank higher in organic results. Keep this in mind and adapt your internal linking structure accordingly.

For example, if you have a blog post from 2010 that gets a lot of organic traffic, be sure to continue to link to that post from other content. Since you’ve produced many other blog posts over the years, the 2010 post is automatically placed at the bottom of your website’s internal link structure.
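Spotting those under-linked pages is a matter of counting how many internal links point at each URL. A minimal sketch, with a hypothetical site graph:

```python
# Minimal sketch: count internal links pointing at each page to spot
# under-linked (and therefore under-crawled) pages.
from collections import Counter

def inlink_counts(links):
    """links: dict mapping each source page to the pages it links to."""
    counts = Counter()
    for targets in links.values():
        counts.update(targets)      # each outgoing link is one inlink elsewhere
    return counts

site = {
    "/": ["/services", "/blog/2010-post"],
    "/services": ["/"],
    "/blog/new-post": ["/services"],
}
print(inlink_counts(site)["/services"])  # → 2 (linked from two pages)
```

Pages with a count of zero or one are the first candidates for new internal links from your strongest content.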

Don’t forget the PageRank!
Let’s go back in time, Marty! In a 2010 interview between Eric Enge and Matt Cutts, the former head of Google’s webspam team, the relationship between page authority and crawl budget came up. Here’s what Matt Cutts explained in that interview:

“The number of pages we crawl is roughly proportional to your PageRank. So, if you have a lot of inbound links on your root page, we will definitely explore it. Your root page may then contain links to other pages that will get PageRank and which we will explore as well. However, as you get deeper into your site, PageRank tends to decrease. ”

Even though Google has abandoned the public updating of page PageRank values, PageRank is still used in their algorithms. As PageRank is a term sometimes misunderstood, let’s call it Page Authority. The big takeaway here is that Matt Cutts is basically saying that there is a pretty strong relationship between page authority and crawl budget.

👉 Therefore, to increase the crawl budget of your website, you must increase its authority (its PageRank). A large part of this comes down to acquiring more links (backlinks) from external websites.
