Google Bot Crawling Websites
If you are new to search engine optimisation and are not sure what spiders or Googlebot crawlers are, research this before carrying on with this article. When you carry out a Google search you are not searching the world wide web. You are actually searching Google’s index of the web, or at least as much of it as Google can find. That index is built by software programs known as spiders.
Spiders start by fetching a few web pages and then follow the links on those pages. They then follow the links on the pages they reach, and so on, until they have indexed as much of the internet as possible. One thing I can tell you for sure is that in any industry, crawl budget is something we can and should optimise for SEO success. There are good articles elsewhere that explain how to optimise your crawl budget, but the sections below cover it too.
What is a Crawl Budget?
A crawl budget is the amount of crawling Google allocates to your website. You can describe it as the time Googlebot spends on your site or, more concretely, as the number of pages Google will crawl on your site on any given day. This number varies slightly from day to day, but overall it’s relatively stable. The number of pages Google crawls, your “budget”, is generally determined by the size of your site, the “health” of your site (how many errors Google encounters) and the number of links to your site.
Google doesn’t always spider every page on a site instantly. In fact, sometimes it can take weeks. This might get in the way of your SEO efforts, because your newly optimised landing page might not get indexed. At that point, it becomes time to optimise your crawl budget. Indexing is certainly the most important part of SEO, because a page that is not indexed cannot be seen anywhere in the search results. Google might crawl 4 pages a day, 8,000 pages a day, or even 5,000,000 pages every single day; this all depends on many factors to do with your site.
How Does a Crawler Work?
A crawler like Googlebot gets a list of URLs to crawl on a site. It goes through that list systematically. It grabs your robots.txt file every once in a while to make sure it can still crawl each URL, and then crawls the URLs one by one. Once a spider has crawled a URL and parsed its contents, it adds any new URLs it found on that page to the to-do list so they can be crawled later.
Several events can make Google decide it needs to crawl a URL. It might have found new links pointing at the content, someone might have tweeted it, or the URL might have been updated in the XML sitemap. There’s no way to make a list of all the reasons why Google would crawl a URL, but when it determines it has to, it adds the URL to the to-do list.
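The to-do list loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Googlebot is actually implemented: the `SITE` dictionary, the `crawl` function and the `/private/` rule are all made-up stand-ins for a real website, a real fetcher and a real robots.txt check.

```python
from collections import deque

def crawl(seed_urls, fetch_links, is_allowed):
    """Work through a to-do list of URLs, queueing newly discovered links."""
    todo = deque(seed_urls)   # the crawler's to-do list
    seen = set(seed_urls)     # URLs already queued, so nothing is queued twice
    crawled = []
    while todo:
        url = todo.popleft()
        if not is_allowed(url):          # stand-in for the robots.txt check
            continue
        crawled.append(url)
        for link in fetch_links(url):    # parse the page and collect its links
            if link not in seen:
                seen.add(link)
                todo.append(link)        # new URL goes on the to-do list
    return crawled

# Hypothetical toy site: each page maps to the links found on it.
SITE = {
    "/": ["/blog", "/services"],
    "/blog": ["/blog/post-1", "/"],
    "/services": ["/private/admin"],
    "/blog/post-1": [],
    "/private/admin": [],
}

order = crawl(
    ["/"],
    fetch_links=lambda url: SITE.get(url, []),
    is_allowed=lambda url: not url.startswith("/private/"),
)
print(order)  # ['/', '/blog', '/services', '/blog/post-1']
```

Notice that `/private/admin` is discovered but never crawled, and that pages nobody links to would never reach the to-do list at all.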
Crawl Budget Issue Checks
To quickly determine whether your site has a crawl budget issue, follow the steps below. They check whether Googlebot is taking too long to get through all your pages:
- Determine how many pages you have on your site. The number of URLs in your XML sitemaps might be a good start.
- Go into Google Search Console.
- Go to Crawl -> Crawl stats and take note of the average pages crawled per day (in newer versions of Search Console the Crawl stats report is found under Settings).
- Divide the number of pages by the “Average crawled per day” number.
- If you end up with a number higher than 10 (so you have 10x more pages than Google crawls each day), you should optimise your crawl budget. If you end up with a number lower than 3, this is perfect.
Every SEO specialist should be looking at these steps. If a client’s website or one of your own money sites has plenty of blog posts, then re-purposing some of them and 301 redirecting them to strengthen a money page may be the best option. This will remove any content cannibalisation issues your site might have. But make sure you do not remove any pages which are ranking or bringing you traffic. You also need to keep plenty of supporting articles to theme the topic of your site.
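The division in the steps above is simple enough to do by hand, but a small Python sketch makes the thresholds explicit. The function names and the example figures are made up for illustration, and the "monitor" label for ratios between 3 and 10 is my own assumption, since the steps only say what to do above 10 and below 3.

```python
def crawl_budget_ratio(total_pages, avg_crawled_per_day):
    """Pages on the site divided by the average pages crawled per day."""
    if avg_crawled_per_day <= 0:
        raise ValueError("average crawled per day must be greater than zero")
    return total_pages / avg_crawled_per_day

def assess(ratio):
    """Apply the rule of thumb: above 10 needs work, below 3 is fine."""
    if ratio > 10:
        return "optimise"
    if ratio < 3:
        return "fine"
    return "monitor"  # assumption: the in-between range is not defined above

# Hypothetical figures: 12,000 URLs in the sitemaps, 800 pages crawled per day.
ratio = crawl_budget_ratio(12_000, 800)
print(ratio, assess(ratio))  # 15.0 optimise
```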
Server Log Check
Depending on your type of hosting, you might not always be able to grab your log files. However, if you even so much as think you need to work on crawl budget optimisation because your site is big, you should get them. If your host doesn’t allow you to get them, change hosts.
You really should know which URLs Google is crawling on your site. The only “real” way of knowing that is looking at your site’s server logs. Once you see which pages constantly get crawled, start to open gateways from them to other pages with a real silo structure strategy, which helps the bots index the site more easily. As a rule of thumb, I like to test that no matter which page Googlebot lands on, it can reach any other page on the website within 4 clicks on contextual links. This will pass link juice throughout the website and help with indexing it all.
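The 4-click rule of thumb can be checked with a breadth-first search over your internal links. The sketch below assumes you already have a map of which page links to which (for example exported from a crawler); the `LINKS` data and the function names are hypothetical.

```python
from collections import deque

def click_depths(links, start):
    """Return the minimum number of clicks from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def pages_out_of_reach(links, max_clicks=4):
    """List (start, target) pairs where target is more than max_clicks away."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    failures = []
    for start in sorted(pages):
        depths = click_depths(links, start)
        for target in sorted(pages):
            # Unreachable pages are treated as being beyond the limit.
            if target != start and depths.get(target, max_clicks + 1) > max_clicks:
                failures.append((start, target))
    return failures

# Hypothetical internal link map: page -> pages it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/"],
    "/b": ["/", "/b/deep"],
    "/b/deep": [],  # dead end: no contextual links back into the site
}

print(pages_out_of_reach(LINKS))
# [('/b/deep', '/'), ('/b/deep', '/a'), ('/b/deep', '/b')]
```

In this toy example the only failures come from `/b/deep`, which has no links out. Adding a single contextual link from it back into the silo would clear all three.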
Fixing your site’s crawl budget is a lot like fixing a car. You can’t fix it by looking at the outside, you’ll have to open up that engine. Looking at logs is going to be scary at first. You’ll quickly find that there is a lot of noise in logs. You’ll find a lot of commonly occurring 404s that you think are nonsense. But you have to fix them. You have to get through the noise and make sure your site is not drowning in tons of old 404s.
The more often Googlebot hits a 404 error page, the less likely it is to index all your webpages. Use Screaming Frog to check for any problems and then fix them as soon as possible. Make sure that the pages that are crawled return one of two possible return codes: 200 (for “OK”) or 301 (for “Go here instead”). All other return codes are not OK. To figure this out, you have to look at your site’s server logs or run a test. Get into the habit of maintaining your website and giving it a regular service.
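As a starting point for digging through the noise, here is a minimal Python sketch that pulls Googlebot requests out of access log lines and counts every URL that returned something other than 200 or 301. It assumes the common "combined" log format used by Apache and Nginx; your host's format may differ, and the sample lines are invented. Matching on the user agent string alone is also a simplification, since anyone can claim to be Googlebot.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust the pattern for your server.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_problem_urls(lines):
    """Count (path, status) pairs where Googlebot got neither a 200 nor a 301."""
    problems = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and non-Googlebot traffic
        status = int(match.group("status"))
        if status not in (200, 301):
            problems[(match.group("path"), status)] += 1
    return problems

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Invented sample lines for illustration only.
SAMPLE = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2023:13:56:02 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "{GOOGLEBOT}"',
    '203.0.113.9 - - [10/Oct/2023:13:56:10 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0 Firefox/118.0"',
    f'66.249.66.1 - - [10/Oct/2023:13:57:15 +0000] "GET /temp HTTP/1.1" 302 0 "-" "{GOOGLEBOT}"',
]

for (path, status), hits in googlebot_problem_urls(SAMPLE).most_common():
    print(status, path, hits)
```

Sorting by hit count puts the most frequently crawled broken URLs at the top, which is usually where to start fixing.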
Block Parts of The Site
If you have sections of your site that really don’t need to be in Google, block them using robots.txt. Only do this if you know what you’re doing, of course. A common example is the tag and category pages on your blog section. These are usually lists of your articles with zero unique content, and they just add to the number of pages with title tags like “Blog Page 6”, which mean absolutely nothing.
Simply no-index these pages so that Googlebot has more time to index the main articles you want to rank. That way you won’t be wasting your crawl budget on pages which don’t need to rank for anything. Keep in mind that the two mechanisms differ: a robots.txt rule stops a page from being crawled, while a noindex tag lets the page be crawled but keeps it out of the index.
Dynamically created pages like tags, image URLs and duplicated category pages all need to be no-indexed. This will then allow more time on site for Googlebot to re-crawl your money pages. More crawls on a specific page = better rankings in the search engines. People massively overlook this, and you need to start looking at your CrawlRank more effectively.
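Before deploying robots.txt rules, it is worth testing that they block what you expect and nothing more. The sketch below uses Python’s standard `urllib.robotparser` module to do that. The rules, the `example.com` domain and the paths are hypothetical examples, and the check only covers robots.txt blocking, not noindex tags.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking tag and category archives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: two archive pages and one money page.
urls = [
    "https://example.com/tag/seo/",
    "https://example.com/category/news/page/6/",
    "https://example.com/blog/crawl-budget/",
]

for url in urls:
    allowed = parser.can_fetch("Googlebot", url)
    print("crawl" if allowed else "blocked", url)
```

Python’s parser follows the basic robots.txt rules but may not match Google’s own handling of every edge case, such as wildcards, so treat it as a quick sanity check rather than the final word.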
Link Building to Inner Pages
Trying to get backlinks can be a difficult task. But if you can create multiple backlinks to inner pages, this will certainly help. The reason is that you will be sending more Googlebots to the site and getting them to start in different areas of it. Hopefully, with the silo structure of the site in place, they should then index all of the pages.
The more quality links you can get, the better. There is no point trying to build thousands of spammy links, though, because the crawlers do not even entertain backlinks with zero link juice, so this would be wasted effort.
Freshness to Websites
Fresh content on your website and fresh links to it keep the crawlers returning. The more often Googlebot spiders reach your site, the better your authority will be. Keep re-purposing old content, adding more value, and then share it throughout your social profiles. Reach out to bloggers and reporters to see if they have any articles relevant to the website, and if they can backlink to the website then happy days.
Keeping content fresh sends the Google bots to your site. Pages that have been crawled recently typically get more visibility in the SERPs. In other words, if a page hasn’t been crawled in a while, it won’t rank well. So if your pages haven’t been updated in a while, they won’t get crawled as often and won’t rank as well. The SEO technique is to try and get the crawlers back to your site as many times as possible.
Important To Understand
It is really important to understand the need for getting your site crawled as many times as possible. So make sure your silo structure sends internal links to your money pages regularly. Also make sure you have a link building strategy that points backlinks at inner pages.
More crawls on a page = better rankings
Please leave a comment below if you have anything to add onto this article. We hope you enjoyed it and feel free to share it with others if you found it interesting.