I recently asked Andy Halliday to share the most common issues he identifies during server log analysis, and how to fix them.
Andrew is the go-to SEO for server log checks, and here is what he had to say.
- 1 Andrew Halliday
- 2 Issues Identified By Server Logs
- 2.1 Redirect Loops / Chains
- 2.2 Dead Pages
- 2.3 Non Crawled sections
- 2.4 Slow Pages
- 2.5 Wasted Crawl Budget
- 2.6 5xx Errors
- 3 Summary
Andrew Halliday
So, I’ve spent what seems like my entire working life analysing server logs, and that’s not far from the truth.
I must have done something wrong in a previous life, but the data held within those small log files is some of the most important data you can analyse from a technical SEO perspective.
From optimising crawl budget to fixing redirect chains, there is so much to uncover.
It’s the ONLY place to find out what Googlebot is actually doing on your website.
Issues Identified By Server Logs
I want to share with you some of the most common issues I have identified over the years and how to fix them.
Redirect Loops / Chains
This is the big one: the main issue, and one which can have a massive impact on your organic performance. Loops are far worse than chains, but both waste crawl budget, and both are known to signal to Googlebot that it should return to crawl your site less frequently.
Fewer crawls mean new and updated content takes longer to make an impact in the SERPs.
Google has also confirmed that Googlebot gives up after a certain number of steps in a chain; its documentation puts the limit at 10 redirect hops. That is quite a lot, but it is still up to 10 hits you have wasted.
These usually occur on older, larger sites that have had multiple developers over many years.
Another prime example of when redirect chains happen is during site migrations, either to a new domain or from HTTP to HTTPS.
How To Spot Redirect Chains
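Redirect loops are quite easy to spot: look for 508 (Loop Detected) response codes, although many servers never return these for a redirect loop, so also look for the same handful of URLs redirecting to one another over and over. Spotting redirect chains is a bit more difficult, but it can be done.
For redirect chains, export all the 3xx URLs from your logs and crawl that list with a tool like Screaming Frog, which will highlight any redirect chains.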
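If you already have each 3xx URL and the URL it redirects to (for example from the crawl export above), a short script can separate single redirects from chains and loops. This is a minimal Python sketch under that assumption; the redirects map and the trace_redirect helper are hypothetical names for illustration, and the default of 10 hops simply mirrors Google’s documented redirect limit.

```python
def trace_redirect(start, redirects, max_hops=10):
    """Follow redirects from `start` through a hypothetical `redirects` map
    (source URL -> redirect target). Returns the path taken and a label:
    'ok' (0 or 1 hop), 'chain' (2+ hops), 'loop', or 'too_long'."""
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        target = redirects[current]
        path.append(target)
        if target in seen:
            return path, "loop"  # we have been here before: a redirect loop
        if len(path) - 1 > max_hops:
            return path, "too_long"  # more hops than the chosen limit
        seen.add(target)
        current = target
    hops = len(path) - 1
    return path, ("chain" if hops > 1 else "ok")


# Example data: /a -> /b -> /c is a chain, /x <-> /y is a loop, /old is a single hop.
redirects = {"/a": "/b", "/b": "/c", "/x": "/y", "/y": "/x", "/old": "/new"}
for url in redirects:
    path, label = trace_redirect(url, redirects)
    if label != "ok":
        print(label, " -> ".join(path))
```

A loop is reported once for each URL inside it, which is handy when you want every affected URL on your fix list.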
How To Fix Redirect Chains
Loops are simple: break the loop and send the user straight to the final destination page with a single redirect.
For the chains, again – break the chain and point directly to the final destination.
While the fixes may seem simple, and they are, it can be a lot of work initially to do all the analysis and break all the chains and loops. Googlebot will love you in the long term, though, so it’s well worth the investment.
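Working out every final destination is where much of that initial effort goes. As a minimal Python sketch, assuming the same kind of source-to-target map (flatten_redirects is a hypothetical helper), each redirect can be rewritten to point straight at its end point, with loops set aside for a manual decision because they have no final destination:

```python
def flatten_redirects(redirects):
    """Point every source in a hypothetical `redirects` map (source -> target)
    straight at its final destination. Sources that end up in a loop are
    returned separately, since a loop has no final page to redirect to."""
    flat, loops = {}, set()
    for source, target in redirects.items():
        seen = {source}
        while target in redirects:
            if target in seen:
                loops.add(source)
                break
            seen.add(target)
            target = redirects[target]
        else:
            # The while loop ended without a break, so `target` is the final page.
            flat[source] = target
    return flat, loops


flat, loops = flatten_redirects({"/a": "/b", "/b": "/c", "/x": "/y", "/y": "/x"})
print(flat)  # {'/a': '/c', '/b': '/c'}
print(loops)
```

The flattened map can then be turned into your server’s redirect rules, so each old URL needs just one 301 hop.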
Level of difficulty: Easy to Moderate.
Dead Pages
A dead page is a URL that Googlebot has tried to visit and received a 404 response code in return.
Whether it followed an internal link or an external link is not important: it’s a wasted crawl from Googlebot’s point of view, and 404 pages do use up some of your crawl budget.
Please note that after you fix all internal dead links, there may be a short period where Googlebot still hits these URLs because they are already queued for crawling; however, these hits are rare.
How To Spot Dead Pages
Look in your log files for 404 response codes.
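With a large log file, a short script saves a lot of scrolling. Here is a minimal Python sketch, assuming your server writes the common Apache/Nginx “combined” log format; the regular expression and the googlebot_404s function are illustrative, not part of any tool. It identifies Googlebot by user agent only, which can be faked, so Google’s recommended reverse DNS check is still worth doing on anything suspicious.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# IP - - [time] "METHOD /path PROTOCOL" status size "referrer" "user agent"
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses per URL for requests whose user agent says Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m["status"] == "404" and "Googlebot" in m["agent"]:
            counts[m["path"]] += 1
    return counts


sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:13:56:01 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '66.249.66.1 - - [10/Oct/2023:13:58:40 +0000] "GET / HTTP/1.1" 200 10240 "-" "Googlebot/2.1"',
]
print(googlebot_404s(sample).most_common())  # [('/old-page', 2)]
```

For a real log, pass an open file instead of the sample list, for example googlebot_404s(open("access.log")), where access.log stands in for your own log file path.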
How to Fix Dead Pages
Fixing them is a bit more difficult. First, crawl your website with a tool like Screaming Frog, Ahrefs or SEMRush, find all the dead links on your site and remove them.
If Googlebot is following links from an external site, these are a bit more difficult to fix. If you cannot get the webmaster to change the link, you will need to implement a 301 redirect, but finding the source of the link is the hard part. You can export your backlinks from tools like Google Search Console or Ahrefs, but they might not contain all of your external links.
Tip: If 404s are still appearing a month after you have cleared up all the dead links on your site, add a redirect that lands on a relevant page.
Difficulty Level: Easy
Non Crawled sections
If Googlebot doesn’t crawl a page or even a section then it can’t be found in the SERPs.
There can be multiple reasons why a page or section doesn’t get crawled, such as no internal links or no external links. By and large, Googlebot doesn’t guess URLs, so for it to crawl a section, it needs to be told about the URLs.
I’ve audited sites before where someone had created a wonderful, detailed resource centre with 10x content, but it wasn’t getting any organic traffic.
After a bit of analysis, it became clear that someone had forgotten to add a link to the new section in the footer, so there were no internal links to these pages.
Simply adding that one link allowed Googlebot to find all these resources and index them.
How to Spot Non Crawled Sections
The easiest way is to gather three data sources: a crawl report, your log data and a list of every page on the site from your developer. A fourth data source can be your sitemaps, but these are sometimes wrong. If your website is built on WordPress, you could simply export all posts and pages rather than asking a developer for a list.
Then all you need to do is a VLOOKUP in Excel to find the gaps.
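If your lists are too big for a spreadsheet, the same gap analysis is just a couple of set differences. A minimal Python sketch, assuming each source has already been reduced to a list of URL paths normalised the same way (same host, same trailing-slash convention); find_gaps is a hypothetical helper:

```python
def find_gaps(all_pages, googlebot_urls, crawler_urls):
    """Compare the full page list (from your developer, CMS export or sitemaps)
    with the URLs Googlebot requested (logs) and the URLs your crawler found."""
    all_pages = set(all_pages)
    return {
        # Pages Googlebot has never requested: candidates for non-crawled sections.
        "not_crawled_by_googlebot": sorted(all_pages - set(googlebot_urls)),
        # Pages your crawler couldn't reach: usually a sign of missing internal links.
        "not_found_by_crawler": sorted(all_pages - set(crawler_urls)),
    }


gaps = find_gaps(
    all_pages=["/", "/blog/", "/resources/", "/resources/guide"],
    googlebot_urls=["/", "/blog/"],
    crawler_urls=["/", "/blog/"],
)
print(gaps["not_crawled_by_googlebot"])  # ['/resources/', '/resources/guide']
```

A URL that appears in both lists, like the resource centre example above, points strongly at missing internal links.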
How to Fix Non-Crawled Sections
A simple and effective fix is to add more internal and, where possible, external links to these pages. Internal links are the easiest, as they are something you can control.
Difficulty level: Medium – fixing it is pretty simple, but finding the full list initially to do the analysis can be tricky.
Slow Pages
Slow-loading pages affect more than just Googlebot, but as we are talking about server log issues here, I will only cover slow-loading pages from a crawling point of view.
Googlebot is a busy bot that wants to crawl the entire web, so if your pages are slow to load, it can’t and won’t crawl all of your site.
How to Spot Slow Pages
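Within your log files there is a column called “Response Size”, or something similar, measured in bytes.
Sort it from largest to smallest: large responses are a good first proxy for slow pages. If your log format also records how long each request took (for example Apache’s %D, in microseconds, or Nginx’s $request_time, in seconds), sort by that instead, as it measures speed directly.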
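Response size is only a proxy for speed, so if your log format records request time, averaging it per URL makes the slowest pages stand out. A minimal Python sketch, assuming the combined log format with Nginx’s $request_time (in seconds) appended as the last field; that layout, the regular expression and slowest_pages are illustrative assumptions, so adjust the pattern to your own format (Apache’s %D, for instance, is in microseconds).

```python
import re
from collections import defaultdict

# Assumed layout: combined log format + request time in seconds as the last field.
LOG_RE = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)" '
    r'(?P<secs>\d+(?:\.\d+)?)$'
)


def slowest_pages(lines, bot="Googlebot", top=10):
    """Return (path, average seconds, hits) for the slowest URLs, slowest first."""
    totals = defaultdict(lambda: [0.0, 0])  # path -> [total seconds, hits]
    for line in lines:
        m = LOG_RE.search(line.rstrip())
        if m and bot in m["agent"]:
            totals[m["path"]][0] += float(m["secs"])
            totals[m["path"]][1] += 1
    averages = [(path, total / hits, hits) for path, (total, hits) in totals.items()]
    averages.sort(key=lambda row: row[1], reverse=True)
    return averages[:top]


sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /heavy HTTP/1.1" 200 904512 "-" "Googlebot/2.1" 2.500',
    '66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET /heavy HTTP/1.1" 200 904512 "-" "Googlebot/2.1" 1.500',
    '66.249.66.1 - - [10/Oct/2023:13:55:44 +0000] "GET /light HTTP/1.1" 200 2048 "-" "Googlebot/2.1" 0.120',
]
print(slowest_pages(sample))  # [('/heavy', 2.0, 2), ('/light', 0.12, 1)]
```

Averaging rather than summing keeps a frequently crawled but fast page from outranking a genuinely slow one.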
How to Fix Slow Pages
This is difficult, as there can be numerous reasons why a page is slow to load, but now that you have a list of slow pages, you can start the manual work of identifying why each one is slow and then address the issues.
Difficulty Level: Advanced
Wasted Crawl Budget
Google doesn’t have infinite resources, and crawling the web is super expensive and requires huge server farms. So each new site is given a crawl budget, and Google then automatically increases or decreases this budget over time depending on the site, its popularity and other factors.
So if you are wasting this limited resource, you are only punishing your own site, and it can take a while to earn back the trust needed to increase your allocated budget.
Allowing Googlebot, or any bot, to crawl pages that are not necessary is a waste of its resources. Examples include faceted navigation, which creates a lot of near-duplicate pages across your site, forum profile pages, PPC landing pages and other low-value pages. Some of these pages might be important to you, so you need to decide what matters for your site.
How to Spot Wasted Crawl Budget
The simplest way to spot it is to analyse all the URLs that Googlebot is crawling; I usually like to add a hit count next to each one.
Then go through the list and make sure each URL is important. To save time in the future, create a master list of the URLs you have already reviewed, do a VLOOKUP against it and then simply analyse the new URLs.
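The counting and the master-list comparison can both be scripted. A minimal Python sketch, assuming the combined log format and a plain list of already-reviewed URLs as your master list; crawl_counts and new_urls_to_review are hypothetical helpers, and Googlebot is again identified by user agent only.

```python
import re
from collections import Counter

# Assumed combined log format; captures the requested path and the user agent.
LOG_RE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')


def crawl_counts(lines):
    """How many times Googlebot requested each URL."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m["agent"]:
            counts[m["path"]] += 1
    return counts


def new_urls_to_review(counts, master_list):
    """URLs Googlebot crawled that aren't on your reviewed master list,
    most-crawled first (the script version of the VLOOKUP step)."""
    known = set(master_list)
    return [(path, hits) for path, hits in counts.most_common() if path not in known]


sample = [
    '66.249.66.1 - - [10/Oct/2023:10:00:00 +0000] "GET /shoes?colour=red&size=9 HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:10:00:05 +0000] "GET /shoes?colour=red&size=9 HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:10:00:09 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]
print(new_urls_to_review(crawl_counts(sample), master_list=["/shoes"]))
# [('/shoes?colour=red&size=9', 2)]
```

Parameter-heavy URLs like this faceted example, crawled repeatedly, are typical candidates to review.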
How to Fix Wasted Crawl Budget
Robots.txt file: this is a super important file on your site, and one that is rarely used to its full effect.
If you look in your logs, this file is one of the most frequently requested by Googlebot, which often checks it several times a day to see whether you have made any changes. Use it to block Googlebot from crawling unnecessary pages at the source.
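Before deploying new rules, it can help to check which of the URLs from your logs they would block. A minimal sketch using Python’s standard-library urllib.robotparser; the rules and URLs below are hypothetical examples. This parser only does simple prefix matching and does not support the * and $ wildcards that Googlebot understands, so keep test rules to plain path prefixes or check wildcard rules another way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules blocking internal search results and member profile pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /members/
"""


def blocked_urls(urls, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return the URLs that `agent` would not be allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text instead of fetching
    return [url for url in urls if not parser.can_fetch(agent, url)]


urls = [
    "https://www.example.com/search?q=shoes",
    "https://www.example.com/members/jane",
    "https://www.example.com/shoes",
]
print(blocked_urls(urls))  # the /search and /members/ URLs
```

Keep in mind that robots.txt controls crawling, not indexing, so it is the right tool for saving crawl budget rather than for removing pages from the index.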
Difficulty level: Medium / Advanced
5xx Errors
While this might be the final issue I cover, it’s not the least important; in fact, 5xx errors are the most important.
If you spot any 5xx errors in your logs, this should raise a big red flag. Be aware, though, that Googlebot can receive a 5xx error without it appearing in your logs, because the same server problem that caused the error can also stop the log file from being updated.
So just because your logs show no 5xx errors, it doesn’t mean you don’t have issues. This is why third-party uptime monitoring tools are also important.
Google has also confirmed in the past that if a site struggles and repeatedly returns 5xx errors, Googlebot will visit less frequently, and your crawl budget and priority will suffer.
Keeping your website alive is key.
How to Spot 5xx Errors
Look in your log files for any 5xx response codes, and use a third-party tool to monitor uptime as well. Google offers a very good service, but it’s not cheap; there are options with less frequent checks at a much more affordable price.
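To watch trends rather than individual lines, you can tally 5xx responses to Googlebot per day. A minimal Python sketch, assuming the combined log format; googlebot_5xx_by_day is a hypothetical helper. It can only count errors your server managed to log, so it complements uptime monitoring rather than replacing it.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed combined log format; only 5xx status codes are matched.
LOG_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "[^"]*" (?P<status>5\d\d) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_5xx_by_day(lines):
    """Count 5xx responses served to Googlebot, keyed by (date, status code)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m["agent"]:
            day = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").date()
            counts[(day.isoformat(), m["status"])] += 1
    return counts


sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /checkout HTTP/1.1" 503 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:14:02:11 +0000] "GET /blog HTTP/1.1" 500 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [11/Oct/2023:09:00:00 +0000] "GET /blog HTTP/1.1" 200 4096 "-" "Googlebot/2.1"',
]
for (day, status), hits in sorted(googlebot_5xx_by_day(sample).items()):
    print(day, status, hits)
```

A sudden spike on one day is usually easier to match against deployments or traffic peaks than a scattered list of errors.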
How to Fix 5xx Errors
It’s too difficult to guess why your site keeps going down. It’s up to you to determine this and then create an action plan off the back of your analysis.
Difficulty Level: Advanced
Summary
These are some of the most common issues I see when auditing server logs. It’s not a complete list, as each site is unique and therefore each audit is unique.
The good news is that a Server Log Auditing Dashboard highlights the majority of these for you, for just $9.99 a month.
The key, though, is to get hold of your log files and start analysing the data.