Common Issues During Server Log Analysis

Server Log Analysis SEO Audit

I recently asked Andrew Halliday to share the most common issues he identifies during server log analysis and how to fix them.

Andrew is the go-to SEO for server log checks and here is what he had to say.

Andrew Halliday

So I’ve spent what seems like my entire working life analysing server logs and it’s not far from the truth.

I must have done something wrong in a previous life, but those small log files hold some of the most important data you can analyse from a technical SEO perspective.

From optimising crawl budget to fixing redirect chains, there is so much in there.

It’s the ONLY place to find out what Googlebot is doing on your website.

Issues Identified By Server Logs

I want to share with you some of the most common issues I have identified over the years and how to fix them.

Redirect Loops / Chains

This is the big one: the main issue that can have a massive impact on your organic performance. Loops are far worse than chains, but both waste crawl budget and both are known to signal to Googlebot that it should return less frequently to crawl your site.

Fewer crawls mean new and updated content takes longer to show up in the SERPs.

Google has also confirmed that after a certain number of hops in a chain it gives up – its documentation says Googlebot follows up to 10 redirect hops – which is quite a lot, but those are still hits you have wasted.

These usually occur on older, larger sites that have been through multiple different developers over multiple years.

Another prime example of when redirect chains happen is during site migrations, either to a new domain or from HTTP to HTTPS.

How To Spot Redirect Chains

Redirect loops are quite easy to spot: look for the same URLs cycling through 3xx responses in your logs (browsers surface this as a too-many-redirects error; the 508 status code does mean “Loop Detected”, but it is WebDAV-specific, so don’t rely on seeing it). Spotting redirect chains is a bit more difficult, but it can be done.

For redirect chains, export all the 3xx redirects, then use a tool like Screaming Frog to crawl that list of URLs – it will highlight any redirect chains.
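If you’d rather script the check, the same idea takes only a few lines: build a source → target map from your 3xx export and follow each redirect, flagging anything that takes more than one hop or revisits a URL. A minimal sketch, assuming you already have the redirect map as a Python dict (the example URLs are placeholders):

```python
def find_chains(redirects, max_hops=10):
    """Follow each redirect and report chains (2+ hops) and loops.

    redirects: dict of source URL -> redirect target, e.g. built
    from a 3xx export of your logs or a crawl.
    """
    report = {}
    for start in redirects:
        path, seen, url = [start], {start}, start
        while url in redirects and len(path) <= max_hops:
            url = redirects[url]
            if url in seen:                       # revisited a URL: a loop
                report[start] = ("loop", path + [url])
                break
            path.append(url)
            seen.add(url)
        else:
            if len(path) > 2:                     # more than one hop: a chain
                report[start] = ("chain", path)
    return report

redirects = {"/old": "/older", "/older": "/oldest", "/oldest": "/final",
             "/a": "/b", "/b": "/a"}
for start, (kind, path) in find_chains(redirects).items():
    print(kind, " -> ".join(path))
```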

How To Fix Redirect Chains

Loops are simple: break the loop and land the user on the final destination page with a single redirect.

For the chains, again – break the chain and point directly to the final destination.

While the fixes may seem simple – and they are – it can be a lot of work initially to do all the analysis and break all the chains and loops, but Googlebot will love you in the long term, so it’s well worth the investment.
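The “point directly to the final destination” step can also be automated: resolve every source URL through the map until it stops moving, and collect anything that loops for manual fixing. A minimal sketch, under the same assumption of a source → target dict:

```python
def flatten(redirects):
    """Rewrite a redirect map so every source points straight at its
    final destination; loops are returned separately for manual fixing."""
    flat, loops = {}, []
    for start in redirects:
        url, seen = start, {start}
        while url in redirects:
            url = redirects[url]
            if url in seen:      # revisited a URL: this is a loop
                loops.append(start)
                break
            seen.add(url)
        else:
            flat[start] = url    # url no longer redirects: final destination
    return flat, loops

flat, loops = flatten({"/old": "/older", "/older": "/final", "/x": "/y", "/y": "/x"})
print(flat)
print(loops)
```

The `flat` dict is effectively your new one-hop redirect rules, ready to be turned into server configuration.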

Difficulty Level: Easy to Moderate.

Dead Pages

A dead page is a URL that Google has tried to visit and got a 404 response code in return.

Whether they followed an internal link or an external link is not important; it’s a wasted crawl from Googlebot’s point of view, and 404 pages do use up some of your crawl budget.

Please note that after fixing all internal dead links, there may be a short period where Googlebot still hits these URLs, as they are queued in its crawl system; however, these hits are rare.

How To Spot Dead Pages

Look in your log files for 404 response codes.
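A quick way to do this without a dedicated tool is a short script over your access log. This sketch assumes the standard Apache/nginx combined log format; adjust the regex if your format differs.

```python
import re
from collections import Counter

# Matches the request and status in a combined-log-format line,
# e.g. ... "GET /path HTTP/1.1" 404 209 ...
LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def count_404s(lines):
    """Tally 404 responses per URL from access-log lines."""
    hits = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and m.group("status") == "404":
            hits[m.group("path")] += 1
    return hits

log = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /missing HTTP/1.1" 404 209 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [10/Oct/2023:13:55:37 +0000] "GET /ok HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [10/Oct/2023:13:55:38 +0000] "GET /missing HTTP/1.1" 404 209 "-" "Googlebot/2.1"',
]
print(count_404s(log).most_common())
```

Sorting by count tells you which dead URLs are costing the most crawl budget and should be fixed first.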

How to Fix Dead Pages

Fixing is a bit more involved. First, crawl your website with a tool like Screaming Frog, Ahrefs or SEMrush, find all the dead links on your site and remove them.

If the links come from an external site, they are a bit more difficult to fix. If you cannot get the webmaster to change the link, you will need to implement a 301 redirect, but finding the source of the link is the hard part. You can export all your backlinks from tools like Google Search Console or Ahrefs, but they might not contain all your external links.

Tip: if, a month after clearing up all the dead links on your site, 404s are still appearing, add a redirect to land visitors on a relevant page.

Difficulty Level: Easy

Non-Crawled Sections

If Googlebot doesn’t crawl a page or even a section then it can’t be found in the SERPs.

There can be multiple reasons why a page or section doesn’t get crawled: no internal links, no external links. By and large, Googlebot doesn’t guess URLs, so for it to crawl a section it needs to be told about the URLs.

I’ve audited sites before where someone had created a wonderful, detailed resource centre with 10x content but wasn’t getting any organic traffic.

After a bit of analysis, it became clear someone had forgotten to add a footer link to the new section, and therefore there were no internal links to these pages.

Simply adding that one link allowed Googlebot to find all these resources and index them.

How to Spot Non Crawled Sections

The easiest way is to gather three data sources: a crawl report, your log data and a list from your developer of every page on the site. A fourth data point can be your sitemaps, but sometimes these can be wrong. If your website is built on WordPress, you can just export all posts and pages rather than asking a developer for a list.

Then all you need to do is a VLOOKUP in Excel to find the gaps.
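If you prefer scripting to a VLOOKUP, Python set differences do the same job. A minimal sketch with placeholder URLs; the three input lists map to the data sources above:

```python
def find_gaps(all_pages, crawled, logged):
    """Compare the full page list against crawl and log data.

    all_pages: every URL the site should have (CMS export / developer list)
    crawled:   URLs your crawler found by following internal links
    logged:    URLs Googlebot requested, taken from your server logs
    """
    all_pages, crawled, logged = set(all_pages), set(crawled), set(logged)
    return {
        "never_crawled_by_google": sorted(all_pages - logged),
        "orphaned": sorted(all_pages - crawled),  # no internal link path found
    }

pages = ["/", "/blog", "/resources/guide-1", "/resources/guide-2"]
crawl = ["/", "/blog"]
logs = ["/", "/blog", "/resources/guide-1"]
print(find_gaps(pages, crawl, logs))
```

URLs in both buckets are your priority: pages with no internal links that Googlebot has never requested.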

How to Fix Non-Crawled Sections

A simple and effective fix is to add more internal and, where possible, external links to these pages. Internal links are the easiest, as they are something you control.

Difficulty level: Medium – fixing it is pretty simple, but finding the full list initially to do the analysis can be tricky.

Slow Pages

Slow-loading pages affect more than just Googlebot, but as we are talking about server log issues here, I will refer to slow-loading pages only from a crawling point of view.

Googlebot is a busy bot; it wants to crawl the entire web, so slow-loading pages mean it can’t, and won’t, crawl all of your site.

How to Spot Slow Pages

Within your log files, look for a time-taken field if your server records one (IIS logs time-taken by default; Apache can log %D and nginx $request_time). If response time isn’t logged, the “Response Size” column, measured in bytes, is a rough proxy, as very large responses usually take longer to serve.

Then order by the largest values and work down the list.
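As a sketch of that sorting step: the code below assumes your server appends the response time as the last field of each log line (for example Apache’s %D, in microseconds, added to the LogFormat); the sample lines are made up.

```python
def slowest(lines, top=10):
    """List the slowest URLs, assuming each log line ends with the
    response time (e.g. Apache's %D, in microseconds, in the LogFormat)."""
    rows = []
    for line in lines:
        try:
            micros = int(line.split()[-1])        # response time is the last field
            path = line.split('"')[1].split()[1]  # URL inside the request quotes
        except (ValueError, IndexError):
            continue                              # skip lines that don't fit the format
        rows.append((micros, path))
    return sorted(rows, reverse=True)[:top]

log = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /category/widgets HTTP/1.1" 200 184000 1523001',
    '1.2.3.4 - - [10/Oct/2023:13:55:37 +0000] "GET /about HTTP/1.1" 200 5120 4021',
]
print(slowest(log))
```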

How to Fix Slow Pages

This is difficult: there can be numerous reasons why a page loads slowly, but now that you have a list of slow-loading pages you can start the manual work of identifying why each one is slow and then address the issues.

Difficulty Level: Advanced

Wasted Crawl Budget

Google doesn’t have an infinite amount of resources, and crawling the web is super expensive, requiring huge server farms. So each new site is given a budget, and Google automatically increases or decreases this budget over time depending on the site and its popularity, among other factors.

So if you are wasting this limited resource, you are only punishing your site, and it can take a while to earn back the trust needed to increase your allocated budget.

Allowing Googlebot, or any bot, to crawl pages which are not necessary is a waste of their resources. This could be faceted navigation which creates a lot of near-duplicate pages across your site, forum profile pages, PPC landing pages or other low-value pages, to name a few examples. Some of these pages might be important to you, so you need to decide for your site.

How to Spot Wasted Crawl Budget

The simplest way to spot it is to analyse all the URLs that Googlebot is crawling; I usually like to add a hit count next to each one.

Then go through the list and make sure they are important. To save time in the future, create a master list, do a VLOOKUP against it, and then simply analyse the new URLs.
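The counting step is quick with Python’s Counter; here is a minimal sketch that also compares the counts against a master list of URLs you care about (the sample log lines and the “important” set are placeholders):

```python
from collections import Counter

def googlebot_hits(lines):
    """Count requests per URL for log lines whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        try:
            path = line.split('"')[1].split()[1]  # URL inside the request quotes
        except IndexError:
            continue
        hits[path] += 1
    return hits

sample = [
    '66.249.66.1 - - [10/Oct/2023:09:01:00 +0000] "GET /products?sort=price HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:09:01:05 +0000] "GET /products HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [10/Oct/2023:09:01:09 +0000] "GET /products HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
hits = googlebot_hits(sample)

# Anything not on your master list of important URLs is a candidate for blocking
important = {"/", "/products", "/blog"}
wasted = {url: n for url, n in hits.items() if url not in important}
print(wasted)
```

For serious log volumes you would substring-match the user-agent field specifically and verify Googlebot by reverse DNS, but the shape of the analysis is the same.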

How to Fix Wasted Crawl Budget

Robots.txt – this is a super important file on your site and one which is rarely used to its full effect.

If you look in your logs, this file is one of the most visited URLs; Googlebot will fetch it several times a day to make sure you haven’t made any changes. Use it to block crawling of your low-value pages at the source.
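As a hypothetical example of blocking low-value sections at the source – all the paths below are placeholders for your own site, and note that a Disallow rule stops crawling, not necessarily indexing:

```
User-agent: *
# Block faceted-navigation parameters and low-value sections
Disallow: /*?colour=
Disallow: /*?sort=
Disallow: /forum/profile/

Sitemap: https://www.example.com/sitemap.xml
```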

Difficulty Level: Medium / Advanced

5xx Errors

While this might be the final issue I cover, it’s not the least important – in fact, 5xx errors are the most important.

If you spot any 5xx errors in your logs, this should raise a big red flag. Bear in mind, though, that Googlebot might sometimes receive a 5xx error while the same server problem prevents the log files from being written.

So just because you see no 5xx errors, it doesn’t mean you don’t have issues. This is why third-party uptime monitoring tools are also important.

Google has also confirmed in the past that if a site struggles and repeatedly returns 5xx errors, Googlebot will visit less frequently and your crawl budget and priority will suffer.

Keeping your website alive is key.

How to Spot 5xx Errors

Look in your log files for any 5xx response codes, as well as using a third-party tool to monitor uptime. Google offers a very good service, but it’s not cheap; other providers offer less frequent checks at a much more affordable price.
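To spot downtime windows rather than individual errors, it helps to bucket 5xx responses by hour. A minimal sketch, assuming combined-log-format lines; the samples are made up:

```python
from collections import Counter

def errors_by_hour(lines):
    """Bucket 5xx responses by hour so downtime windows stand out."""
    buckets = Counter()
    for line in lines:
        try:
            stamp = line.split("[", 1)[1].split("]", 1)[0]  # e.g. 10/Oct/2023:13:55:36 +0000
            status = line.split('"')[2].split()[0]          # field right after the request quotes
        except IndexError:
            continue
        if status.startswith("5"):
            hour = stamp.split(" ")[0].rsplit(":", 2)[0]    # e.g. 10/Oct/2023:13
            buckets[hour] += 1
    return buckets

sample = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 503 209 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [10/Oct/2023:13:58:01 +0000] "GET /blog HTTP/1.1" 503 209 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [10/Oct/2023:14:02:12 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
print(errors_by_hour(sample))
```

A cluster of 5xx hits in one hour points at a specific outage to cross-reference with your server monitoring.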

How to Fix 5xx Errors

It’s too difficult to guess from here why your site keeps going down; it’s up to you to determine the cause and then create an action plan off the back of your analysis.

Difficulty Level: Advanced

Summary

These are some of the most common issues I see when auditing server logs. It’s not the complete list; each site is unique, and therefore each audit is unique.

The good thing is, a Server Log Auditing Dashboard highlights the majority of these for you, for just $9.99 a month.

The key though is to get hold of your log files and to start analysing the data.

James Dooley

About James Dooley

Digital Nomad who loves travelling the world networking while working on my laptop. Life is a perception of your own reality. You have no excuses and should be making memories every single day #LearnSomethingNew #Develop #Synergy #Network
