Computers, networks, and other IT systems generate records called audit trail records or logs that document system activities. Log analysis is the evaluation of these records and is used by organizations to help mitigate a variety of risks and meet compliance regulations. [Digital Guardian: What is Log Analysis 10/16/17]
My background in log file analysis rates about a 1 on a scale of 10. I think every developer has delved into log file analysis at one time or another. For me, it certainly was not for SEO purposes, but for security reasons and system troubleshooting. It is a technique I have since steered away from, as I find it quite intense and outside my comfort zone.
With that said, here I am reviewing an article on how log files can help your SEO rankings.
The following review is based on a very detailed blog post from Canonicalized.com titled “Log File Analysis for SEO – Working with data visually” by Dorian Banutoiu, one of the founders of Canonicalized. The article is long, but it is carefully arranged in clear, concise steps, with a Quick Access Menu on the side if you want to jump straight to a particular portion (no scrolling necessary).
There are several factors that I discovered going through this comprehensive blog.
- Google uses two factors to determine crawl budget:
- Crawl rate limit
- Crawl demand
- Google provides a report on crawler activity inside Search Console
- But this report is not relevant
- URLs can and will be crawled multiple times
A Better Way
Instead, we should be looking at a chart like this:
Canonicalized has posted a link to a Live Example, so you can actually play around with it yourself.
You need to access your file logs because they contain records of all the requests that occur on the website. Their blog explains how you can find access logs using your Host Control Panel and how to group and extract only the pertinent data. They also explain why you will be looking at only Googlebot.
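As a rough illustration of what "extracting only the pertinent data" involves (this is my own sketch, not code from the article, and the log format and sample line are assumptions), here is how you might parse a combined-format access log in Python and keep only requests whose user-agent claims to be Googlebot. Note that user-agents can be spoofed, so a thorough analysis would also verify the requesting IP.

```python
import re

# Combined Log Format: IP - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_requests(lines):
    """Yield parsed requests whose user-agent string mentions Googlebot."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m and "Googlebot" in m.group("agent"):
            yield m.groupdict()

# A single made-up log line for demonstration
sample = ('66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /blog/post HTTP/1.1" 200 5120 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

hits = list(googlebot_requests([sample]))
print(hits[0]["path"], hits[0]["status"])  # /blog/post 200
```

In practice you would feed this an open file handle over your real access log rather than a sample string.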
Log files are huge, and Canonicalized explains how to prepare the data in R (a statistical programming environment; they reference the Revolutions distribution), although you can also use Excel. They use three R packages to process the stats: stringr, tidyr, and plyr. I am not familiar with these tools, but you can delve deeper into them to understand more (links for these packages are posted in the article).
The article suggests using Tableau Reader for this analysis. There are two parts to this process:
1) Hunting for errors
2) Looking at crawling patterns
This post further explains:
- Crawl Timeline
- The Treemap
- Type of crawler
- Type of content
Debugging for Errors
Response errors do affect your SEO, and with one click you can filter down to the requests that result in error response codes.
You also have the opportunity to look at patterns over time in the dashboard, by bot type, content type, root folder, and so forth. The post further explains:
- Error Response Codes
- 403, 404, 410 errors
- 500: server issues
- Redirection response codes
- Request Size
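To give a feel for what "hunting for errors" means outside a dashboard, here is a minimal sketch of my own (not the article's method): tally the 4xx/5xx responses per URL from already-parsed requests, so repeat offenders like a persistently crawled 404 stand out. The `requests` list below is made-up sample data.

```python
from collections import Counter

def error_summary(requests):
    """Count 4xx/5xx responses per (status, path) to surface crawl errors."""
    errors = Counter()
    for req in requests:
        status = int(req["status"])
        if status >= 400:
            errors[(status, req["path"])] += 1
    return errors

# Hypothetical parsed log entries
requests = [
    {"path": "/old-page", "status": "404"},
    {"path": "/old-page", "status": "404"},
    {"path": "/api/data", "status": "500"},
    {"path": "/",         "status": "200"},
]

for (status, path), n in sorted(error_summary(requests).items()):
    print(status, path, n)
# 404 /old-page 2
# 500 /api/data 1
```

The same grouping idea is what the dashboard's one-click filters are doing for you visually.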
Please read the actual article as it is much more comprehensive than my short overview.
There is a lot more data that I did not mention in this post, hopefully this review will stir your interest to find out more.
In my opinion, these results show:
- Which pages aren't being crawled
- Which content and resources are over-crawled
- Where accessibility errors (404/500/etc.) are occurring
- Which links are broken
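The under-crawled/over-crawled comparison can be sketched with a simple frequency count. This is my own illustration under stated assumptions: `site_urls` is the hypothetical list of URLs you want indexed (e.g. from your sitemap), and the threshold of 3 is arbitrary.

```python
from collections import Counter

def crawl_frequency(crawled_paths, site_urls):
    """Compare crawl counts against the URLs you actually want indexed."""
    counts = Counter(crawled_paths)
    never_crawled = [u for u in site_urls if counts[u] == 0]
    # Arbitrary threshold for illustration: flag anything crawled 3+ times
    over_crawled = [(u, n) for u, n in counts.items() if n >= 3]
    return never_crawled, over_crawled

# Made-up data: paths Googlebot requested vs. pages in the sitemap
crawled = ["/", "/", "/", "/pricing", "/style.css", "/style.css", "/style.css"]
pages = ["/", "/pricing", "/contact"]

never, over = crawl_frequency(crawled, pages)
print(never)  # ['/contact']
print(over)   # [('/', 3), ('/style.css', 3)]
```

Here `/contact` never appears in the log (a page Googlebot is missing), while a CSS file is eating crawl budget that could go to real content.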
One thing I do know is that every request made to your server for content is recorded in a log file.
So why is this important?
You can see exactly what search engines are crawling on your site.
Canonicalized realizes that most of you don’t have the time to perform log file analysis, so they are willing to help you.
“We are willing to help you with the data preparation and the dashboard setup.”
You just need to contact Canonicalized and send them your logs. For your convenience, they have a Live Chat Box at the bottom of the article.
Do you feel that your site's SEO has not improved by the normal measures?
Maybe you need to take the next step and have your log files analyzed.
What do you guys think about this?