What Is A Server Log File?
Contents
- Log File Contents
- Breaking Down A Log Entry
- Getting The Logs
- What You Can Learn (And What You Can’t)
- My Favorite Log Analysis Tools
Server log files are a raw, unfiltered look at traffic to your site. They’re text files stored on your web server. Every time any browser or user-agent, Google included, requests any resource—pages, images, JavaScript files, whatever—from your server, the server adds a line to the log.
That makes log files giant piles of juicy data.
If you already know how they work and want to analyze them, read my post about log file analysis for SEO. If you know that, too, get a cup of coffee and take care of all those emails in your inbox. You don’t need to read this.
Log File Contents
Here’s a line from Portent’s server log. I edited it to simplify a bit:
11.222.333.44 - - [11/Dec/2018:11:01:28 -0600] "GET /blog/page-address.htm HTTP/1.1" 200 182 "-" "Mozilla/5.0 Chrome/60.0.3112.113"
If you want the field-by-field basics, read up on the Common Log Format.
The last bit that starts with “Mozilla” is the user agent: the type of browser or other software that’s accessing your site. It’s important if you’re analyzing the log file for SEO, checking what software is hitting your site, or troubleshooting a specific server problem. If Googlebot requests a resource, you’ll see a user agent string that includes “Googlebot.” If Bingbot hits your site, you’ll see a user agent string that includes “bingbot.”
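If you want to poke at this yourself, here’s a minimal Python sketch that tallies requests by those two bot user agents. The file name access.log is a placeholder, and the substring check is deliberately crude: anyone can spoof a user agent string, so serious bot verification takes a reverse DNS lookup.

```python
from collections import Counter

bot_hits = Counter()

# "access.log" is a placeholder; point this at your real log file.
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # Crude substring match on the user agent portion of each line.
        if "Googlebot" in line:
            bot_hits["Googlebot"] += 1
        elif "bingbot" in line.lower():
            bot_hits["Bingbot"] += 1

print(bot_hits.most_common())
```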
Breaking Down A Log Entry
Here’s the example again:
11.222.333.44 - - [11/Dec/2018:11:01:28 -0600] "GET /blog/page-address.htm HTTP/1.1" 200 182 "-" "Mozilla/5.0 Chrome/60.0.3112.113"
On December 11, 2018, someone using Google Chrome tried to load https://www.portent.com/blog/page-address.htm. The “200” is the response status code: the server found the file and delivered it (yay!). The “182” is the size of the response. Page-address.htm is teeny, weighing in at 182 bytes.
The IP address of the client (the software that requested the file) was 11.222.333.44. I put that last because, for many reasons, it’s not terribly helpful to us marketers.
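Here’s how that breakdown looks in code. This is a minimal sketch: the regex assumes the combined-log variant shown above, and the field names are mine, not part of any standard.

```python
import re

# One named group per field in the sample log line above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('11.222.333.44 - - [11/Dec/2018:11:01:28 -0600] '
        '"GET /blog/page-address.htm HTTP/1.1" 200 182 '
        '"-" "Mozilla/5.0 Chrome/60.0.3112.113"')

match = LOG_PATTERN.match(line)
if match:
    for field, value in match.groupdict().items():
        print(f"{field}: {value}")
```

Run it and you get the IP, timestamp, method, path, status code, response size, referrer, and user agent as separate fields, ready for counting and filtering.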
Again: Every request from every user agent is a line in the log file. Every request. Every. Single. One.
Getting The Logs
That’s the rub. Some technical teams cling to log files, citing security concerns. Some site platforms hide log files so deep in their twisted innards finding them requires an electronic colonoscopy.
But the log files are there. They’re not a security risk. The site developer can zip them up and send them to you. Buy beers, bring chocolate, do whatever you need to do to make friends. Then ask.
If the files are gigantic, ask for a snapshot. A few days or even a few hours is a good start.
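One practical note: by the time the files reach you, they’re usually rotated and gzipped. Here’s a minimal sketch for walking a snapshot, assuming a hypothetical log-snapshot directory of access.log* files:

```python
import gzip
from pathlib import Path

def read_log_lines(path):
    """Yield lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", encoding="utf-8", errors="replace") as f:
        yield from f

# Rotated logs often arrive as access.log, access.log.1.gz, and so on.
for log_file in sorted(Path("log-snapshot").glob("access.log*")):
    for line in read_log_lines(log_file):
        pass  # hand each line to your parser, counters, or analysis tool
```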
What You Can Learn (And What You Can’t)
Log files are data lasagna. They’re yummy. They’re substantial. And they’ll put you to sleep if you overindulge.
I use them to find:
- Spider traps. Log files give you a great look at how search bots are crawling your site.
- Spam content. If some hacker dumped a bunch of pages listing porn links on your site, any requests for those pages show up in the log files.
- Broken external links. Google eventually gives up crawling broken links from other sites. But people still click them. Track down those busted external links (see the sketch after this list) and reclaim some authority.
- Incorrect server responses. Google Search Console can show some soft 404 errors, but the log file shows you all of them.
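Here’s a minimal sketch for surfacing those busted links and bad responses: it counts 4xx and 5xx responses per URL, most-requested first. The file name is a placeholder again, and note that soft 404s return a 200, so those still take some manual digging.

```python
import re
from collections import Counter

# Pull the requested path and the status code out of each entry.
ENTRY = re.compile(r'"\S+ (?P<path>\S+) [^"]+" (?P<status>\d{3}) ')

errors = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        m = ENTRY.search(line)
        if m and m.group("status").startswith(("4", "5")):
            errors[(m.group("status"), m.group("path"))] += 1

# The most-requested broken URLs are your best reclamation targets.
for (status, path), count in errors.most_common(20):
    print(f"{count:6d}  {status}  {path}")
```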
You can’t use them to:
- Get keyword data. Missing keyword data isn’t just an analytics software problem. “Not provided” means you can’t find search terms here, either.
- Track user sessions (usually). Most user session tracking requires JavaScript. Use a tool like Google Analytics instead.
- Track individual users. In theory, you could track visits from Ian Lurie. But it would take a lot of mind-numbing labor.
- Track rendering times. Log files show requests for resources. They don’t track what happens after the request. If a page renders incorrectly or slowly, it won’t show up here.
You don’t need to use them to:
- Track conversions. Conversion tracking in log files is like sitting on your tongue. Feasible, but not recommended.
- Analyze geographic data. You can, but most analytics software shows location data and requires a lot less work.
- Track click paths through your site. Again, possible, but you can get the data more easily from your analytics software.
My Favorite Log Analysis Tools
Screaming Frog Log File Analyser is my number one choice. It’s a great combination of power and usability, and you can merge log file data with crawls and other data sources.
Splunk is so powerful it terrifies me. But it’s great for managing giant log files in near-real-time.
Apache Log Viewer is free. It has a steeper learning curve than Screaming Frog, but, you know, free.
Log files don’t provide conversion data or session data. That kind of tracking requires cookies and a client-side analytics suite, like Google Analytics. They do provide a record of every website resource requested from every browser and user-agent. That makes them very powerful.
Ian, WTF?
I usually write metaphor-stuffed rants. This is more Wikipedia style because I write about log files all the time. This seems more efficient than adding a “what’s a log file?” section to every post.