Site Logs Voyeurism
Web Design • April 27th, 2006
Although there are several fine and elegant applications that provide webmasters with consolidated access statistics for their web sites, I find it impossible to live without my own site logs analysis routine. I have become a shameless voyeur of my sites’ access logs, and I find that this practice is good for business and also for some of my own “evil” purposes. Here I’m sharing my analysis routine and some examples of how I use site logs in business and life.
Daily Site Logs Analysis
First, I must say that my analysis routine is completely manual. One of the few advantages of not having a site with overwhelming traffic is that I still can observe the activity with as much detail as I want. Some of the things I do could not scale up should my site have the kind of traffic that web celebrities enjoy. But I’m pretty sure that a lot of it could be – or already has been – automated, and I would still find value in looking at the raw data for ad-hoc analysis and specific tests. For now, the whole process takes me a little less than 30 minutes, and to me, this time is well invested:
1. Download the latest access log from the server
Media Temple rotates logs every 24 hours, so for lack of a script, I have to remind myself to download the file every day. Media Temple provides two access log files and two error log files. One of the files contains all records for the last 24 hours up until the rotation time (somewhere between 3:00 and 4:00 am). The other file contains real-time data, so I use that one whenever I need to see what is happening on the site right now or during the course of the current day. For the daily analysis I use the historical log.
2. Enter the log data into my “sophisticated” Excel spreadsheet
First, I need to use Excel’s “Text to columns” command to split the text file into the columns I need. Then, the spreadsheet generates more columns that allow me to see clean fields with auto filters for the most important data: REQUEST, REFERRER, BROWSER AND OS, DATE, and IP ADDRESS.
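For anyone who would rather script the splitting step than use “Text to columns”, here is a minimal sketch in Python. It assumes the server writes Apache's "combined" log format, which this post does not actually confirm; the sample line, IP address, and referrer are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: Apache "combined" log format. Adjust the pattern if your
# host writes something different.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 27/Apr/2006:03:15:42 -0500
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

# Made-up sample line, not taken from a real log.
sample = (
    '203.0.113.7 - - [27/Apr/2006:03:15:42 -0500] '
    '"GET /work/zuk/ HTTP/1.0" 200 5120 '
    '"http://www.example.com/links.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)
record = parse_line(sample)
```

Each named group corresponds to one of the columns above: request, referrer, browser and OS (the user-agent string), date, and IP address.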
3. Filter the relevant data
I keep all data for a month in the same file, so for the daily analysis, I use Excel filters to limit the data to the last day, and to remove my own activity on the site (of course, by IP address).
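The same two filters could be sketched in Python as follows. This is only an illustration: it assumes each log row is already a dict with "ip" and "date" keys, and the addresses and rows are invented.

```python
from datetime import date

# Hypothetical address: replace with the IP(s) you browse your own site from.
MY_IPS = {"198.51.100.20"}

def daily_records(records, day, own_ips=MY_IPS):
    """Keep records from the given day that did not come from my own IPs."""
    return [r for r in records if r["date"] == day and r["ip"] not in own_ips]

# Invented rows standing in for a month of log data.
records = [
    {"ip": "203.0.113.7", "date": date(2006, 4, 26), "request": "GET /work/zuk/ HTTP/1.0"},
    {"ip": "198.51.100.20", "date": date(2006, 4, 26), "request": "GET / HTTP/1.0"},
    {"ip": "203.0.113.9", "date": date(2006, 4, 25), "request": "GET /about/ HTTP/1.0"},
]
yesterday = daily_records(records, date(2006, 4, 26))
```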
4. Get data on referrals and search engine finds
Once again I filter the data to remove empty referrer fields or fields containing my own domain. I want to get to only those transactions that mark the entry of a user into my site after following a link from outside my domain.
I copy this set of records into a second “very sophisticated” spreadsheet, which adds a column to give me clean unique fields for the requested pages (e.g. “/work/zuk/”, instead of “GET /work/zuk/ HTTP/1.0”). This makes my life easier when I want to filter the data to any specific page.
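Both pieces of this step (keeping only external referrals, and reducing the request to a clean page path) can be sketched in a few lines of Python. The domain name below is a placeholder, and treating "-" as an empty referrer is an assumption about how the server logs it.

```python
from urllib.parse import urlparse

MY_DOMAIN = "example.com"  # placeholder: put your own domain here

def clean_path(request):
    """Turn 'GET /work/zuk/ HTTP/1.0' into '/work/zuk/'."""
    parts = request.split()
    return parts[1] if len(parts) >= 2 else request

def is_external_referral(referrer, my_domain=MY_DOMAIN):
    """True when the referrer is present and points outside my own domain."""
    if referrer in ("", "-"):  # assumed markers for "no referrer"
        return False
    host = urlparse(referrer).hostname or ""
    # A referrer without a scheme has no parsable host and counts as external.
    return not (host == my_domain or host.endswith("." + my_domain))
```

For example, clean_path("GET /work/zuk/ HTTP/1.0") gives "/work/zuk/", which is the kind of field that makes filtering by page painless.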
In this spreadsheet I also track an extra column for search strings and my approximate page ranking for each. Sadly, I do this manually, although many site stats applications make it much easier. Again, I’m a voyeur of my site, and the manual work helps me realize and remember patterns, for instance: a tremendous amount of hits to MQStudio come from users searching for “plantillas css”. So, of course I’m currently exploring business strategies based on that knowledge.
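Pulling the search string out of a referrer is one part of this that is easy to automate. The sketch below is only a rough illustration: the parameter names in QUERY_PARAMS are an assumption about how search engines pass the query, and the referrer URLs are invented. It does nothing about page ranking, which I would still look up by hand.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed query-string parameter names; extend for the engines you care about.
QUERY_PARAMS = ("q", "p", "query")

def search_string(referrer):
    """Return the search phrase found in a referrer URL, or None."""
    query = parse_qs(urlparse(referrer).query)
    for name in QUERY_PARAMS:
        if name in query:
            return query[name][0]
    return None

# Invented referrers for illustration.
referrers = [
    "http://www.google.com/search?q=plantillas+css&hl=es",
    "http://www.google.com/search?q=plantillas+css",
    "http://www.example.org/links.html",
]
found = (search_string(r) for r in referrers)
counts = Counter(phrase for phrase in found if phrase is not None)
```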
5. Catch suspicious activity
Back to the main spreadsheet with all daily activity, a quick filter by browser/OS lets me zero in on visits by abominable creatures like offline browsers. Not all of them show their true identity when going through my site, but the few that do, I can see immediately. Same goes for new robots crawling the site. Many times I’ll check them out and write appropriate rules for them in my site’s robots.txt file and, if needed, in .htaccess.
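The browser/OS filter amounts to a substring check on the user-agent field. Here is a small Python sketch; the marker list is illustrative rather than my actual blacklist, and as noted it only catches tools that identify themselves honestly.

```python
# Illustrative markers for tools that commonly identify themselves in the
# user-agent string; this is an assumed list, not an exhaustive one.
SUSPECT_AGENT_MARKERS = ("httrack", "webcopier", "teleport", "webzip", "wget")

def is_suspect_agent(agent):
    """True when the user-agent string names a known site-copying tool."""
    lowered = agent.lower()
    return any(marker in lowered for marker in SUSPECT_AGENT_MARKERS)
```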
6. Inspect user paths through the site
This is the most tedious piece of the process, but since it still is manageable for me –at least on a daily basis– I spend the time to filter the data by each IP address. That way I can take a quick glance at the different user paths through my site in order to:
- Catch suspicious activity (e.g. defacing attempts, potential copyright violations, spammers, etc.)
- Conduct my own usability tests
- Observe interesting patterns (e.g. pages or search strings that get users on the site, but don’t do well at keeping them in)
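Filtering by each IP address in turn is the part that scales worst by hand. A rough Python equivalent groups the requested pages by IP in chronological order; it assumes each row is a dict with "ip", "time", and "path" keys, and the rows here are invented.

```python
from collections import defaultdict
from datetime import datetime

def paths_by_ip(records):
    """Map each IP address to the pages it requested, in time order."""
    paths = defaultdict(list)
    for r in sorted(records, key=lambda r: r["time"]):
        paths[r["ip"]].append(r["path"])
    return dict(paths)

# Invented rows for illustration.
records = [
    {"ip": "203.0.113.7", "time": datetime(2006, 4, 26, 10, 4), "path": "/work/zuk/"},
    {"ip": "203.0.113.9", "time": datetime(2006, 4, 26, 10, 1), "path": "/about/"},
    {"ip": "203.0.113.7", "time": datetime(2006, 4, 26, 10, 0), "path": "/"},
]
paths = paths_by_ip(records)
```

One caveat: an IP address is only an approximation of a user, since several people can share one and one person can show up under several.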
7. Inspect error logs
Error logs make suspicious activity even more visible. Tracking errors also lets me identify recurring users performing questionable requests at my site. If I can identify that a little devil keeps coming with the same IP address, it gets blocked via .htaccess.
How site logs data helps my business
One reason why I still do all this manually instead of leaving it up to Urchin or to the very classy Mint, is that I like to keep all my site’s historical data in a format that I understand and that has been designed to answer my specific questions.
Search Engine Optimization
Analyzing, and keeping a personal database of referrer records allows me to keep track of my site’s SEO status. Even better, this data proved invaluable last year when I completely redesigned my portfolio web site, including copy, and I wanted to make sure that the new meta code and content for any given page would improve, or at least keep, the page’s ranking for key search queries.
Site logs help me understand my users, and the kinds of things they’re looking for when they enter my site. Following their paths also gives me a hint on whether they’re finding what they want or at least something else that keeps them browsing. I can tell and attempt to measure the relative interest in each of my portfolio samples… and with this data I can also dare to make hypotheses on things like the kinds of images that trigger more click-throughs. I can test those theories and conduct my own usability studies.
Running my own usability tests
A few weeks before launching my site redesign, I sent the beta url to a few people whose opinion I wanted to gather. My designer friends replied with very specific comments à la “design review”, which is always the kind of feedback I seek, particularly with personal projects. Most of the other comments offered praise or criticism, which is also great to know, but in many cases they didn’t give me enough information.
I had specific questions, and didn’t want to bother willing volunteers with an infinite chain of emails. Some questions related to my own doubts about certain aspects of the design. Others were triggered by comments from colleagues. I agreed immediately with some of their opinions, but others I wanted to validate with more users. This is where site logs came to the rescue. I think this process was interesting and it was definitely successful at providing me with more information to help me make design decisions. But this post is getting long, so I’ll leave this topic for more detail in my next post.
Getting insight into prospective clients
More and more lately, whenever a prospective client contacts me for the first time, I take a look at my site logs before sending a reply. Thanks to the real-time access log on my server, I can look up the IP address immediately and observe how the client came to my site, the amount of time he or she spent looking at my portfolio before contacting me, whether the client read my About page or not, etc.
This information gives me a hint on whether the client is genuinely interested in my work, or merely shopping for quotes. And that’s good to know so that I don’t spend too much energy on unlikely prospects, and I focus more effort on clients with whom I have a real chance to get the deal. It’s also nice to work with somebody who respects your work, and I find evidence of that in clients who thoroughly go through my portfolio, inspect my site, and maybe who I am, and THEN contact me.
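This kind of lookup can be sketched in Python as below. It assumes each row is a dict with "ip", "time", "path", and "referrer" keys, and "/about/" is only a placeholder for wherever the About page lives. The time figure is a lower bound, because the log never records how long the visitor stayed on the last page.

```python
from datetime import datetime

def visit_summary(records, ip, about_path="/about/"):
    """Summarize one visitor: entry referrer, minutes on site, About page seen."""
    mine = sorted((r for r in records if r["ip"] == ip), key=lambda r: r["time"])
    if not mine:
        return None
    elapsed = mine[-1]["time"] - mine[0]["time"]
    return {
        "entry_referrer": mine[0]["referrer"],
        "minutes_on_site": elapsed.total_seconds() / 60,
        "read_about": any(r["path"] == about_path for r in mine),
    }

# Invented rows for illustration.
records = [
    {"ip": "203.0.113.7", "time": datetime(2006, 4, 26, 10, 0), "path": "/",
     "referrer": "http://www.example.org/links.html"},
    {"ip": "203.0.113.7", "time": datetime(2006, 4, 26, 10, 4), "path": "/work/zuk/",
     "referrer": "-"},
    {"ip": "203.0.113.7", "time": datetime(2006, 4, 26, 10, 12), "path": "/about/",
     "referrer": "-"},
]
summary = visit_summary(records, "203.0.113.7")
```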
If the prospect becomes a real project, site logs keep me informed on the client’s browsing environment (so I make sure that intermediate work always looks good in their browser and operating system). If the client hasn’t replied in a reasonable time after I’ve posted work for review, I check site logs to see if he or she has seen the work or not. I usually ask clients what they like in my portfolio before starting to design for them, and watching the client’s path through my portfolio samples also gives me clues as to the style they seem to prefer.
Catching bad behavior
The first person that I caught ripping off one of my designs was found without much delay thanks to site logs. She left her permanent print on them by clicking on a link to my site still not edited in her working copy. Not every copy-cat is this careless, but by keeping an eye on my site logs I have identified defacing attempts, spammers, users downloading my whole site with offline browsing tools, etc. I seriously despise this kind of behavior, and do all I can to kick it out of my site. Site logs and a few other “evil devices” keep me informed.
Site Logs Used for Personal Missions
Yes. I’ve used my logs for a few personal voyeur purposes…
Biting my nails while selling our house
I’m a web designer, so of course my beautiful house in Dallas had an online photo gallery to help sell herself. We couldn’t post a link to it from the MLS database, so the best I could do was to publish the link on our also gorgeous house flyer. I’d check site logs for visitors to the gallery, hoping to find a few recurring visitors, which would tell us an offer might be coming.
Checking out who saw our Christmas greeting card
This is a little much, I know… But it was fun.
Last Christmas, Joey and I sent a Flash slide show card summarizing our 2005 as a Christmas and New Year’s greeting to all of our friends and family. Being the evil witch I am, I used PHP to display the same card with a different URL for each recipient.
The purpose was twofold: 1) I wanted to pass dynamically a personalized greeting message to each recipient in the last slide of the movie. 2) I wanted to track delivery of each card individually through site logs.
The different URLs (e.g. “/2005card/?to=luli”) generated individual entries on my site log, so I could see if Luli had seen the card, when she had seen it, if she saw the whole thing, how many times, etc. Of course, if Luli didn’t reply to our card but the site logs told me she saw it, or if she quit the show before it was over, she’d be put in my lista de agravios.
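Counting views per recipient from those entries could look like the Python sketch below. It assumes the requested paths have already been extracted from the log; apart from the “to=luli” example above, the recipient names and rows are invented. It only counts requests for the card page, so it cannot tell whether someone watched the whole show.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def card_views(paths, card_path="/2005card/"):
    """Count requests for the card page, keyed by the 'to' parameter."""
    views = Counter()
    for path in paths:
        parsed = urlparse(path)
        if parsed.path != card_path:
            continue
        recipients = parse_qs(parsed.query).get("to")
        if recipients:
            views[recipients[0]] += 1
    return views

# Invented request paths; "ana" is a hypothetical recipient.
paths = [
    "/2005card/?to=luli",
    "/2005card/?to=luli",
    "/2005card/?to=ana",
    "/work/zuk/",
]
views = card_views(paths)
```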
Conclusion
Well… I’ve managed to spend quite some time honoring my beautiful site logs. I seriously could not conceive life without them. The raw data is of course much harder to deal with when your site’s traffic spikes up with your design posted on the first page of the CSS Zen Garden. But I find the data beautiful, and sooo helpful. And in my previous life, I made a living out of making cumbersome spreadsheet models, so, this method is still pretty good for me.
I’d like to share my experience using site logs as a free usability testing vehicle later, in a separate post. This post has to end now.