Summary: I wrote a couple of simple bash scripts to work with GoAccess (a log analyzer), making it just as easy to get privacy-respecting, JavaScript-free site analytics as it is to use a tool like Google Analytics. You can check the scripts out here, or read on for more on why I did this.
When I was building Chekkin, I needed some basic analytics to track how many people were coming to the site. Though I've used Google Analytics in the past, I decided not to go back to it for two reasons:

- I didn't want to hand my visitors' browsing data over to Google.
- I didn't want to load any third-party JavaScript on the site.
Searching revealed plenty of alternatives, but what ultimately caught my eye was GoAccess: it's free, open source, and since it works by analyzing log files, there's no need for any JavaScript.
There were three major drawbacks to using GoAccess, compared to a JavaScript tool like Google Analytics:

- My server's log files are rotated, so stats on older traffic eventually disappear.
- The logs live on the server, so analyzing them takes a few manual steps.
- Out of the box, there's no easy way to look at only recent traffic (say, the past week).
Fortunately, all of these challenges were pretty easy to solve with a couple of bash scripts.
First, we'll need to get all the site log files we want to analyze. As mentioned above, my log files are rotated, but I didn't want to lose stats on old traffic.
As a solution, I wrote a short script to run daily and maintain one big log archive called `combinedlogs.log`. It concatenates the latest two rotating files with the existing big file, and then uses `awk` to strip out any duplicate rows – not elegant, but simple.
`log_archive.bash`:

```bash
#!/bin/bash
# set the current archive aside so we can rebuild it
mv combinedlogs.log oldlogs.log
# concatenate the two most recent rotated logs with the old archive,
# then drop any duplicate lines with awk
cat /var/log/www.example.com.access.log /var/log/www.example.com.access.log.1 oldlogs.log | awk '!n[$0]++' > combinedlogs.log
rm oldlogs.log
```
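In case the `awk '!n[$0]++'` one-liner looks cryptic: it keeps a line only the first time it appears, using the whole line as the key in an associative array. A quick way to see it in action (the sample input here is just for illustration):

```bash
# prints each distinct line once, in first-seen order
printf 'a\nb\na\nc\nb\n' | awk '!n[$0]++'
# output:
# a
# b
# c
```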
Now, to actually analyze the logs. (If you're following along, and you haven't already, you'll need to install GoAccess.)
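For reference, GoAccess is packaged for most systems, so installation is usually a one-liner; pick whichever matches your setup:

```bash
# macOS (Homebrew)
brew install goaccess

# Debian/Ubuntu
sudo apt install goaccess
```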
To simplify this, I wrote another bash script that runs locally on my machine. Conveniently, we can run the archive script from above to generate an up-to-date combined log file to analyze, and then download it:
`logs.bash`:

```bash
#!/bin/bash
# run the log archive script from above
ssh [user]@[server] "bash log_archive.bash"
# download the latest combined log file
scp [user]@[server]:combinedlogs.log .
```
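One small quality-of-life note: since the script opens two connections (one for `ssh`, one for `scp`), key-based SSH auth saves you from typing a password twice. If you haven't set that up, it's a single command; the user and host here are placeholders, just like in the script above:

```bash
# copy your public key to the server so ssh and scp stop prompting for a password
ssh-copy-id [user]@[server]
```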
To filter for only recent traffic, I extended the script above to take an optional argument: an integer for the number of days of history to analyze.
If we run the script with no argument, we'll assume we want to see all traffic. This section will run GoAccess on the combined logs and automatically launch the HTML report:
```bash
if [ $# -eq 0 ]; then
  goaccess combinedlogs.log --ignore-crawlers --anonymize-ip -o full_report.html --log-format=COMBINED
  open full_report.html
```
(Note: I've also passed two optional flags. `--ignore-crawlers` ignores traffic from some common bots, and `--anonymize-ip` "sets the last octet of IPv4 user IP addresses and the last 80 bits of IPv6 addresses to zeros" to provide some additional user privacy.)
On the other hand, if an argument is provided, we'll use `grep` to filter for only traffic that's happened within that number of days:
```bash
else
  # print the date for each of the last N days in the log's date format,
  # then keep only the log lines that contain one of those dates
  # (gdate is GNU date; on Linux, plain `date` works the same way)
  for i in `seq 0 ${1:-8}`; do gdate -d "-$i days" +"%d/%b/%Y"; done | grep -f /dev/fd/0 combinedlogs.log > recentlogs.log
  goaccess recentlogs.log --ignore-crawlers --anonymize-ip -o recent_report.html --log-format=COMBINED
  open recent_report.html
fi
```
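If the `grep -f /dev/fd/0` part looks unfamiliar: `-f` makes grep read its patterns from a file, and `/dev/fd/0` is the process's standard input, so the date strings printed by the loop become the patterns to match. Here's a tiny standalone sketch of the same trick, with made-up patterns and a hypothetical input file:

```bash
# keep only the lines of access.log that contain "foo" or "bar"
printf 'foo\nbar\n' | grep -f /dev/fd/0 access.log
```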
That's it! Now, to see basic analytics, I can just run `bash logs.bash` from my project folder, and a nice HTML dashboard from GoAccess will pop up. And if I want to filter on only traffic from the past week, I can run `bash logs.bash 7`.
Of course, this approach has tradeoffs. It certainly has some benefits, as I already discussed above:

- It's free and open source.
- There's no JavaScript to load on the site.
- Visitor data stays on my own server, and IPs are anonymized in the reports.

But these benefits also come at a cost:
- There are far fewer features than a hosted tool like Google Analytics.
- Some bot traffic still shows up in the numbers (the `--ignore-crawlers` flag in GoAccess does a decent job, but it seems like a few bots slip past).

For now, I'm very happy with this solution. If traffic picks up to a point where I need the extra features, I'll probably switch to something like Simple Analytics. But until then, GoAccess is an awesome way to track basic site analytics, without compromising user privacy. ⧈