
Seamless web analytics without JavaScript

by Nathan Pilkenton

August 2021

Summary: I wrote a couple of simple bash scripts to work with GoAccess (a log analyzer), making it just as easy to get privacy-respecting, JavaScript-free site analytics as it is to use a tool like Google Analytics. You can check the scripts out here, or read on for more on why I did this.

Site analytics without the JavaScript

When I was building Chekkin, I needed some basic analytics to track how many people were coming to the site. Though I've used Google Analytics in the past, I decided not to go back to it for two reasons:

Searching revealed plenty of alternatives, but what ultimately caught my eye was GoAccess: it's free, open source, and since it works by analyzing log files, there's no need for any JavaScript.

However, GoAccess had three major drawbacks compared to a JavaScript tool like Google Analytics:

  1. GoAccess would require me to download logs from the server, and then run the tool—not nearly as easy as opening a web dashboard.
  2. Since my server rotates log files, I wouldn't be able to see stats from older than ~60 days.
  3. GoAccess has no built-in way to filter by date—it just includes everything in any of the log files you give it—so it's not easy to analyze traffic from only the last day, or week, or month.

Fortunately, all of these challenges were pretty easy to solve with a couple of bash scripts.

Archiving and combining log files

First, we'll need to get all the site log files we want to analyze. As mentioned above, my log files are rotated, but I didn't want to lose stats on old traffic.

As a solution, I wrote a short script that runs daily and maintains one big log archive called combinedlogs.log. It concatenates the two most recent rotated files with the existing archive, and then uses awk to strip out any duplicate rows. It's not elegant, but it's simple.

'log_archive.bash'
---
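#!/bin/bash

# make sure the archive exists so the very first run doesn't fail
touch combinedlogs.log

# set the existing archive aside so it can be read while the new one is written
mv combinedlogs.log oldlogs.log

# merge the two newest rotated logs with the old archive, keeping only the first copy of each line
cat /var/log/www.example.com.access.log /var/log/www.example.com.access.log.1 oldlogs.log | awk '!n[$0]++' > combinedlogs.log

rm oldlogs.log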
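
To see why the awk one-liner removes duplicates, here's a minimal sketch run on made-up lines instead of real log files. The sample strings and the variable name deduped are just for illustration. awk keeps a counter per distinct line; the counter is zero the first time a line appears, so the negated expression is true and the line is printed, while any later copy is skipped.

```bash
#!/bin/bash

# Made-up sample lines standing in for overlapping log files
deduped=$(printf '%s\n' "request A" "request B" "request A" "request C" "request B" | awk '!seen[$0]++')

# Prints each distinct line once, in order of first appearance:
# request A
# request B
# request C
echo "$deduped"
```

Since only exact duplicate lines are dropped, the same two rotated files can be merged into the archive day after day without any request being counted twice.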

Automatically pulling down the logs

Now, to actually analyze the logs. (If you're following along, and you haven't already, you'll need to install GoAccess.)

To simplify this, I wrote another bash script that runs locally on my machine. Conveniently, we can run the archive script from above to generate an up-to-date combined log file to analyze, and then download it:

'logs.bash'
---
#!/bin/bash

# run the log archive script from above
ssh [user]@[server] "bash log_archive.bash"

# download the latest combined log file
scp [user]@[server]:combinedlogs.log .

Filtering logs by date and running GoAccess

In order to filter for just more recent traffic, I extended the script from above. We'll take an optional argument: an integer for the number of days of history to analyze.

If we run the script with no argument, we'll assume we want to see all traffic. This section will run GoAccess on the combined logs and automatically launch the HTML report:

if [ $# -eq 0 ]; then

	goaccess combinedlogs.log --ignore-crawlers --anonymize-ip -o full_report.html --log-format=COMBINED

	open full_report.html

(Note: I've also passed two optional flags. --ignore-crawlers ignores traffic from some common bots, and --anonymize-ip "sets the last octet of IPv4 user IP addresses and the last 80 bits of IPv6 addresses to zeros" to provide some additional user privacy.)

On the other hand, if an argument is provided, we'll use grep to filter for only traffic that's happened within that number of days:

else

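	# print one date per day in the range (gdate is GNU date, e.g. from coreutils on macOS; on Linux, plain date works),
	# then keep only the log lines containing one of those dates, overwriting any previous recentlogs.log
	for i in $(seq 0 "$1"); do gdate -d "-$i days" +"%d/%b/%Y"; done | grep -F -f /dev/fd/0 combinedlogs.log > recentlogs.log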

	goaccess recentlogs.log --ignore-crawlers --anonymize-ip -o recent_report.html --log-format=COMBINED

	open recent_report.html

fi
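
To make the filtering step easier to follow, here's a small sketch of the same idea on a made-up log. The three log lines, the hard-coded dates, and the variable names are all assumptions for illustration; in the real script the dates come from the date loop above. I've also used grep -F here so each date is matched as a literal string.

```bash
#!/bin/bash

# Made-up log lines in combined log format
sample_log='203.0.113.5 - - [01/Aug/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.64.1"
203.0.113.6 - - [15/Jul/2021:09:30:00 +0000] "GET /about HTTP/1.1" 200 734 "-" "curl/7.64.1"
203.0.113.7 - - [31/Jul/2021:23:59:59 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.64.1"'

# One pattern per day we want to keep, in the same format the date loop prints
pattern_file=$(mktemp)
printf '%s\n' '01/Aug/2021' '31/Jul/2021' > "$pattern_file"

# -f reads the patterns from the file; -F treats each one as a fixed string
filtered=$(printf '%s\n' "$sample_log" | grep -F -f "$pattern_file")
rm -f "$pattern_file"

# Only the 01/Aug and 31/Jul requests remain
echo "$filtered"
```

One caveat with this approach: a line is kept if the date string appears anywhere in it, not just in the timestamp field, which is good enough for a quick report.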

That's it! Now, to see basic analytics, I can just run bash logs.bash from my project folder, and a nice HTML dashboard from GoAccess will pop up. And if I want to filter on only traffic from the past week, I can run bash logs.bash 7.

Comparison with Google Analytics

Of course, this approach has tradeoffs. It certainly has some benefits, as I already discussed above:

But these benefits also come at a cost:

For now, I'm very happy with this solution. If traffic picks up to a point where I need the extra features, I'll probably switch to something like Simple Analytics. But until then, GoAccess is an awesome way to track basic site analytics, without compromising user privacy. 

