Ever since I started this personal blog site, I have been curious whether people actually read what I write. Luckily, based on the responses I received on Twitter, LinkedIn and in private, there is no problem with that. Next, I wanted to see numbers. I was told that Google Analytics is the gold standard of measurement. Well…
Let's start with the basic problem: even my own visits are not counted. The reason is simple: uBlock Origin. I need to use my tablet to get my visit counted, as it is the only device where I do not use an ad blocker. According to Google Analytics, my most popular blog is about listening to music, while my IT and security related blogs are barely read by anyone. When I check the raw logs, the picture is quite different. My estimate is that, depending on the topic, 20 to 80 percent of visitors fly under the radar when it comes to Google Analytics.
Once upon a time I used webalizer to analyze my logs. Awffull is a fork of webalizer, but it has also been dead for a long time. While 20 years ago its output was considered rich and beautiful, it is like a time capsule now: a bit of nostalgia, but otherwise not very useful. It includes all results, including search and other bots.
Last week I asked around what I should use to replace Google Analytics. Quite a few people suggested that I keep using GA, because even if it is not much use, it is still the gold standard. However, it is a personal blog without any ads. It is not a business site and I am more curious about real usage than how many ads I can serve.
Another frequent suggestion was Matomo. It is available both on-premise and as a cloud service. When used from the cloud, it has the same problems as GA. The results are probably a bit more accurate, but it is still blocked by ad blockers. And some posts suggest that on-premise installations are also effectively blocked.
Plausible seems to have the same problem as Matomo and GA, though to a lesser extent.
I plan to experiment a bit. I might even try Matomo and/or Plausible. But first I plan to set up syslog-ng with Elasticsearch and Kibana, and see what I can do with the raw logs myself. A couple of ideas:
- syslog-ng can parse Apache access log and store the results in Elasticsearch
- based on the User-Agent I can label some traffic as RSS, search engine and probably a few more categories
- probably the closest to the truth in terms of human visitors: check CSS downloads with a page referrer
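
The ideas above can be tried on a handful of log lines before building the full pipeline. What follows is only a rough Python sketch, not a syslog-ng configuration. It assumes the Apache "combined" log format, and the User-Agent substrings, the function names and the sample lines (documentation IP ranges, example.org) are all made up for illustration. Real traffic would need a much longer and more careful list of patterns.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical substrings, chosen only for this example.
RSS_HINTS = ("feed", "rss")
BOT_HINTS = ("bot", "crawler", "spider")


def classify(agent):
    """Label a User-Agent string as 'rss', 'bot' or 'other'."""
    lowered = agent.lower()
    if any(hint in lowered for hint in RSS_HINTS):
        return "rss"
    if any(hint in lowered for hint in BOT_HINTS):
        return "bot"
    return "other"


def summarize(lines):
    """Count requests per category and collect hosts that fetched CSS with a referrer."""
    categories = Counter()
    css_hosts = set()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        categories[classify(match["agent"])] += 1
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if path.endswith(".css") and match["referer"] not in ("", "-"):
            css_hosts.add(match["host"])
    return categories, css_hosts


# Invented sample lines, not taken from real logs.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2021:13:55:36 +0200] "GET /blog/post.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/93.0"',
    '203.0.113.5 - - [10/Oct/2021:13:55:37 +0200] "GET /css/main.css HTTP/1.1" 200 812 "https://example.org/blog/post.html" "Mozilla/5.0 (X11; Linux x86_64) Firefox/93.0"',
    '198.51.100.7 - - [10/Oct/2021:14:01:02 +0200] "GET /blog/post.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.9 - - [10/Oct/2021:14:05:00 +0200] "GET /feed.xml HTTP/1.1" 200 2048 "-" "Feedly/1.0 (+http://www.feedly.com/fetcher.html)"',
]

if __name__ == "__main__":
    categories, css_hosts = summarize(SAMPLE)
    print(dict(categories))
    print(sorted(css_hosts))
```

Counting distinct hosts that download a stylesheet with a referrer set is only a rough proxy for human visitors: several people can share one address, and browser caching hides repeat visits.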
I hope that I’ll learn not just about my website traffic, but also more about syslog-ng, Elasticsearch and Kibana. And as many of my friends are in information security, working with raw logs promises to be the most effective.
If you have any suggestions, you can reach me on Twitter or LinkedIn (links in the upper right corner).