

By Thomas Baekdal - July 2011

Bots vs. Real People vs. Blockers

You can do many things to make your analytics more accurate. You can add campaign variables to your links. You can add special analytic triggers to your content. And, you can create custom reports that narrow in on real outcomes like your conversion rates.

But it doesn't change the fact that the raw numbers are inaccurate. The question is just "by how much."

We see this every day with newspapers and paywalls. When newspapers ask people whether they would pay for online news, about 5% say that they would. But the actual conversion rate is nowhere near 5%. The conversion rate of NYTimes' paywall is just 0.68% - more than seven times lower than expected!

Something doesn't add up. There is a huge discrepancy in the raw data. It is not that the 5% conversion rate is wrong. The question is just "5% of what?"

The raw data

I am currently running a series of tests looking into the raw data on this site. I am trying to understand my real audience. I want to know just how wrong the raw data is.

Note: I will write about each test as it is completed.

Here is the result of the first test: Real traffic vs. people blocking.

I looked at the most basic raw number that I had: the number of pageviews. The pageview count is not that relevant in itself (you should measure people or conversions), but most other numbers are based on the result of a page view request. We measure people by measuring unique cookies on page views.

I set up a simple test to find out how much was real, and how much was not. This was what I found.

Requests by people

The first test was simple. How many requests are made by real people, as opposed to automated systems, bots, social services and news aggregators checking for new content?

It turned out that 60% of all pageviews are generated by automated systems, and only 40% are made by people. So, if you have a publishing system that simply counts the number of times an article is requested, it is likely to be way off.

Note: The way I measured this was to check if the stylesheet was loaded or not (tip by @wa7son). I needed to measure this without using any kind of scripts or cookies (which could be blocked). It is based on the thinking that an automated system is likely to request the main page, but not the style sheet. This turned out to be 95% true. I also set up another test that recorded all user-agents, to filter out the 5% that got through.
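The stylesheet check can be sketched roughly as follows. This is a minimal illustration, not the setup used for the test: the `(client_id, path)` record format, the `/style.css` URL and the `split_pageviews` helper are all assumptions made for the example, and a real log would need a proper session key (for instance IP address plus user-agent).

```python
# Hypothetical, simplified log records: (client_id, requested_path).
STYLESHEET_PATH = "/style.css"  # assumed stylesheet URL, not from the article


def split_pageviews(records, page_paths):
    """Return (human_pageviews, automated_pageviews).

    A page view counts as human if the same client also requested
    the stylesheet at some point; otherwise it counts as automated.
    """
    loaded_css = {client for client, path in records if path == STYLESHEET_PATH}
    human = automated = 0
    for client, path in records:
        if path not in page_paths:
            continue  # ignore stylesheet and other asset requests
        if client in loaded_css:
            human += 1
        else:
            automated += 1
    return human, automated


records = [
    ("a", "/article"), ("a", "/style.css"),
    ("b", "/article"),
    ("c", "/article"), ("c", "/style.css"), ("c", "/about"),
    ("d", "/article"), ("d", "/about"),
]
human, automated = split_pageviews(records, {"/article", "/about"})
print(human, automated)  # 3 3
```

A check like this will still misclassify automated systems that do fetch the stylesheet, which is why a second pass over the recorded user-agents is needed.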

How many block scripts, GA and ads?

Out of the 40% that were determined to be real requests, I looked at how many of those were passed back to Google Analytics.

We all know that many people use ad blockers, but today people also use script blockers like the popular "noscript." The problem is that if people use noscript, Google Analytics will not be able to track them. And you will get hugely inaccurate data.

Testing this is simple. Just add a script to your page and see if it works. The result was that 27% use noscript or other script blocking tools. Which also means that my Google Analytics numbers are 27% off. That is a significant amount of traffic whose behavior I cannot see.

The next step was to figure out how many people block advertising. This is a bit tricky. But I found that the most reliable way was to test if the ad block (the part of the page reserved for the ad) had been modified, and thus contained an ad.

The result was that a staggering 44% are blocking ads.
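One way to turn such a test into a number is sketched below. It assumes, purely for illustration, that the test script requests a beacon URL the server can log, so that each real visitor either shows up in the beacon log or does not. The client IDs and the `script_block_rate` helper are hypothetical.

```python
def script_block_rate(human_clients, beacon_clients):
    """Share of human clients whose browser never ran the test script."""
    if not human_clients:
        raise ValueError("no human clients to measure")
    blocked = human_clients - beacon_clients  # loaded the page, no beacon hit
    return len(blocked) / len(human_clients)


human_clients = {"a", "b", "c", "d"}   # clients classified as real people
beacon_clients = {"a", "b", "c"}       # clients whose script fired the beacon
rate = script_block_rate(human_clients, beacon_clients)
print(f"{rate:.0%}")  # 25%
```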

What is Google Analytics and Adsense reporting?

What is interesting then is to compare the test numbers with the pageviews numbers being reported by Google Analytics and Google Adsense.

Google Analytics is reporting a pageview number that is 1.4 percentage points lower than what my test revealed. The result is that about 28% of the real traffic is not showing up in Google Analytics.

Note: Other analytics tools will not fare any better. The problem is how they get the data.
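To see what these percentages mean relative to all requests hitting the server, the shares can be multiplied through. This is a back-of-the-envelope sketch that assumes the 28% and 44% figures are both measured against the real-people traffic (the 40%), and the variable names are only illustrative.

```python
real_share = 0.40       # share of all page requests made by real people
missing_in_ga = 0.28    # share of real traffic not showing up in Google Analytics
ad_block_share = 0.44   # share of real traffic blocking ads

# Share of ALL requests that are real people and visible to Google Analytics.
ga_visible = real_share * (1 - missing_in_ga)

# Share of ALL requests that are real people who actually see ads.
ads_shown = real_share * (1 - ad_block_share)

print(f"{ga_visible:.1%} {ads_shown:.1%}")  # 28.8% 22.4%
```

Under those assumptions, fewer than three in ten raw page requests are real people that analytics can see, and fewer than one in four are real people who see an ad.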

For Google Adsense, the number of page views being reported is actually higher than the test number. There can be several explanations for this. It could be that some automated requests are actually triggering Adsense. It could be that some ad blockers are blocking the ad, but Adsense doesn't know it.

The final result is this:

Of those...

That is a lot of readers who come to this site, read the content, without giving anything in return.

If you are one of the people blocking ads, here is the deal. You can either read most articles for free - in exchange for being exposed to ads. You don't have to click on them. You don't even have to look at the ads. But the ads do have to be fully visible on the screen.

If you don't want to see the ads (for various reasons), you can subscribe to Baekdal Plus, which is ad free. On top of this, you get full access to all the plus articles (which are usually 7-12 pages long). And, you can download my new book about Social Commerce for free. It is only $5/month.

To me, that seems like a pretty good deal :)

Up next: How many people read an article?
I mean, really read it?

The next test (starting this evening) is to look at three things:

This is interesting for two reasons. First, it will illustrate what the real impact is (I expect it to be much lower than what you might think). Second, it gives us another way to look at bounce rates.

The problem with the bounce rate is that it is measured by looking at how many pages you view - which is not always relevant. For publishers, a person reading only one page is worth a lot more than a person clicking around but never reading anything.

The test will run for two weeks, so stay tuned!




