In the past couple of years, you may have noticed a strange trend in your analytics: traffic numbers are climbing, but conversion rates are dropping. Often, the cause isn’t a sudden change in customer behavior. It’s the rise of AI bots.
Crawlers from companies like OpenAI (GPTBot), Perplexity, Amazon, Apple, Anthropic, and others are scanning the web to feed their AI models. While they usually identify themselves, they also show up in your Google Analytics data as if they were real visitors. Since bots never buy, your traffic increases while your conversion rates decline.
This guide is not about blocking the bots outright (though some companies use firewalls or robots.txt for that). It’s about making sure your analytics reflect real customer behavior, so your year-over-year numbers remain meaningful.
To keep instructions simple, each step begins by telling you where to click inside Google Tag Manager (GTM) or Google Analytics 4 (GA4).
For example:
“GTM → Variables → New” means you should log into Google Tag Manager, click Variables in the left-hand menu, and then click the New button.
The same applies in GA4: “GA4 → Admin → Custom definitions” means you should log into Google Analytics 4, click the Admin gear icon, and then select Custom definitions under the Property column.
Every browser, human or bot, sends a user agent string that identifies what it is. If we capture that string, we can later filter out traffic that clearly comes from bots. Create a new variable (GTM → Variables → New), choose Custom JavaScript as the variable type, and paste:
function() {
  // Return the raw user agent string the browser reports for this visitor.
  return navigator.userAgent;
}
Name the variable (for example, User Agent) and click Save.
We now need to send this user agent information along with pageview events into GA4. Create a new tag (GTM → Tags → New), choose Google Analytics: GA4 Event as the tag type, set the event name to page_view, add a user property named user_agent with the value {{User Agent}}, and fire it on the All Pages trigger.
⚠️ Note: If your GA4 Config Tag is already sending page views automatically, you may get duplicates. To avoid this, either disable send_page_view in the config tag or give this event a slightly different name (like page_view_with_user_agent).
Now every page view your GA4 property receives will include the user agent string.
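For reference, if you run gtag.js directly rather than through GTM, the equivalent is to set the user property before your config call so it attaches to the automatic page_view. A minimal sketch (the Measurement ID below is a placeholder):
// Attach the user agent as a GA4 user property; setting it before the
// config call means it rides along with the automatic page_view.
gtag('set', 'user_properties', {
  user_agent: navigator.userAgent
});
gtag('config', 'G-XXXXXXXXXX'); // placeholder: your Measurement ID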
GA4 won’t let you use custom user properties in reports until you register them as dimensions. Go to GA4 → Admin → Custom definitions → Create custom dimension and fill in:
Dimension name: User Agent
Scope: User
User property: user_agent
Click Save.
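If you prefer to script this registration, the Google Analytics Admin API can create the same dimension. A minimal Node.js sketch, assuming the googleapis client library and credentials with edit access to the property (the property ID is a placeholder):
const { google } = require('googleapis');

async function registerUserAgentDimension() {
  const auth = new google.auth.GoogleAuth({
    scopes: ['https://www.googleapis.com/auth/analytics.edit'],
  });
  const analyticsadmin = google.analyticsadmin({ version: 'v1beta', auth });
  // Mirrors the UI form: display name, USER scope, bound to the user_agent property.
  const res = await analyticsadmin.properties.customDimensions.create({
    parent: 'properties/123456789', // placeholder: your GA4 property ID
    requestBody: {
      parameterName: 'user_agent',
      displayName: 'User Agent',
      scope: 'USER',
    },
  });
  console.log('Created:', res.data.name);
}

registerUserAgentDimension().catch(console.error);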
Before you rely on this, confirm that the user agent is being passed correctly: use GTM’s Preview mode (Tag Assistant) to verify the tag fires, then check that events arrive with the user_agent property in GA4’s DebugView (GA4 → Admin → DebugView).
If DebugView isn’t showing your session, make sure your cookie banner is set to allow analytics; for testing, you can also add an event parameter debug_mode = true.
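For instance, if your site loads gtag.js, you can force a test page_view into DebugView from the browser console. A sketch (in a pure GTM setup, add debug_mode as an event parameter on the tag instead):
// Sends a page_view flagged for DebugView; remove after testing.
gtag('event', 'page_view', { debug_mode: true });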
Now you can create an audience that excludes bots. Go to GA4 → Admin → Audiences → New audience → Create a custom audience, name it (e.g., Human Visitors), and add a condition group to exclude, with one condition per crawler, such as User Agent contains GPTBot.
⚠️ Note: GA4 only allows up to 10 conditions per audience. If you need to filter more bots in the future, create a second audience (e.g., Human Visitors – Extended).
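If you would rather not maintain multiple audiences, one alternative is to classify visitors in GTM with a second Custom JavaScript variable and send the result as a single user property (a hypothetical visitor_type), so the audience needs only one condition. A sketch using the crawler names from this guide:
function() {
  // Crawler names from this guide's exclusion list; extend the pattern as new bots appear.
  var botPattern = /GPTBot|Perplexity|Amazonbot|CCBot|Google-Extended|Anthropic|Applebot|Meta-ExternalAgent/i;
  return botPattern.test(navigator.userAgent) ? 'bot' : 'human';
}
The audience would then need a single exclusion condition: visitor_type exactly matches bot.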
From here forward, GA4 will classify new visitors into this audience (it won’t backfill past data).
You can now apply this audience as a comparison in standard reports, or use it as a segment in Explorations.
This gives you a truer picture of real customer activity and keeps your KPIs comparable to pre-AI-bot baselines.
For an eCommerce manager, accurate analytics is the foundation of decision-making. If bots double your traffic, your conversion rate can appear cut in half overnight — even though customer behavior hasn’t changed. That can lead to wrong conclusions about ad performance, site design, or merchandising.
By setting up this filter, you’re not fighting the bots; you’re simply keeping your data honest. Your conversion rates, revenue per session, and marketing ROI will once again reflect real shoppers — the people who actually matter to your business.
New crawlers will keep appearing. The advantage of this approach is flexibility: by capturing the full user_agent, you can always update your audience with new conditions as you spot new bots in your data.
This ongoing maintenance ensures your analytics stay clean and your KPIs remain consistent — even as the internet around you changes.
Q: Will this method block bots from visiting my site?
No. This approach only filters bots out of your analytics reports so your data reflects human visitors. The bots may still crawl your site. If you want to block or rate-limit them, you’ll need to configure that at the server, CDN, or firewall level.
Q: Why not just use GA4’s built-in bot filtering?
GA4 includes standard bot filtering based on the IAB (Interactive Advertising Bureau) list. However, many newer AI crawlers (like GPTBot or Perplexity) aren’t covered in that list. Capturing the user_agent string gives you more flexibility and control.
Q: Could this setup cause duplicate page views in GA4?
Yes, it can if your GA4 Configuration Tag is also firing a page_view. To avoid double-counting, you can either disable automatic page views in the config tag or give your custom event a different name (e.g., page_view_with_user_agent).
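In a code-managed (gtag.js) setup, the same fix is a one-liner. A sketch with a placeholder Measurement ID:
// Suppress the automatic page_view so only the custom event counts pages.
gtag('config', 'G-XXXXXXXXXX', { send_page_view: false });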
Q: Do I need to list every possible bot user agent string?
No, only the ones that show up in your traffic. Start with the common ones (GPTBot, Perplexity, Amazonbot, CCBot, Google-Extended, Anthropic, Applebot, Meta-ExternalAgent). Over time, check your captured user agents in GA4 and add new exclusions as needed.
Q: How far back will this fix clean my data?
Audiences in GA4 are forward-looking only. They will not remove bot traffic from past data. From the day you publish the audience, new users will be classified accordingly.
Q: Can I filter by country instead of user agent?
If your store only sells to U.S. customers, you can also filter analytics data by country (e.g., only include “United States”). This removes a lot of bot noise, but not every bot reports accurate geography. Combining country filters with user-agent filtering is often the best approach.
Q: What if my company uses Consent Mode or a cookie banner?
If you use Google Consent Mode, GTM and GA4 may not fire until the visitor grants analytics consent. When testing with Tag Assistant or DebugView, make sure to accept consent first.
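During testing, you can also grant consent from the browser console, assuming the standard Consent Mode API is wired up on your site. A sketch:
// Grant analytics consent for this test session so GA4 tags can fire.
gtag('consent', 'update', {
  analytics_storage: 'granted'
});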