How to Clean Up Your Google Analytics Data from AI Bot Traffic

In the past couple of years, you may have noticed a strange trend in your analytics: traffic numbers are climbing, but conversion rates are dropping. Often, the cause isn’t a sudden change in customer behavior. It’s the rise of AI bots.

Crawlers from companies like OpenAI (GPTBot), Perplexity, Amazon, Apple, Anthropic, and others are scanning the web to feed their AI models. While they usually identify themselves, they also show up in your Google Analytics data as if they were real visitors. Since bots never buy, your traffic increases while your conversion rates decline.

This guide is not about blocking the bots outright (though some companies use firewalls or robots.txt for that). It’s about making sure your analytics reflect real customer behavior, so your year-over-year numbers remain meaningful.

How to Use This Guide

To keep instructions simple, each step begins by telling you where to click inside Google Tag Manager (GTM) or Google Analytics 4 (GA4).

For example:

“GTM → Variables → New” means:

  • Log into Google Tag Manager
  • In the left menu, click Variables
  • Click the New button

The same applies in GA4: “GA4 → Admin → Custom definitions” means you should log into Google Analytics 4, click the Admin gear icon, and then select Custom definitions under the Property column.

Step 1: Capture the User Agent in GTM

Every browser, human or bot, sends a user agent string that identifies what it is. If we capture that string, we can later filter out traffic that clearly comes from bots.

  • In GTM, go to Variables
  • Click New
  • Choose Variable Type: Custom JavaScript
  • Name it JS – User Agent
  • Paste this code:
function() { return navigator.userAgent; }

  • Click Save
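As an optional variation on the variable above, you could send a short label instead of the full string. Shorter values are easier to read in reports and less likely to run into GA4's length limits on user property values. The sketch below is only an illustration: the function name labelUserAgent and the fallback label "other" are hypothetical, and the token list simply mirrors the bot strings used in Step 5. It is written in older-style JavaScript (var, no arrow functions) on the assumption that your GTM container may not accept newer syntax.

```javascript
// Hypothetical helper: map a user agent string to a short label.
// The token list mirrors the bot strings listed in Step 5 of this guide.
var BOT_TOKENS = ['GPTBot', 'Perplexity', 'Amazonbot', 'CCBot',
                  'Google-Extended', 'Anthropic', 'Applebot', 'Meta-ExternalAgent'];

function labelUserAgent(ua) {
  // Compare in lower case so "gptbot" and "GPTBot" are treated alike.
  var haystack = String(ua || '').toLowerCase();
  for (var i = 0; i < BOT_TOKENS.length; i++) {
    if (haystack.indexOf(BOT_TOKENS[i].toLowerCase()) !== -1) {
      return BOT_TOKENS[i]; // first matching token, e.g. "GPTBot"
    }
  }
  return 'other'; // no listed bot token found
}

// Inside a GTM Custom JavaScript variable, the same logic would be wrapped as:
//   function() { /* BOT_TOKENS and labelUserAgent defined here */
//                return labelUserAgent(navigator.userAgent); }
```

If you go this route, you would select this variable in Step 2 instead of JS – User Agent, and filter on the label rather than on the raw string.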

Step 2: Send the User Agent into GA4

We now need to send this user agent information along with pageview events into GA4.

  • In GTM, go to Tags
  • Click New
  • For Tag Type, choose Google Analytics: GA4 Event
  • Enter your GA4 Measurement ID (looks like G-XXXXXXX)
  • For Event Name, type page_view

⚠️ Note: If your GA4 Config Tag is already sending page views automatically, you may get duplicates. To avoid this, either disable send_page_view in the config tag or give this event a slightly different name (like page_view_with_user_agent).

  • Scroll to User Properties, click Add Row
  • In the left column, type: user_agent (use an underscore, not a hyphen)
  • In the right column, click the variable icon and select JS – User Agent
  • For Triggering, select All Pages
  • Save and publish the tag

Now every page view your GA4 property receives will include the user agent string.

Step 3: Register the User Agent in GA4

GA4 won’t let you use custom user properties in reports until you register them as dimensions.

  • Log into GA4
  • Click the Admin (gear icon) at the bottom left
  • In the Property column, click Custom definitions
  • Click Create custom dimension
  • Fill in:
      • Dimension name: User Agent
      • Scope: User
      • User property: user_agent
  • Click Save

Step 4: Verify That It’s Working

Before you rely on this, confirm that the user agent is being passed correctly.

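  • In GTM Preview/Tag Assistant, open your site, look at the fired GA4 Event Tag, and check that user_agent is listed with a value. When you test in your own browser, the value will be your browser's string, typically starting with Mozilla/5.0; a crawler such as GPTBot would show its own name inside the string.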
  • In GA4 DebugView (Admin → DebugView), click a page_view event and check under User Properties for user_agent.

If DebugView isn’t showing your session, make sure you have accepted analytics cookies in your cookie banner. For testing, you can also add the event parameter debug_mode = true to the tag.
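While you are verifying, it is also worth comparing the length of the value you see in GA4 with the original string. GA4's documented collection limits cap user property values at 36 characters, and many user agent strings are longer than that. The sketch below shows what such a cap would do to a bot's user agent. The sample string is written out by hand for illustration, the helper name asStoredByGa4 is hypothetical, and the code assumes over-long values are cut at the limit rather than rejected.

```javascript
// GA4's documented limit for the length of a user property value.
var GA4_USER_PROPERTY_MAX = 36;

// Hypothetical helper modelling a length cap.
// Assumption: over-long values are cut at the limit rather than dropped.
function asStoredByGa4(value) {
  return String(value).slice(0, GA4_USER_PROPERTY_MAX);
}

// Sample user agent for illustration only, not copied from any vendor's documentation.
var sampleBotUa = 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0)';
var stored = asStoredByGa4(sampleBotUa);

console.log(stored.length);                   // 36
console.log(stored.indexOf('GPTBot') !== -1); // false: the bot name sits past the cut-off
```

If the values you see in DebugView stop short of the bot name, a "does not contain" condition on that name cannot match, so check this before building the audience.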

Step 5: Build a “Human Visitors” Audience in GA4

Now you can create an audience that excludes bots.

  • In GA4, go to Admin → Audiences → New Audience → Create a custom audience
  • Give it a clear name like Human Visitors (exclude bots)
  • In Include users when, click Add condition
  • Choose User property → user_agent → does not contain
  • Enter GPTBot
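  • Click AND (not OR) and add another condition: user_agent does not contain Perplexity. The conditions must all hold at once, because a visitor counts as human only if none of the bot strings appears; joined with OR, every visitor would match.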
  • Repeat for each common AI bot string you want to filter:
  • Amazonbot
  • CCBot
  • Google-Extended
  • Anthropic
  • Applebot
  • Meta-ExternalAgent

⚠️ Note: GA4 only allows up to 10 conditions per audience. If you need to filter more bots in the future, create a second audience (e.g., Human Visitors – Extended).
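The audience itself is configured in the GA4 interface, not in code, but the logic behind it can be sketched in a few lines. A visitor should count as human only when every "does not contain" condition holds. The function names below are hypothetical and the list repeats the bot strings from this step; the second function is included only to show what happens if the same conditions are joined with OR.

```javascript
// Bot strings from this step of the guide.
var BOT_STRINGS = ['GPTBot', 'Perplexity', 'Amazonbot', 'CCBot',
                   'Google-Extended', 'Anthropic', 'Applebot', 'Meta-ExternalAgent'];

// Conditions joined with AND: human only if NONE of the bot strings appears.
function isHumanAllConditions(ua) {
  for (var i = 0; i < BOT_STRINGS.length; i++) {
    if (ua.indexOf(BOT_STRINGS[i]) !== -1) {
      return false; // one "does not contain" condition failed
    }
  }
  return true;
}

// Conditions joined with OR: a single passing condition is enough,
// so a GPTBot visit still qualifies because it does not contain "Perplexity".
function isHumanAnyCondition(ua) {
  for (var i = 0; i < BOT_STRINGS.length; i++) {
    if (ua.indexOf(BOT_STRINGS[i]) === -1) {
      return true;
    }
  }
  return false;
}

var botUa = 'Mozilla/5.0 (compatible; GPTBot/1.0)'; // illustrative string
console.log(isHumanAllConditions(botUa)); // false: correctly excluded
console.log(isHumanAnyCondition(botUa));  // true: wrongly included
```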

Step 6: Use Your Clean Audience in Reports

From here forward, GA4 will classify new visitors into this audience (it won’t backfill past data).

You can now:

  • Apply this audience as a comparison in standard reports (add a comparison on the Audience name dimension)
  • Compare “All Users” vs. “Human Visitors” in Explorations
  • Use this audience when exporting to BigQuery or when sharing data with Google Ads

This gives you a truer picture of real customer activity and keeps your KPIs comparable to pre-AI-bot baselines.

Why This Matters

For an eCommerce manager, accurate analytics is the foundation of decision-making. If bots double your traffic, your conversion rate can appear cut in half overnight — even though customer behavior hasn’t changed. That can lead to wrong conclusions about ad performance, site design, or merchandising.
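To make the arithmetic concrete, here is a small worked example. All figures are hypothetical and chosen only to keep the numbers round: 200 orders, 10,000 human sessions, and an equal number of bot sessions.

```javascript
// Conversion rate = orders / sessions (0 when there are no sessions).
function conversionRate(orders, sessions) {
  return sessions === 0 ? 0 : orders / sessions;
}

// Hypothetical figures for illustration.
var orders = 200;
var humanSessions = 10000;
var botSessions = 10000; // bots double the traffic but never buy

var cleanRate = conversionRate(orders, humanSessions);
var reportedRate = conversionRate(orders, humanSessions + botSessions);

console.log((cleanRate * 100).toFixed(1) + '%');    // 2.0%
console.log((reportedRate * 100).toFixed(1) + '%'); // 1.0%
```

The orders are identical in both cases; only the denominator changed, which is why the drop says nothing about customer behavior.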

By setting up this filter, you’re not fighting the bots; you’re simply keeping your data honest. Your conversion rates, revenue per session, and marketing ROI will once again reflect real shoppers — the people who actually matter to your business.

Staying Ahead

New crawlers will keep appearing. The advantage of this approach is flexibility: by capturing the full user_agent, you can always update your audience with new conditions as you spot new bots in your data.

This ongoing maintenance ensures your analytics stay clean and your KPIs remain consistent — even as the internet around you changes.
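One way to spot new bots is to scan the user agent values you have captured for crawler-like wording that your audience does not yet cover. The sketch below assumes you have a plain list of user agent strings (for example, copied out of a GA4 exploration). The function name, the generic hint words, and the sample strings are all hypothetical.

```javascript
// Bot strings already covered by the audience in Step 5.
var KNOWN_BOTS = ['GPTBot', 'Perplexity', 'Amazonbot', 'CCBot',
                  'Google-Extended', 'Anthropic', 'Applebot', 'Meta-ExternalAgent'];

// Assumption: words that commonly appear in crawler user agents.
var GENERIC_HINTS = ['bot', 'crawler', 'spider'];

function containsAny(text, needles) {
  for (var i = 0; i < needles.length; i++) {
    if (text.indexOf(needles[i].toLowerCase()) !== -1) {
      return true;
    }
  }
  return false;
}

// Return each distinct user agent that looks automated but is not yet listed.
function findUnlistedCrawlers(userAgents) {
  var found = [];
  for (var i = 0; i < userAgents.length; i++) {
    var lower = userAgents[i].toLowerCase();
    var looksAutomated = containsAny(lower, GENERIC_HINTS);
    var alreadyListed = containsAny(lower, KNOWN_BOTS);
    if (looksAutomated && !alreadyListed && found.indexOf(userAgents[i]) === -1) {
      found.push(userAgents[i]);
    }
  }
  return found;
}

// Illustrative input; "ExampleCrawler" is a made-up name.
var captured = [
  'Mozilla/5.0 (compatible; GPTBot/1.0)',
  'Mozilla/5.0 (compatible; ExampleCrawler/2.0)',
  'Mozilla/5.0 (Windows NT 10.0) Chrome/120.0'
];
console.log(findUnlistedCrawlers(captured));
```

Anything this returns is a candidate for review, not an automatic exclusion: a hint word alone does not prove the visitor is a bot.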

Frequently Asked Questions (FAQ)

Q: Will this method block bots from visiting my site?
No. This approach only filters bots out of your analytics reports so your data reflects human visitors. The bots may still crawl your site. If you want to block or rate-limit them, you’ll need to configure that at the server, CDN, or firewall level.

Q: Why not just use GA4’s built-in bot filtering?
GA4 includes standard bot filtering based on the IAB (Interactive Advertising Bureau) list. However, many newer AI crawlers (like GPTBot or Perplexity) aren’t covered in that list. Capturing the user_agent string gives you more flexibility and control.

Q: Could this setup cause duplicate page views in GA4?
Yes, it can if your GA4 Configuration Tag is also firing a page_view. To avoid double-counting, you can either disable automatic page views in the config tag or give your custom event a different name (e.g., page_view_with_user_agent).

Q: Do I need to list every possible bot user agent string?
No, only the ones that show up in your traffic. Start with the common ones (GPTBot, Perplexity, Amazonbot, CCBot, Google-Extended, Anthropic, Applebot, Meta-ExternalAgent). Over time, check your captured user agents in GA4 and add new exclusions as needed.

Q: How far back will this fix clean my data?
Audiences in GA4 are forward-looking only. They will not remove bot traffic from past data. From the day you publish the audience, new users will be classified accordingly.

Q: Can I filter by country instead of user agent?
Yes, as a complement. If your store only sells to U.S. customers, you can also filter analytics data by country (e.g., only include “United States”). This removes much bot noise but not all of it, because GA4 derives location from the visitor's IP address and crawlers hosted in U.S. data centers will still be counted. Combining country filters with user-agent filtering is often the best approach.

Q: What if my company uses Consent Mode or a cookie banner?
If you use Google Consent Mode, GTM and GA4 may not fire until the visitor grants analytics consent. When testing with Tag Assistant or DebugView, make sure to accept consent first.
