
In my previous blog post I covered how to identify fake referral traffic in Google Analytics data. Now it’s time to clean it up!

Google Analytics spam traffic vs real data

Boo-urns! I’m not as popular as I thought I was!

To clean up the data that had already been collected, I created a segment that shows the data with the spam traffic filtered out. A segment only changes how existing data is displayed; setting up a filter will stop spam from affecting future data (which I will cover next time).

Creating a new Google Analytics Segment

To begin, click + Add Segment towards the top of the Google Analytics screen. Give the segment a name and then open the Conditions tab in the left column. Two filters are required: one to remove the Ghost Referrals and one to remove the Bot Crawler traffic.

Google Analytics new segment

The Google Analytics segment options. The circle on the right updates to show how much data your filters remove.

Remove Ghost Referrals

As I discovered earlier, ghost referrals (as well as fake direct and search traffic) all share a common element: they all have a Hostname value that is not that of my website. Therefore the most effective way to remove them is to only include sessions that have a Hostname value that contains my website’s domain.

Google Analytics segment hostname filter

Only show hits that actually occurred on my website to filter out the ghost hits.

Note that if you use your tracking code on other sites (such as an external PayPal checkout if you have a shop), you will need to list these sites as well (in regex format).

Applying this filter removed over 80% of the overall traffic. That’s just how much Ghost Referral traffic I was getting on this particular site!
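To make the hostname condition concrete, here is a minimal Python sketch of the same idea outside Google Analytics. The domain names are placeholders, not the ones from my site, and the sketch assumes the condition behaves like a partial regex match (the hostname only has to contain one of the listed domains).

```python
import re

# Placeholder hostnames (assumptions): swap in the domains that
# legitimately run your tracking code, e.g. your site and an external checkout.
VALID_HOSTNAMES = ["example.com", "checkout.example-payments.com"]

# Escape each domain so the dots match literally, then join with | ("or").
HOSTNAME_PATTERN = re.compile("|".join(re.escape(h) for h in VALID_HOSTNAMES))


def is_real_hit(hostname: str) -> bool:
    """Return True if the hostname contains one of the valid domains."""
    return HOSTNAME_PATTERN.search(hostname) is not None


if __name__ == "__main__":
    for host in ["www.example.com", "(not set)", "ghost-referrer.example"]:
        print(host, "keep" if is_real_hit(host) else "drop")
```

Escaping the dots matters: an unescaped `.` in a regex matches any character, so the pattern would be looser than intended.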

Remove Bot Crawlers

Having applied the Hostname filter, next I wanted to remove the remaining bot crawlers. With the hostname filter applied to my new segment, I returned to the Referrals data in the Acquisition tab to review what was left.

Google Analytics bot crawler referrals

Google Analytics referral data with the Ghost Referrals removed but still showing the bot crawlers. Terms 6 & 7 are real websites, which I’ve hidden.

I copy-and-pasted the spam results into a text document like so…

I then needed to convert this list into a regex format, placing a | (vertical line) symbol between each domain. There shouldn’t be a vertical line after the last one, because a trailing | adds an empty alternative that matches every source. The finished regex code looks like this…

I then created a second filter to exclude sessions where the source matches the regex of the list.
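The list-to-regex step can also be scripted rather than done by hand. The sketch below is one way to do it in Python, using made-up placeholder domains instead of my actual spam list. The helper names are hypothetical, and it again assumes the condition is a partial regex match.

```python
import re

# Placeholder text standing in for the pasted spam referrers
# (assumption: one domain per line; these are not real spam domains).
PASTED_LIST = """
spam.example
bots.example

crawler.example
"""


def build_exclude_regex(text: str) -> str:
    """Join the non-empty lines with | and escape dots so they match literally."""
    domains = [line.strip() for line in text.splitlines() if line.strip()]
    if not domains:
        # An empty pattern would match every source, so refuse to build one.
        raise ValueError("no domains supplied")
    return "|".join(re.escape(d) for d in domains)


pattern = build_exclude_regex(PASTED_LIST)


def is_spam_source(source: str) -> bool:
    """Return True if the source contains any of the listed domains."""
    return re.search(pattern, source) is not None


if __name__ == "__main__":
    print(pattern)
```

Joining with `"|".join(...)` guarantees there is no vertical line after the last domain, which is the easiest mistake to make when editing the list by hand.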

Google Analytics segment source filter

A second filter to remove any hits that come from the bot crawler sites

This filter may need to be updated from time to time if/when new bot crawlers start visiting the website.


Applying the filtered segment revealed that the spam referrals to this particular website accounted for approximately 70% of the overall data collected! With the data cleaned up and the accurate results restored, the Analytics stats are useful again and can help inform design decisions on future updates to the website.
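If you export your session data, the two segment conditions can be reproduced in a few lines to check how much of the traffic they remove. This is only a sketch: the `Session` type, the patterns and the sample rows are all invented for illustration and are not my real data.

```python
import re
from typing import NamedTuple, Sequence


class Session(NamedTuple):
    hostname: str
    source: str


# Hypothetical patterns mirroring the two segment conditions.
HOSTNAME_RE = re.compile(r"example\.com")
SPAM_SOURCE_RE = re.compile(r"spam\.example|bots\.example")


def in_segment(session: Session) -> bool:
    """Include matching hostnames, then exclude spam sources."""
    on_my_site = HOSTNAME_RE.search(session.hostname) is not None
    from_spam = SPAM_SOURCE_RE.search(session.source) is not None
    return on_my_site and not from_spam


def spam_share(sessions: Sequence[Session]) -> float:
    """Fraction of sessions that the segment filters out."""
    if not sessions:
        return 0.0
    kept = sum(1 for s in sessions if in_segment(s))
    return 1 - kept / len(sessions)


# Invented sample rows, purely for demonstration.
SAMPLE = [
    Session("www.example.com", "google"),
    Session("(not set)", "ghost.example"),      # ghost referral: wrong hostname
    Session("www.example.com", "bots.example"),  # bot crawler: spam source
    Session("www.example.com", "(direct)"),
]

if __name__ == "__main__":
    print(f"{spam_share(SAMPLE):.0%} of sample sessions filtered out")
```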

Google Analytics spam referral traffic vs real

The cleaned up data (blue) compared against the original (green) with a more accurate bounce rate and session duration

Next time, I’ll explain how to create a filter to stop spam data from being logged by Google Analytics in the first place!

I hope you found this helpful to clean up your Google Analytics stats! As ever you can get in touch with me on Twitter.
