Google Analytics spam traffic vs real data

Boo-urns! I’m not as popular as I thought I was!

In my previous blog I covered how to identify fake referral traffic in Google Analytics data. Now it’s time to clean it up!

To clean up the data that had already been collected, I created a segment that shows the data with the spam traffic filtered out. A segment only changes how existing data is displayed; stopping spam from being recorded in future needs a filter, which I will cover next time.

Creating a new Google Analytics Segment

To begin, click + Add Segment towards the top of the Google Analytics screen. Give the segment a name, then open the Conditions tab in the left column. Two filters are required: one to remove the Ghost Referrals and one to remove the Bot Crawler traffic.

Google Analytics new segment

The Google Analytics segment options. The circle on the right updates to show how much data your filters remove.

Remove Ghost Referrals

As I discovered earlier, ghost referrals (as well as fake direct and search traffic) all share a common element: they all have a Hostname value that is not that of my website. Therefore the most effective way to remove them is to only include sessions whose Hostname value contains PaulJardine.co.uk.

Google Analytics segment hostname filter

Only show hits that actually occurred on my website to filter out the ghost hits.

Note that if your tracking code also runs on other sites (such as an external PayPal checkout if you have a shop), you will need to list those hostnames too, in regex format.
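To illustrate what the hostname condition is doing, here is a minimal Python sketch. Google Analytics applies the pattern for you, so this only shows the logic. The sample hostnames and the `checkout.example.com` entry are made up for the example; only PaulJardine.co.uk comes from my own setup.

```python
import re

# Hostnames that legitimately run my tracking code, joined with | in regex
# format. "checkout.example.com" is a hypothetical external checkout, added
# only to show how a second hostname would be listed.
VALID_HOSTNAMES = re.compile(
    r"pauljardine\.co\.uk|checkout\.example\.com",
    re.IGNORECASE,
)


def is_real_hit(hostname: str) -> bool:
    """Return True when the hostname contains one of my own sites."""
    return VALID_HOSTNAMES.search(hostname) is not None


# Made-up sample values: ghost hits carry a hostname that is not mine.
sample_hostnames = [
    "www.pauljardine.co.uk",
    "(not set)",
    "some-spam-site.example",
    "checkout.example.com",
]

real_hits = [h for h in sample_hostnames if is_real_hit(h)]
print(real_hits)
```

Using `search` rather than a full match mirrors the "contains" condition in the segment, so `www.` and other subdomains are kept as well.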

Applying this filter removed over 80% of the overall traffic. That’s just how much Ghost Referral traffic I was getting on this particular site!

Remove Bot Crawlers

With the Hostname filter applied to my new segment, I next wanted to remove the remaining bot crawlers, so I returned to the Referrals data in the Acquisition tab to review what was left.

Google Analytics bot crawler referrals

Google Analytics referral data with the Ghost Referrals removed but still showing the bot crawlers. Entries 6 & 7 are genuine websites, which I’ve hidden.

I copied and pasted the spam results into a text document like so…

buttons-for-your-website.com
buttons-for-website.com
semalt.com

I then needed to convert this list into regex format by placing a | (vertical bar) between each domain, with no bar after the last one. Note that an unescaped dot in a regex matches any character, which is harmless here but can be tightened by writing \. instead. The finished regex looks like this…

buttons-for-your-website.com|buttons-for-website.com|semalt.com
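If the list gets long, joining it by hand becomes tedious. Below is a small Python sketch of the same conversion; the function name `build_source_regex` is my own invention for this example. It also escapes each dot as `\.`, so the dot matches only a literal dot rather than any character.

```python
def build_source_regex(domains: list[str]) -> str:
    """Join spam domains into one regex: escaped dots, | between entries."""
    cleaned = [d.strip() for d in domains if d.strip()]  # drop blank lines
    escaped = [d.replace(".", r"\.") for d in cleaned]
    # str.join only puts the separator between items, so there is no
    # trailing | after the last domain.
    return "|".join(escaped)


spam_domains = [
    "buttons-for-your-website.com",
    "buttons-for-website.com",
    "semalt.com",
]

print(build_source_regex(spam_domains))
# prints: buttons-for-your-website\.com|buttons-for-website\.com|semalt\.com
```

The printed pattern can be pasted straight into the segment condition.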

I then created a second filter to exclude sessions where the source matches the regex of the list.

Google Analytics segment source filter

A second filter to remove any hits that come from the bot crawler sites
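Put together, the segment keeps a session only if it passes the hostname condition and is not caught by the source condition. Here is a rough Python sketch of that combined logic. The session rows are invented sample data, and the `hostname` and `source` keys are just labels I chose for this example, not a Google Analytics export format.

```python
import re

# Condition 1: include sessions whose hostname contains my site.
HOSTNAME_INCLUDE = re.compile(r"pauljardine\.co\.uk", re.IGNORECASE)

# Condition 2: exclude sessions whose source matches a known bot crawler.
SOURCE_EXCLUDE = re.compile(
    r"buttons-for-your-website\.com|buttons-for-website\.com|semalt\.com"
)


def keep_session(hostname: str, source: str) -> bool:
    """Return True when a session survives both segment conditions."""
    on_my_site = HOSTNAME_INCLUDE.search(hostname) is not None
    from_bot = SOURCE_EXCLUDE.search(source) is not None
    return on_my_site and not from_bot


# Invented sample rows: one genuine visit, one bot crawler, one ghost hit.
sessions = [
    {"hostname": "www.pauljardine.co.uk", "source": "google"},
    {"hostname": "www.pauljardine.co.uk", "source": "semalt.com"},
    {"hostname": "(not set)", "source": "ghost-referrer.example"},
]

clean = [s for s in sessions if keep_session(s["hostname"], s["source"])]
print(clean)
```

The bot crawler passes the hostname check because it really does visit the site, which is why the second condition is needed.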

This filter may need to be updated from time to time if/when new bot crawlers start visiting the website.

Results

Applying the filtered segment revealed that the spam referrals to this particular website accounted for approximately 70% of the overall data collected! With the data cleaned up, the Analytics stats are useful again and can help inform design decisions for future updates to the website.

Google Analytics spam referral traffic vs real

The cleaned up data (blue) compared against the original (green) with a more accurate bounce rate and session duration

Next time, I’ll explain how to create a filter to stop future data from being collected in the first place!

I hope you found this helpful for cleaning up your Google Analytics stats! As ever, you can get in touch with me on Facebook and Twitter.
