How To Block Referral Spam With Google Analytics Filters or Htaccess

Any business that is trying to increase high-value traffic or conversions needs to be hyper-focused on its analytics. This includes reporting on metrics such as how visitors reach your website, where they come from (referral traffic), and which traffic converts best.

The challenge is that one analytics enemy can wreak havoc on your data by distorting the numbers, specifically your referral traffic. What are we talking about? Referral or Referrer Spam!

Fortunately, Google is aware of this and provides filters that let a business remove referral spam from its Google Analytics reports, so the spam doesn’t skew strategies built on insights from the data.

How do you block referral spam? The guide below walks you through setting up filters in Google Analytics to block these spam sources and get more accurate data for your B2B or B2C company.

Referrer vs Referral Spam

First, for those of you who are less versed in analytics, let’s start with the basics of what a referrer is.

A referrer is the address of the page a user was on when they followed a link to another page; the browser passes it along with the request. So if you have a link on another website, for example through guest posting or because your business was referenced in an article, and someone clicks that link, that website will show up in your Google Analytics as a referral source.

These referral sources can help you track link-building activities and show how valuable a placement on a given website is (is the site sending you a ton of traffic, or hardly any?).
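To make this concrete, here is a minimal Python sketch (an illustration only, not how Google Analytics is implemented internally) showing that a referral source is essentially the hostname part of the referrer address. The URLs are hypothetical. Because the referrer is supplied by whoever makes the request, nothing stops a client from putting any domain there.

```python
from urllib.parse import urlparse


def referral_source(referrer_url: str) -> str:
    """Return the hostname part of a referrer URL, roughly what a referral
    report lists as the source. Returns "" for an empty or relative value."""
    # .hostname is lowercased by urllib and is None when there is no host.
    return urlparse(referrer_url).hostname or ""


# Hypothetical visit from a guest post on a partner blog:
print(referral_source("https://blog.example.org/guest-post?ref=nav"))  # blog.example.org

# A client can just as easily send a made-up referrer:
print(referral_source("http://totally-real-seo-tips.example/"))  # totally-real-seo-tips.example
```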

The challenge is that what can be measured can be messed with by spammers, and thus referral spam is born.

So what is referral spam?

There are two primary types of referral spam: Ghost Referral Spam and Crawler Referral Spam.

Ghost Referral Spam happens when a spammer (or unethical marketer) fakes the referrer, sometimes giving it the name of a page they want to promote. Rather than actually visiting your website, the spammer sends repeated fake hits straight to Google Analytics, with the intention of showing up in your website’s Google Analytics reports.

Crawler Referral Spam comes from bots that browse sites much like Google does. But unlike Googlebot, whose purpose is to gather information and index your website in Google’s search engine, the goal of a referral spam crawler is, in most cases, the same as that of Ghost Referral Spam: showing up in your reports.

Note: The crawlers used for Crawler Referral Spam usually ignore the rules placed in a robots.txt file.
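In your data, the two types tend to give themselves away in different fields. The Python sketch below is a rough heuristic, not an official rule: it assumes you know the hostnames your own tracking code runs on and keep a list of spam sources you have already spotted (all values below are hypothetical).

```python
# Hypothetical values: substitute your own domains and the spam sources you see.
MY_HOSTNAMES = {"example.com", "www.example.com"}
KNOWN_SPAM_SOURCES = {"semalt.com", "buttons-for-website.com"}


def classify_hit(hostname: str, source: str) -> str:
    """Rough heuristic label for one Analytics hit (hostname + traffic source)."""
    if hostname not in MY_HOSTNAMES:
        # Ghost spam is sent straight to Google Analytics without loading your
        # pages, so the hostname it reports is fake, borrowed, or "(not set)".
        return "ghost spam"
    if source in KNOWN_SPAM_SOURCES:
        # Crawler spam really visits your site, so the hostname is usually
        # yours, but the referral source is the spammer's domain.
        return "crawler spam"
    return "legitimate"


print(classify_hit("google.com", "free-traffic.example"))  # ghost spam
print(classify_hit("www.example.com", "semalt.com"))       # crawler spam
print(classify_hit("example.com", "news.example.net"))     # legitimate
```

This mirrors the two filter options later in this guide: one keyed on Hostname, the other on Campaign Source.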

So why do spammers create referral spam?

The primary reason is to get what might be called “curiosity traffic”. Business and website owners want to know which websites are sending them traffic, so they check their referral data. When website owners see a site they don’t recognize, they usually visit it, and that is exactly what the spammer is hoping for.

One or two visits from a website owner might not seem significant, but most referral spammers will do this on a huge scale, impacting thousands of Google Analytics accounts.

How to Block Referral Spam With .htaccess

A commonly suggested solution is to use the .htaccess file to block both types of referral spam, but .htaccess only works against Crawler Referral Spam. Ghost Referral Spam never requests pages from your server, so your .htaccess rules never see it.

If you are getting crawler referral spam from common sources such as Semalt.com, and your site runs on an Apache server with mod_rewrite enabled, you can place the following code in your .htaccess file to answer requests referred by those sites with a 403 Forbidden response. The code below covers a few of the most common Crawler Referral Spam domains.


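## REFERRER BANNING
# Requires an Apache server with mod_rewrite enabled
RewriteEngine On
# Match each spam domain anywhere in the Referer header (NC = case-insensitive)
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-your-website\.com [NC]
# Answer any matching request with 403 Forbidden
RewriteRule .* - [F]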

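To add more sources, place each new RewriteCond line above the last one in the list and end it with [NC,OR] (no space after the comma). Only the final condition ends with [NC] alone. Escape the dots in each domain with a backslash (for example, semalt\.com) so they match only a literal dot.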
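Keeping the OR flags straight by hand gets error-prone as the list grows. As a sketch, assuming you keep your spam domains in a plain Python list, this hypothetical helper generates the block with escaped dots and the correct flag on each line.

```python
def htaccess_referrer_block(domains: list[str]) -> str:
    """Build a mod_rewrite block that answers 403 Forbidden to the given referrers."""
    if not domains:
        raise ValueError("need at least one domain")
    lines = ["## REFERRER BANNING", "RewriteEngine On"]
    for i, domain in enumerate(domains):
        # Escape dots so each one matches only a literal dot.
        pattern = domain.strip().replace(".", r"\.")
        # Chain every condition with OR except the last; no space inside [].
        flags = "[NC,OR]" if i < len(domains) - 1 else "[NC]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


print(htaccess_referrer_block(
    ["semalt.com", "buttons-for-website.com", "buttons-for-your-website.com"]
))
```

Review the output before pasting it into .htaccess; a syntax error in that file can make Apache return 500 errors for the whole site.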

Using Filters in Google Analytics To Block Referral Spam

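This is by far the most efficient and effective strategy, and it works for both Ghost and Crawler Referral Spam. The Hostname method below stops all ghost spam, whether it shows up as a referral, a keyword, or a direct visit: because ghost hits never load your pages, they carry a hostname that isn’t one of your own sites.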

Option 1: Using a Hostname Filter

Phase 1: How to Get a List of the Referral Spam

1. Go to the Reporting tab in Google Analytics, located in the top navigation.

2. Select a time frame to get data for. Choose a range wide enough to capture as many of these referral sources as possible, but not so large that the data set becomes unmanageable; three months is usually enough.

3. In the left-hand navigation, select Audience, expand the Technology section, and select Network.


4. Just under the main graph, and above the data table, select the blue link named Hostname.

5. Once you click Hostname you will see a list of hostnames. If there are more than 10, you may have to increase the number of rows shown to see them all.

6. Gather all the invalid hostnames from this list.

Invalid hostnames are any domains you don’t recognize or haven’t installed your Google Analytics code on. Even well-known names like google.com or amazon.com are invalid hostnames here; spammers use these names to mislead people.

7. Once you have gathered all your invalid hostnames, create a regular expression (REGEX) that matches them. The Filter Pattern field accepts at most 255 characters, so prioritize the hostnames that send the most spam.

REGEX Details:

  • Don’t leave any spaces.
  • The | (pipe) separates hostnames.
  • The backslash \ escapes each dot so that it matches a literal dot rather than any character (this applies to hostnames and IP addresses alike).

There are many ways to write these expressions; Google’s Analytics Help documentation on regular expressions covers the syntax in more detail.

8. Once you have the REGEX built, add it to an Exclude Hostname filter, as described below.
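If you export the Hostname report (or simply copy the list), the gather-and-build steps above can be sketched in Python. This is a minimal sketch under assumptions: the hostnames and your own domains are hypothetical, and the check uses Python’s `re` module, which for these simple patterns (literal text, `\.` and `|`) should behave like Google Analytics’ regex, though that is worth confirming with a test filter.

```python
import re

MAX_PATTERN_LENGTH = 255  # filter pattern limit mentioned in this guide


def find_invalid_hostnames(report_hostnames: list[str], my_hostnames: set[str]) -> list[str]:
    """Return every hostname from the report that isn't one of your own sites."""
    return sorted({h.strip() for h in report_hostnames} - my_hostnames)


def build_exclude_pattern(invalid_hostnames: list[str]) -> str:
    """Join hostnames with | and escape dots; fail if over the length limit."""
    pattern = "|".join(h.strip().replace(".", r"\.") for h in invalid_hostnames)
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(f"pattern is {len(pattern)} chars; limit is {MAX_PATTERN_LENGTH}")
    return pattern


# Hypothetical data: hostnames copied from the Hostname report, plus your own sites.
report = ["www.example.com", "example.com", "google.com", "amazon.com", "free-seo.example"]
mine = {"www.example.com", "example.com"}

invalid = find_invalid_hostnames(report, mine)
pattern = build_exclude_pattern(invalid)
print(pattern)  # amazon\.com|free-seo\.example|google\.com

# Sanity check: matches every invalid hostname and none of your own.
assert all(re.search(pattern, h) for h in invalid)
assert not any(re.search(pattern, h) for h in mine)
```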

Phase 2: Setting up the Exclude Hostname Filter in Google Analytics

1. Go to the Admin tab.

2. Select the account you want to apply the filter to, then select All Filters.


3. Select New Filter


4. Enter a name for your filter.


5. In Filter Type, select Custom

6. Choose Exclude

7. In the Filter Field drop-down menu, select Hostname.

8. Paste the REGEX you built from your invalid hostnames into the Filter Pattern field.

9. Select the available views you want the filter applied to and click Add (they should now appear in the right-hand box).


10. Finally, save your filter.
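Before saving, it’s worth previewing what the pattern will actually exclude, because a pattern that matches part of a hostname can also catch your own site. The sketch below assumes Google Analytics filter patterns match anywhere in the field unless anchored with ^ and $ (treat that as an assumption to check against Google’s documentation); the domains are hypothetical.

```python
import re


def preview_exclude_filter(pattern: str, hostnames: list[str]) -> tuple[list[str], list[str]]:
    """Split hostnames into (dropped, kept) for an Exclude filter.

    Assumption: like re.search, the filter matches anywhere in the field
    unless the pattern is anchored with ^ and $.
    """
    dropped = [h for h in hostnames if re.search(pattern, h)]
    kept = [h for h in hostnames if not re.search(pattern, h)]
    return dropped, kept


# Hypothetical: "spam.com" is the spammer, "nospam.com" is your real site.
hostnames = ["spam.com", "nospam.com", "www.nospam.com"]

print(preview_exclude_filter(r"spam\.com", hostnames))
# (['spam.com', 'nospam.com', 'www.nospam.com'], [])  <- your site is dropped too!
print(preview_exclude_filter(r"^spam\.com$", hostnames))
# (['spam.com'], ['nospam.com', 'www.nospam.com'])
```

Anchoring a whole list looks like `^(spam\.com|other-spam\.net)$`; those extra characters count toward the 255-character limit.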

Option 2: Using a Campaign Source Filter

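We find that this option works better for blocking referral spam, because it matches the spam domain itself (which appears as the traffic source) rather than the hostname the hit was recorded on.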

1. Follow steps 1-6 from Phase 2 of Option 1 (Hostname Filter).

2. Instead of choosing Hostname in the Filter Field drop-down menu, choose Campaign Source.


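3. Follow steps 8-10 from Phase 2 of Option 1, but in the Filter Pattern field enter a REGEX of the spam referral domains (such as semalt\.com) instead of your invalid hostnames. You can find these domains under Acquisition > All Traffic > Referrals.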
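Campaign Source patterns tend to grow, since new spam domains keep appearing. One way to stay under the 255-character limit is to spread the domains across several Exclude filters, because a hit that matches any one of them is dropped. A minimal sketch, assuming a plain list of spam source domains (the three below come from the .htaccess example earlier); the helper name is hypothetical.

```python
MAX_PATTERN_LENGTH = 255  # filter pattern limit mentioned earlier in this guide


def split_into_patterns(domains: list[str], limit: int = MAX_PATTERN_LENGTH) -> list[str]:
    """Greedily pack escaped domains, in order, into |-joined patterns of at most `limit` chars."""
    patterns: list[str] = []
    current = ""
    for domain in domains:
        part = domain.strip().replace(".", r"\.")
        if len(part) > limit:
            raise ValueError(f"{domain!r} alone exceeds {limit} characters")
        candidate = f"{current}|{part}" if current else part
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current pattern is full: close it and start a new one.
            patterns.append(current)
            current = part
    if current:
        patterns.append(current)
    return patterns


spam_sources = ["semalt.com", "buttons-for-website.com", "buttons-for-your-website.com"]
for i, p in enumerate(split_into_patterns(spam_sources), start=1):
    print(f"Campaign Source filter {i}: {p}")
```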

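Don’t fret if you don’t see the filter working instantly; it can take up to 24 hours to start working. Also keep in mind that filters only apply to data collected after they are saved, so they won’t remove spam that is already in your historical reports.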
