A new form of spam has arrived, and it is in your Google Analytics data, throwing off your numbers. You may have seen a rise in your traffic recently, but if you check where this traffic is coming from, you may find a bunch of hits from spambots. These visits are from spam sites like “semalt.com,” “buttons-for-website.com,” “free-share-buttons.com,” and more.
In this article, I explain why this is happening and provide solutions to stop referral spam in your Google Analytics data.
What I Will Cover
- Why are spammers using referral spam?
- How do spambots get into your Google Analytics referral data?
- Why should you be worried?
- Methods to stop referral spam.
Why are spammers using referral spam?
With so many ways to spam, the obvious question is why spammers would take the time to spam your referral traffic. The answer is simpler than you might think. You will notice that most of the spammers using referral spam are advertising services for webmasters. Their hope is that you will see them as one of your top referral sources, visit their website out of curiosity, and use their product.
Another reason spammers resort to referral spam is to build links back to their website. Some websites publish their analytics reports publicly, so every fake visit that lands in those reports creates a linkback to the spammer's URL.
Obviously, the spammers can make money from these tactics. Between the advertising, affiliate and retargeting revenue they earn, it makes it worth their while to set up a spamming system.
How do spambots get into your Google Analytics referral data?
There are two ways that spambots work their way into Google Analytics. The first way is completely non-targeted and random. Google Analytics uses a string of numbers for your account number. Spambots will randomly load your code into their script to fire off a pageview on their server, never actually visiting your website. Often, they will not even know what your URL is. All they need is your Google Analytics account number, which can be randomly generated or pulled from your website.
The second version is more invasive and directly targets your website. Just like Google has spiders/bots that crawl your website, spambots will visit your website as well, generating a page view directly on your website, which gets logged in your analytics.
In both cases, these scripts or bots will fake HTTP headers and set their referral to whatever link they are trying to promote. When Google Analytics logs this data, their link is now posted.
To understand referral spam better, please watch the video below from Matt Cutts, Head of Google’s Webspam team.
Why should you be worried?
This type of spam does not normally put a huge load on your server and should not cause any performance issues. Many webmasters will ignore the spam, not giving much thought or care towards stopping it.
One of the biggest reasons we combat this spam is due to the misleading data being loaded into Google Analytics. Your data is very powerful and can tell you a lot about your visitors and marketing efforts. Referral spam will provide you with data that could affect your marketing decisions. You may think that your traffic is much higher than it is. I have seen this first hand with a client where over 80% of their traffic was referral spam. Obviously, any decision I had made would have been flawed, inaccurate and useless.
Methods to stop referral spam
Unfortunately, it is nearly impossible for you to stop referral spam completely. This is something Google needs to do at a global level but has failed to do so. There is some good news. Below, I have two methods that should stop a decent amount of your referral spam. Since new referral spam URLs are created daily, you will need to continue to update your filters.
We will be creating some filters to block these spammy bots. Because we may accidentally filter out real users, my recommendation is to first create a new view. Doing this will allow you to have a completely unfiltered view and a filtered view.
Create a new view
Go to Admin -> View then select Create new view from the drop down. Call this new view Filtered Website Traffic or something you will be able to identify later.
Method 1: The Hostname Filter
Some of the referral spam will not come in through your website and will show a hostname other than yours. A legitimate hostname should always be your website or any other website you have your analytics code on, e.g., Shopify or PayPal.
First, check the current referral hostnames to see which hostnames you are currently receiving traffic from. Go to Acquisition -> All Traffic -> Referrals. Under “Secondary Dimension” search for and choose “Hostname”.
You will see a list of sources and their hostnames. Any row where the hostname is (not set), or is a domain that is neither your own nor a site you have your code on, is almost certainly spam.
To create the filter, go to Admin. Under Account, select “All Filters” then click the “New Filter” button. Next, select Custom then select Include. For the filter field, search for and then select “Hostname.”
The filter pattern requires a regular expression (REGEX). If you are using only one domain, use “domain\.com|translate\.googleusercontent\.com” without the quotes. If you are using two or more domains, put “|” between them, with no spaces, as an OR operator. For example, two domains would be “domain1\.com|domain2\.com|translate\.googleusercontent\.com”. Note that I am putting a “\” before every period. This escapes the period, which otherwise matches any character in REGEX. Also, be sure to add support for Google Translate. Since Google serves the translated version of your website from its own page, the hostname will be translate.googleusercontent.com, which is valid traffic.
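If you want to check a hostname pattern before saving the filter, a small Python sketch like the one below can help. It is only an approximation: it assumes the filter does a partial, case-insensitive match (modeled here with re.search and re.IGNORECASE), and "domain.com" is the same placeholder used above. The function name and the sample hostnames are made up for illustration.

```python
import re

# Placeholder include pattern from the article; swap "domain" for your own domain.
HOSTNAME_PATTERN = r"domain\.com|translate\.googleusercontent\.com"

# What the pattern looks like if the periods are NOT escaped (shown for comparison).
UNESCAPED_PATTERN = r"domain.com|translate.googleusercontent.com"


def passes_include_filter(hostname: str, pattern: str = HOSTNAME_PATTERN) -> bool:
    """Return True if the hostname would be kept by the include pattern.

    Assumption: partial, case-insensitive matching, approximated with re.search.
    """
    return re.search(pattern, hostname, re.IGNORECASE) is not None


if __name__ == "__main__":
    # Hypothetical hostnames of the kind you might see in the Referrals report.
    samples = [
        "www.domain.com",
        "translate.googleusercontent.com",
        "(not set)",
        "semalt.com",
        "domainxcom.net",
    ]
    for hostname in samples:
        verdict = "keep" if passes_include_filter(hostname) else "filter out"
        print(f"{hostname}: {verdict}")
```

The last sample shows why the backslashes matter: with the unescaped pattern, "domainxcom.net" would slip through, because an unescaped period matches any character.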
Method 2: The Source Filter
You may have noticed some of these spammers are either going directly to your page or using your domain as their hostname. This means the method above will not work. It is time to get aggressive and go after these spammers directly.
For this method, you will need to create three different filters, since each filter pattern is limited to 255 characters. As with the method above, create a new filter from the Admin panel. This will also be a custom filter, but use Exclude this time. For the filter field, select “Campaign Source”.
The filter pattern is a little complex. However, Ben Travis from Viget did an amazing job of creating two of the best REGEX patterns, which will block most current referral spam.
Filter One:
.*((darodar|priceg|semalt|buttons\-for(\-your)?\-website|makemoneyonline|blackhatworth|hulfingtonpost|o\-o\-6\-o\-o|(social|(simple|free)\-share)\-buttons)\.com|econom\.co|ilovevitaly(\.co(m)?)|(ilovevitaly(\.ru))|(humanorightswatch|guardlink)\.org).*
Filter Two:
.*((best(websitesawards|\-seo\-(solution|offer))|Get\-Free\-Traffic\-Now|googlsucks|theguardlan|webmaster\-traffic)\.com|(domination|torture)\.ml|((rapidgator\-)?(general)?porn(hub(\-)?forum)?|4webmasters)\.(ga|tk|org|uni)|(buy\-cheap\-online)\.info).*
Filter Three:
.*event\-tracking\.com.*
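Before relying on the three patterns above, you can sanity-check them against a few sample sources. The Python sketch below copies the patterns as written and reports which filter, if any, would catch a given source. It assumes case-insensitive matching, and the function name and sample sources are illustrative only.

```python
import re

# The three exclude patterns from the article, copied verbatim.
FILTERS = {
    "Filter One": r".*((darodar|priceg|semalt|buttons\-for(\-your)?\-website|makemoneyonline|blackhatworth|hulfingtonpost|o\-o\-6\-o\-o|(social|(simple|free)\-share)\-buttons)\.com|econom\.co|ilovevitaly(\.co(m)?)|(ilovevitaly(\.ru))|(humanorightswatch|guardlink)\.org).*",
    "Filter Two": r".*((best(websitesawards|\-seo\-(solution|offer))|Get\-Free\-Traffic\-Now|googlsucks|theguardlan|webmaster\-traffic)\.com|(domination|torture)\.ml|((rapidgator\-)?(general)?porn(hub(\-)?forum)?|4webmasters)\.(ga|tk|org|uni)|(buy\-cheap\-online)\.info).*",
    "Filter Three": r".*event\-tracking\.com.*",
}


def matching_filter(source: str):
    """Return the name of the first filter that would exclude this source, or None.

    Assumption: matching is case-insensitive, modeled with re.IGNORECASE.
    """
    for name, pattern in FILTERS.items():
        if re.search(pattern, source, re.IGNORECASE):
            return name
    return None


if __name__ == "__main__":
    # Sample sources: known spam domains from the article plus one legitimate referrer.
    for source in ["semalt.com", "free-share-buttons.com", "4webmasters.org",
                   "event-tracking.com", "google.com"]:
        print(f"{source}: {matching_filter(source) or 'not excluded'}")
```

Running your real top referrers through a check like this is a quick way to confirm that no legitimate source is being excluded by accident.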
If you see others making their way in, you can add them on your own by appending .*domaintoblock\.com.* to your filter. Remember to use “|” as OR and to put “\” before any periods or dashes.
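As the list of spam domains grows, escaping each one by hand and staying under the 255-character limit gets tedious. The Python sketch below shows one possible way to generate the patterns. The helper name and the second domain are hypothetical, and re.escape is used here as a stand-in for adding the backslashes manually.

```python
import re

MAX_PATTERN_LENGTH = 255  # per-filter character limit mentioned in the article


def build_exclude_patterns(domains, max_length=MAX_PATTERN_LENGTH):
    """Group escaped domains into '.*(a|b|c).*' patterns no longer than max_length.

    Note: a single domain that is itself too long still gets its own pattern.
    """
    patterns = []
    current = []  # escaped domains collected for the pattern being built
    for domain in domains:
        escaped = re.escape(domain)  # adds "\" before periods and dashes
        candidate = ".*(" + "|".join(current + [escaped]) + ").*"
        if current and len(candidate) > max_length:
            # Adding this domain would exceed the limit, so close the current pattern.
            patterns.append(".*(" + "|".join(current) + ").*")
            current = [escaped]
        else:
            current.append(escaped)
    if current:
        patterns.append(".*(" + "|".join(current) + ").*")
    return patterns


if __name__ == "__main__":
    # "domaintoblock.com" is the article's placeholder; the second domain is made up.
    new_spammers = ["domaintoblock.com", "another-spammer.net"]
    for number, pattern in enumerate(build_exclude_patterns(new_spammers), start=1):
        print(f"Filter {number} ({len(pattern)} characters): {pattern}")
```

Each pattern it prints can be pasted into its own exclude filter on Campaign Source.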
Conclusion
Spammers are always finding new ways to exploit tools and services in an attempt to make a few extra bucks. Google Analytics referral spam is one of the more creative options for spammers. My hope is that Google will recognize that this type of spam is causing many websites to have inaccurate data and work on a solution at the global level.
The above methods will help you combat some of the current offenders. You will want to check your analytics each month and update the filters to include any new spammers that start to pop up. If you find some new ones, please mention them in the comments below so we can update this article. If you’re having a difficult time getting the above filters to work, let me know below as well, and I will do my best to help.
Tessa Henley says
Hey Chris! This is amazingly helpful. Roughly 31% of the site I manage’s referrals were coming from spambots. Thank you!
Chris Edwards says
Tessa, great to hear! I am glad this was able to help you decrease that spam and see your real data.
Joe says
I have read on other blogs that Google can send legitimate traffic your way and that if you don’t include googleusercontent in the include filter, you can lose data. Is this true? I have tried running this filter on a couple of views and I see gross discrepancies in traffic. I’m hesitant to roll it out on other active views. I also noticed that in your example you have a pipe | after Shopify. Isn’t that adding an extra “or” statement and invalidating the entire expression?
Chris Edwards says
Apologies Joe. You are correct, translate.googleusercontent.com is a hostname that should be included. This hostname appears when someone uses Google Translate on your website. I have updated the article as such.
As for seeing gross discrepancies, I would analyze your current traffic. I recently had a client for whom 70% of their traffic was spam referral traffic. Depending on how much traffic you normally get to your site, you may see a large dip in the traffic being reported. Go to Acquisition -> All Traffic -> Referrals in your unfiltered view, then open a new tab and go to the same section in your filtered view. Check whether any legitimate traffic is being filtered out or whether your site is just being bombarded by these spambots. Please comment back with an update once you have done this, as I would like to update my filter if it is incorrect.
As for the pipe after Shopify, that was actually the cursor jumping into my screenshot =). I have photoshopped out the cursor to make it less confusing. You were right, it did look a lot like an additional pipe which would affect the REGEX statement.
Kirsten says
I have had a valid hostname filter going for several months. It has worked like a charm, and I’ve had zero referral spam incidents since then, until this week. I now have “satellite.maps.ilovevitaly.com” as a referral, showing the hostname “translate.googleusercontent.com”.
I understand that the translate.googleusercontent.com host is a valid one and indicates someone coming in through the translate function of my page. So, now what? How can I exclude this and am I going to have to start building multiple filters each week? It was working so well…