Simple Guide to Filtering Spam in Google Analytics

To ensure all data collected by Google Analytics is valid, configure filters to prevent meddling bots from spamming your GA account. Follow these simple steps to remove three types of spam: ghost spam, crawler spam, and fake language spam. Whole Whale recommends troubleshooting all new filters in an account’s Test View first, then copy/pasting your perfected configuration into your Master View.

Ghost Spam

Ghost spam never actually crawls its target site. Instead, spam bots communicate directly with the Google Analytics Server to send fake data to unsuspecting accounts. Follow this video to create a ghost spam filter or see instructions below. 

To search for ghost spam in your Google Analytics account, choose “Audience” > “Technology” > “Network” > “Hostname” to view a list of all hostnames engaging with your site. This list should only include your domain, subdomains, and any services you’ve linked to your Google Analytics account. If you don’t recognize a hostname, it may be ghost spam.

To remove these invalid hostnames, build a custom filter that tells Google Analytics which hostnames belong in your account – think of this filter as a hostname guest list. To build your guest list, first identify all valid hostnames associated with your account.

Valid hostnames may include:

  • Your main domain
  • Any subdomains
  • Video services like YouTube
  • Payment services like PayPal or your donation platform
  • Shopping carts like Shopify
  • CDNs
  • Translation services
  • Cache services
  • IP addresses

If you don’t recognize or control a hostname, odds are it should not be included on your list. Potential ghost spam hostnames include:

  • Spammy URLs
  • Links to sites you can’t control
  • All hostnames listed as (not set)

Spam hostname examples

Not sure if a hostname is spam? Check for user engagement! If the bounce rate is near 99%, the hostname is likely ghost spam.

Writing the Guest List

Include all valid hostnames in a single regular expression, separating each hostname with a pipe character | and adding a backslash \ before all periods and hyphens. Your regex has a limit of 255 characters, so combine hostnames when you can. Make sure there are no spaces anywhere in the expression and no pipe characters at the beginning or end: a leading or trailing pipe creates an empty alternative that matches every hostname, which defeats the filter.

Example: wholewhale\.com|eventbrite\.com|wholewhale\.podbean\.com|youtube\.com
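If you have a long list of hostnames, the escaping and joining described above can be scripted. The following is a minimal Python sketch, not an official tool: the function name build_hostname_pattern and the constant GA_PATTERN_LIMIT are made up for illustration, and the 255-character limit is simply the one stated in this guide.

```python
GA_PATTERN_LIMIT = 255  # character limit stated in this guide (assumed here)


def build_hostname_pattern(hostnames):
    """Join hostnames into one pipe-separated expression.

    Periods and hyphens are escaped with a backslash, blanks are skipped,
    and surrounding spaces are stripped, so the result has no spaces and
    no leading or trailing pipe.
    """
    escaped = []
    for host in hostnames:
        host = host.strip()
        if not host:
            continue  # skip empty entries so no empty alternative is created
        escaped.append(host.replace(".", r"\.").replace("-", r"\-"))
    pattern = "|".join(escaped)
    if len(pattern) > GA_PATTERN_LIMIT:
        raise ValueError(
            f"pattern is {len(pattern)} characters; limit is {GA_PATTERN_LIMIT}"
        )
    return pattern


# The hostnames from the example in this guide.
hosts = ["wholewhale.com", "eventbrite.com", "wholewhale.podbean.com", "youtube.com"]
print(build_hostname_pattern(hosts))
```

Run as written, this prints the same expression shown in the example above. The length check raises an error instead of silently truncating, so you know when it is time to combine hostnames.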

You can double-check your regex by creating a Google Analytics Segment, setting “Source Matching” equal to your expression. Apply this segment to your reports to see how the configuration affects your data.
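You can also sanity-check the expression offline before touching your account. The sketch below is a rough Python stand-in for an Include filter, under two assumptions: that the filter matches anywhere in the hostname (partial match) and that matching ignores case. The sample hostnames are hypothetical, and split_by_guest_list is an illustrative name, not part of any Google tool.

```python
import re

# The example expression from this guide.
GUEST_LIST = r"wholewhale\.com|eventbrite\.com|wholewhale\.podbean\.com|youtube\.com"


def split_by_guest_list(pattern, hostnames):
    """Return (kept, dropped) hostname lists for an Include-style filter.

    Assumes partial, case-insensitive matching, approximated here with
    re.search and re.IGNORECASE.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    kept, dropped = [], []
    for host in hostnames:
        if regex.search(host):
            kept.append(host)
        else:
            dropped.append(host)
    return kept, dropped


# Hypothetical hostnames, as they might appear in the Hostname report.
sample = ["www.wholewhale.com", "wholewhale.podbean.com", "(not set)", "free-traffic.example"]
kept, dropped = split_by_guest_list(GUEST_LIST, sample)
print("kept:", kept)
print("dropped:", dropped)
```

Because the match is unanchored, a hostname that merely contains one of your domains (for example a spammer's wholewhale.com.spam.example) would also be kept in this sketch, so it is worth scanning the kept list for anything unfamiliar.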

Now create your filter!

  1. Select “Admin” and toggle to the appropriate view.
  2. Choose the “Filters” tab and click “Add Filter” to configure.
  3. Name your filter, select “Custom” for Filter Type, and specify that you want to “Include” “Hostname.”
  4. Enter your expression, and use the “Verify” button to triple-check your new filter.
  5. Click “Save” to launch your ghost spam filter! 

Screenshot of ghost spam filter configuration

Note: your regex will only need to be updated when you add your Google Analytics tracking ID to a new service or domain.

Crawler Spam

Crawler spam crawls target sites, leaving fake data in its wake. Luckily, filtering crawler spam is simple: copy the following expressions into custom filters to exclude crawler traffic from your account.

  1. Navigate to “Admin,” choose “Filters,” then click “Add Filter.”
  2. Name your filter, choose “Custom” for Filter Type, and select “Exclude.”
  3. Set the Filter Field to “Campaign Source,” then paste Expression #1 (below) into the Filter Pattern box.
  4. Verify the filter, then choose “Save.”
  5. Repeat for Expression #2.

Screenshot of crawler spam filter configuration

Expression #1:

(best|dollar|success|top1)\-seo|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)

Expression #2:

Datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way
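To preview what these two expressions would exclude, you can run them against a list of sources outside of Google Analytics. This is a hedged Python sketch: it assumes the Exclude filter does a partial, case-insensitive match on Campaign Source, the sample sources are invented for illustration, and is_crawler_source is a hypothetical helper name.

```python
import re

# Expressions #1 and #2, copied verbatim from this guide.
EXPRESSION_1 = r"(best|dollar|success|top1)\-seo|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)"
EXPRESSION_2 = r"Datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way"

# One compiled pattern per filter, mirroring the two separate filters above.
CRAWLER_PATTERNS = [
    re.compile(expr, re.IGNORECASE) for expr in (EXPRESSION_1, EXPRESSION_2)
]


def is_crawler_source(source):
    """True if either expression matches anywhere in the source string."""
    return any(pattern.search(source) for pattern in CRAWLER_PATTERNS)


# Hypothetical campaign sources: a mix of ordinary and spammy values.
sources = ["google", "newsletter", "semalt.semalt.com", "best-seo-offer.com", "share-buttons.xyz"]
for source in sources:
    label = "exclude" if is_crawler_source(source) else "keep"
    print(f"{label}: {source}")
```

Note that the ɢ in ɢoogl and the ĸ in hacĸer are look-alike characters, not the plain letters g and k, so the ordinary source google is left alone.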

Fake Language Spam

Fake Language spam, a special type of ghost spam, will likely be removed by our ghost and crawler spam filters. Still, we recommend adding this final custom filter to be on the safe side.

Follow the same steps for creating the crawler spam filter above, but choose “Language Settings” instead of “Campaign Source” as the Filter Field. Paste the following into the Filter Pattern (the trailing comma is part of the expression):

\s[^\s]*\s|.{15,}|\.|,

Screenshot of language spam filter configuration

Click to verify – fake languages should appear on the left-hand side of the table. These may look like this:

“o-o-8-o-o.com search shell is much better than google!” and “☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚Д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO”
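It helps to see why the language pattern catches strings like these while leaving real language codes alone. The expression has four alternatives: a value with at least two whitespace characters, a value of 15 or more characters, a value containing a period, or a value containing a comma. Here is a small Python sketch that applies it to a few values; the function name looks_like_fake_language is hypothetical, and the "real" language codes are just common examples.

```python
import re

# The Filter Pattern from this guide, including its trailing comma.
LANGUAGE_SPAM = re.compile(r"\s[^\s]*\s|.{15,}|\.|,")


def looks_like_fake_language(value):
    """True if any of the four alternatives matches somewhere in the value."""
    return LANGUAGE_SPAM.search(value) is not None


# Ordinary language values next to one of the spam strings quoted above.
values = [
    "en-us",
    "pt-br",
    "(not set)",
    "o-o-8-o-o.com search shell is much better than google!",
]
for value in values:
    label = "exclude" if looks_like_fake_language(value) else "keep"
    print(f"{label}: {value}")
```

Short codes such as en-us contain no whitespace, periods, or commas and are well under 15 characters, so none of the alternatives fire.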

After applying filters, monitor your traffic closely over the next several days to ensure that you are not blocking valid sources. Note: filters may take up to 24 hours to apply!