A Basic Guide to Stopping Bot Traffic on Your Website
Although Google Analytics does a good job of giving an overview of what happens on your website, it only provides partial information about your site's traffic. You may be surprised to learn that more than half of your server hits come from bots or scrapers. Unfortunately, Google Analytics, like most analytics services, does not report these hits.
It's understandable that these services focus on human visitors, because that is what provides you with the most value. Still, whether a bot or a human is visiting your site, each request consumes processing capacity on your server. It would be far better if you could spend that capacity solely on real user requests.
What is bot traffic and what are those bots doing?
Just think about it like this. Imagine you have an e-commerce site and at 5:30 p.m. you have 10 shoppers who want to check out from their shopping carts. This is real money and valuable traffic. If your server is simultaneously handling hundreds of hits from bots, its processing capacity is being sucked up.
Think about what that’s going to do to the experience for human visitors. They’re going to have a less than stellar experience. This, in turn, may cause some to bounce away and never come back. You lost revenue and customers as a result of those pesky bots!
Why doesn't internet bot traffic show up in web analytics? The answer is JavaScript. For Google Analytics to do its job, it relies on a JavaScript snippet embedded in your website's pages to track visitors. That information is collected and then sent back to Google Analytics, which compiles it and produces reports.
This only works if the client's browser executes the JavaScript snippet. When humans visit your site, their browser runs the JavaScript, loads your images, applies your CSS, and does everything else needed for the site to look and act as it should.
Bots don't do this. They crawl your site without running an actual browser. They don't care about images and they don't execute JavaScript; they're only interested in grabbing information from your site's raw HTML documents. No browser = no JavaScript = no Google Analytics. All the bot traffic literally flies under the radar.
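To make that concrete, here is a minimal sketch of what a basic scraper can look like, written in Python with the third-party requests library (the URL is just a placeholder). It pulls down the raw HTML and never executes any of the scripts embedded in the page, analytics snippet included.

```python
import requests

# A minimal scraper sketch: fetch the raw HTML of a page.
# No browser is involved, so any <script> tags (including an
# analytics snippet) arrive as plain text and are never executed.
url = "https://example.com/products"  # placeholder URL
response = requests.get(url, timeout=10)

html = response.text
print(f"Fetched {len(html)} bytes of HTML from {url}")
# The scraper can now parse this HTML for prices, links, and so on,
# and your analytics dashboard will never know it was here.
```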
Three ways to stop bot traffic
First, you need to understand that blocking all bots is not a viable option. Search engines like Google and Bing use bots to crawl and index your site, which is what gives it SEO value. Most site owners understand it's not a good idea to block those bots. At the same time, you need to reserve processing capacity for the humans visiting your site. The sweet spot is somewhere between an open-door policy and blocking all bots.
Second, look at IP blocking as a way to shut out individual bots that are up to no good. These are the bots that scrape your site and illegally republish your content elsewhere, spam your comments, or show your visitors advertisements you don't want. Identify the bots causing you trouble by checking with your firewall or hosting provider. You can also look at your website's raw server logs, but these logs are usually huge, so you'll need a tool like Deep Log Analyzer to read them. Once you've found the offending IP addresses, use your firewall service to block them.
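If you want to do a quick pass over a raw log yourself, a rough sketch like the following can help, assuming a common log format where the client IP is the first field on each line (the filename is a placeholder):

```python
from collections import Counter

# Rough sketch: find the noisiest client IPs in a raw access log.
# Assumes the client IP is the first space-separated field per line.
LOG_FILE = "access.log"  # placeholder path

hits = Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        ip = line.split(" ", 1)[0]  # first field is the client IP
        hits[ip] += 1

# Print the ten busiest clients; anything wildly above the rest
# is a candidate for a closer look and, possibly, a firewall block.
for ip, count in hits.most_common(10):
    print(f"{ip:<18} {count} hits")
```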
Third, implement request throttling to protect your site's performance. Even if a bot is not malicious in nature, if it is degrading the experience of your human visitors, it's doing more harm than good. No bot is worth having if your users are having a bad experience. Throttling means limiting how many times a client can hit your website. First, set the maximum number of hits you will allow in a set time period. Then, any client that reaches the preset limit is automatically blocked. You'll need to use a little finesse.
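How throttling is actually implemented depends on your firewall, CDN, or web server, but the core idea can be sketched in a few lines of Python. The window length and limit below are invented numbers for illustration, not recommendations:

```python
import time
from collections import defaultdict, deque

# Bare-bones sliding-window throttle sketch. Real sites usually do this
# at the firewall, CDN, or web-server layer, but the logic is the same:
# count each client's hits in a recent window and refuse it over a limit.
WINDOW_SECONDS = 60        # illustrative values only
MAX_HITS_PER_WINDOW = 100

hits_by_client = defaultdict(deque)  # client IP -> timestamps of recent hits

def allow_request(client_ip: str) -> bool:
    """Return True if this client is under the limit, False to block."""
    now = time.time()
    recent = hits_by_client[client_ip]
    # Drop hits that have aged out of the window.
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()
    if len(recent) >= MAX_HITS_PER_WINDOW:
        return False  # over the limit: throttle or block this client
    recent.append(now)
    return True
```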
The idea is finding that sweet spot. You want a limit high enough that friendly bots and actual customers can visit your site and roam your pages freely. However, you want it low enough that any bot continually hammering your site gets blocked before it hurts performance for your human visitors.
You can do this by first looking at the maximum number of hits your server gets on its busiest day. Then find the number of visitors you have on an average day. Divide the first number by the second and record the result.
Take that result and divide it by 1,440 (the number of minutes in a day) to get a per-minute figure. Multiply that by 10, and use the answer as your standing threshold. This way, normal activity can reach your site without runaway clients dragging down its performance.
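As a worked example with invented numbers: suppose your busiest day saw 720,000 hits and an average day brings 1,000 visitors. Following the steps above:

```python
# Worked example with hypothetical numbers, just to show the arithmetic.
busiest_day_hits = 720_000     # max hits on your busiest day (made up)
average_day_visitors = 1_000   # visitors on an average day (made up)

hits_per_visitor = busiest_day_hits / average_day_visitors  # 720.0
per_minute = hits_per_visitor / 1440                        # 0.5
threshold = per_minute * 10                                 # 5.0

print(f"Throttle threshold: about {threshold:.0f} hits per minute per client")
```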
How reducing bot traffic can help
Your customers demand a seamless experience when using your site. If performance suffers because of bot traffic, you can kiss some of your customers goodbye. Reducing bot traffic not only keeps your site from being spammed with unwanted comments or advertisements, but also helps ensure that your visitors have an experience that keeps them coming back for more.