Don’t ban the bots

I do a lot of DDoS-related research online, which results in a lot of DDoS-protection-related spam and offers. A trend I have seen gaining popularity lately is “ban the bots”.

These emails contain a lot of emotionally charged language trying to persuade the reader that bots are destroying the internet, wasting your bandwidth, and pillaging your website (and that for a modest monthly fee they can keep the digital invaders at bay). I couldn’t disagree more. For the most part I like bots. Bots save me a ton of work and allow me to focus on tasks that are meaningful to me. The only reason that search engines, hotel booking sites, and social media sites operate so successfully (or at all) is because of bots.

These advertisements do acknowledge there are some good bots out there, while stressing the need to block the bad bots. I thought I’d pull some numbers from traffic running through our system. I was pleasantly surprised: as a DDoS protection service I was expecting to see more malicious bots than legitimate ones, but what I found was that 85% of the bot traffic is classified as good. That traffic falls into SES (which stands for Search Engine Spiders, but is a general list of the known good bots), which we don’t want to block, and XSE, which contains alternate spiders and bots that, while legitimate, can cause impact on some websites.

[Figure: breakdown of observed bot traffic by classification, 2017-07-27]

The other 15% of traffic is from hosting companies, ISPs, and commercial traffic from unknown bots. This traffic is not automatically bad, but hidden somewhere in there are the malicious bots and scrapers which we do want to block. This is where the philosophy of “ban the bots” makes things more complicated than they need to be, because while it is a trivial matter to find and locate bots, doing so focuses you on the actor, not the action. Don’t ban the bots; ban the malicious actions. If you design your web security to defend against malicious actions, it shouldn’t matter whether they come from bots or not. At DOSarrest this is what we do: we create special features that focus on the malicious bot traffic, apply them to customer configurations, and leave the good bots alone.

In fact, I’ll go one step further: don’t ban the bots, help the bots. While I disagree with the conclusion, the facts are not wrong: bots do consume more than a trivial amount of resources. By helping the bots find the content they are looking for, you can reduce the impact on your site and possibly improve your overall ranking.

Your first goal is getting the bots to your content in as few requests as possible, while at the same time stopping them from crawling pages you don’t need (or want) to show up in search results. Most modern sites have dynamic, pop-up, hidden menus that require multiple JavaScript and CSS resources to render properly. They might look fantastic, but a bot isn’t interested in the aesthetics of your site; it is looking for content. A sitemap is a great tool for linking all the content you want to emphasize without a bot having to navigate through a bunch of complicated dynamic resources. Then there are the rest of the pages on your site: things that are useful to your users but don’t need to appear in the search rankings, such as login pages, feedback forms, etc. Use a robots.txt file or ‘noindex’ meta tags to direct the bots not to bother with these pages.
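As a minimal sketch of those two mechanisms (the domain, paths, and sitemap location here are placeholders, not recommendations for any particular site):

```
# robots.txt — served from the site root
# Keep well-behaved crawlers away from pages that shouldn't rank
User-agent: *
Disallow: /login
Disallow: /feedback

# Point bots straight at the content you do want indexed
Sitemap: https://www.example.com/sitemap.xml
```

For a page that must stay reachable but out of the index, a ‘noindex’ meta tag in the page’s head does the same job: `<meta name="robots" content="noindex">`. One caveat worth knowing: robots.txt stops crawling, while ‘noindex’ stops indexing, so a page blocked in robots.txt can never have its ‘noindex’ tag read.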

Your sitemap and robots.txt will help bots find the resources you want them to find and avoid the ones you don’t. This will lighten the load on your webserver, but won’t necessarily help your site ranking. The number one thing search bots are looking for is quality content, but they also look for well-performing sites. Too many errors or slow responses will negatively impact your ranking in a big way. The answer here is caching. Many bots, Googlebot included, do full page downloads when indexing your site. They are fetching the JavaScript and CSS files, images, PDFs, and whatever other resources you’ve linked. Most of these resources are static and can be served out of a CDN. Not only will this alleviate the load on your server, but the performance improvement will make all your quality content that much more appealing to the bots.
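One way to make those static resources CDN-friendly is to send long-lived caching headers, so shared caches (and the bots themselves) can reuse them instead of re-fetching from your origin on every crawl. A hypothetical nginx snippet, assuming your static assets are distinguishable by file extension:

```nginx
# Allow shared caches (CDNs) and browsers to keep these
# static assets for 30 days instead of hitting the origin.
location ~* \.(css|js|png|jpg|jpeg|gif|pdf)$ {
    add_header Cache-Control "public, max-age=2592000";
}
```

The exact lifetime is a judgment call: long enough that repeat crawls hit the cache, short enough that an updated asset eventually propagates (or use versioned filenames and cache effectively forever).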

Sean Power

Security Solutions Architect

DOSarrest Internet Security

Source: https://www.dosarrest.com/ddos-blog/don-t-ban-the-bots/