Separating Good Bots from Bad: Our Smart Solution for Website Crawler Management

Web crawlers are essential to the internet ecosystem, powering everything from search engine indexing to web analytics. However, not all bots are beneficial. Malicious crawlers can wreak havoc on websites, consuming bandwidth, reducing performance, and even posing security threats. In this article, we explore how we, a leading managed WordPress hosting provider, tackled a client’s issue with aggressive and harmful bots, turning a potential crisis into an opportunity to enhance the site’s performance and security.

Introduction

Web crawlers, often called bots, are automated programs that scan websites for data. On the one hand, beneficial crawlers from search engines like Google help improve site visibility and SEO performance. On the other hand, malicious crawlers, such as those used by scrapers, hackers, and spammers, can degrade website performance, inflate server load, and expose vulnerabilities.

One of our clients, a merch store website, was experiencing significant issues with malicious bot traffic. The site’s performance was deteriorating, leading to slower load times, increased server costs, and a potential risk of security breaches. This situation required immediate and expert intervention.

The Challenge

We noticed a sharp decline in the client’s website performance, with pages taking longer to load. The issue was first detected at 12:48 PM (GMT+3). Upon closer inspection, we observed unusually high levels of traffic from a variety of sources, most of which were not legitimate human visitors.

The following graph shows that above-normal usage creeping up slowly over the course of the morning (from 6 AM GMT+3 to 12:48 PM GMT+3):

[Image: malicious bot activity]

The impact of this malicious bot activity was severe:

  • Increased Server Load: The client’s server was overwhelmed by the constant requests from harmful crawlers, leading to slower site performance.
  • Decreased User Experience: The site’s visitors faced frustrating delays, which could drive them away and negatively impact the site’s reputation.
  • Potential Security Risks: Some of these crawlers were probing for vulnerabilities, putting the site at risk of hacking attempts and data breaches.

Initial Assessment

Our security and performance team quickly stepped in to assess the situation. Using our advanced monitoring tools, we identified that a significant portion of the traffic hitting the client’s site was coming from malicious bots masquerading as legitimate users or benign crawlers.

One of the main challenges in dealing with crawler traffic is distinguishing between beneficial bots, like those from search engines, and harmful ones. Malicious bots often disguise themselves with fake user agents or rotate IP addresses to evade detection, making it difficult to block them without also affecting good bots.
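One widely used way to unmask a bot that spoofs a search engine’s user agent is forward-confirmed reverse DNS: resolve the source IP to a hostname, check that the hostname belongs to the search engine’s published crawler domains, and confirm that the hostname resolves back to the same IP. The sketch below is a minimal illustration of that idea rather than the exact detection logic we run in production, and the sample IP is purely hypothetical.

```python
import socket

# Domains that Google and Bing publish for their crawler hosts.
VERIFIED_CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_search_bot(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for a crawler's source IP.

    A bot that merely claims to be Googlebot in its User-Agent header
    will usually fail this check, because its IP does not resolve back
    to a Google-owned hostname.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not hostname.endswith(VERIFIED_CRAWLER_DOMAINS):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward lookup
        return ip in forward_ips                              # must round-trip
    except OSError:
        return False


# Example: a request claiming to be Googlebot from a hypothetical IP.
print(is_verified_search_bot("203.0.113.7"))  # almost certainly False
```

Google and Bing both document this kind of verification for their crawlers, which is why it holds up far better than trusting user-agent strings on their own.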

Our Innovative Solution

Understanding the need for a nuanced approach, our team applied a well-established solution tailored to the client’s needs. This approach combined several methodologies to effectively filter out malicious bot traffic while ensuring that legitimate crawlers could still access the site.

Key components of the solution included:

  1. Advanced Bot Detection: We employed effective methods that analyse bot behaviour patterns, user agents, and IP addresses to accurately differentiate between good and bad bots.
  2. Dynamic IP Blacklisting: Rather than relying on static IP blacklists, our system dynamically identified and blocked IP addresses associated with malicious bots, preventing them from accessing the site in real time (a simplified sketch of how this combines with rate limiting and whitelisting follows this list).
  3. Rate Limiting and Traffic Throttling: To further mitigate the impact of aggressive crawlers, we implemented rate limiting, which restricted the number of requests a bot could make within a certain timeframe, thereby reducing server load.
  4. Bot Whitelisting: Legitimate bots, such as those from search engines, were placed on a whitelist, ensuring they could continue to access and index the site without interruption.
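To make the interplay between whitelisting, dynamic blacklisting, and rate limiting concrete, here is a minimal in-memory sketch. The thresholds, names, and data structures are illustrative assumptions rather than our production configuration, and in a real deployment this logic normally lives in the web server or firewall layer and shares state across workers.

```python
import time
from collections import defaultdict, deque

# Illustrative thresholds -- not our production values.
WINDOW_SECONDS = 60                # sliding window for rate limiting
MAX_REQUESTS_PER_WINDOW = 120      # requests one IP may make per window
WHITELISTED_AGENTS = ("Googlebot", "Bingbot")  # verified separately (see above)

request_log: dict[str, deque] = defaultdict(deque)  # ip -> recent request times
blocked_ips: set[str] = set()                       # the dynamic blacklist


def should_block(ip: str, user_agent: str) -> bool:
    """Return True if this request should be rejected."""
    if any(agent in user_agent for agent in WHITELISTED_AGENTS):
        return False                   # whitelisted crawlers are never throttled
    if ip in blocked_ips:
        return True                    # already on the dynamic blacklist

    now = time.monotonic()
    timestamps = request_log[ip]
    timestamps.append(now)

    # Drop timestamps that have fallen out of the sliding window.
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()

    if len(timestamps) > MAX_REQUESTS_PER_WINDOW:
        blocked_ips.add(ip)            # over the limit -> blacklist dynamically
        return True
    return False
```

A production setup would persist the blacklist, share it across server processes (for example in Redis), and expire entries after a cool-down period, but the control flow stays the same.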

Implementation

We deployed this solution in a phased approach to minimise disruption to the client’s legitimate traffic. The steps included:

  1. Initial Monitoring and Analysis: The team began by monitoring the site’s traffic patterns to establish a baseline and identify the most harmful bots (a simplified log-analysis example follows this list).
  2. Custom Rules Deployment: Based on the analysis, we implemented custom rules in the site’s firewall to block malicious traffic while allowing legitimate bots and human users to continue accessing the site.
  3. Testing and Adjustment: We tested the solution in a controlled environment, making adjustments as needed to ensure that legitimate traffic was not affected.
  4. Full Rollout: Once testing was complete, the solution was fully deployed across the client’s site, with continuous monitoring to ensure effectiveness.
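As an illustration of the kind of baseline analysis in step 1, the sketch below tallies requests per IP and per user agent from a web server access log. The log path and the combined-log-format pattern are assumptions; both should be adjusted to the actual server setup.

```python
import re
from collections import Counter

# Hypothetical log location and a common "combined" log format pattern;
# adjust both to match the actual server configuration.
LOG_PATH = "/var/log/nginx/access.log"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

ip_counts: Counter = Counter()
agent_counts: Counter = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LOG_LINE.match(line)
        if match:
            ip_counts[match.group("ip")] += 1
            agent_counts[match.group("agent")] += 1

# The heaviest hitters are the first candidates for closer inspection.
print("Top IPs:", ip_counts.most_common(10))
print("Top user agents:", agent_counts.most_common(10))
```

Run against a day of traffic, a tally like this quickly surfaces the handful of IPs and user agents responsible for most of the abnormal load, and those findings feed directly into the custom firewall rules in step 2.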

Results and Benefits

The results of the intervention were immediate and impressive:

  • Improved Website Performance: With the malicious bots blocked, the client’s website load times decreased significantly, and possible server crashes became a thing of the past.
  • Reduced Server Load: The load the client’s website placed on the server dropped, leading to more efficient use of resources.
  • Enhanced Security: By blocking crawlers that were probing for vulnerabilities, we significantly reduced the risk of potential security breaches.

You can see how usage gradually returned to normal after the phased rollout:

The client was thrilled with the outcome, noting not only the performance improvements but also the peace of mind that came with knowing their site was better protected against malicious bot traffic.

Ongoing Management

We understand that the landscape of web security is always evolving. To maintain the effectiveness of the solution, we continue to monitor the client’s website for any new bot threats.

Our approach is designed to be proactive rather than reactive, ensuring that the client’s site remains secure and performs optimally, even as new threats arise.


Best Practices

For other website owners looking to manage crawler traffic effectively, here are some tips:

  1. Monitor Traffic Regularly: Keep an eye on your website’s traffic patterns to quickly identify any unusual spikes that might indicate malicious bot activity (a minimal spike-check sketch follows this list).
  2. Use Firewalls and Bot Management Tools: Implement security tools that can help you detect and block harmful bots while allowing beneficial crawlers.
  3. Whitelist Legitimate Bots: Ensure that search engines and other important bots can still access your site by creating and maintaining a whitelist.
  4. Implement Rate Limiting: Prevent any single bot from overwhelming your server by limiting the number of requests it can make in a given time period.
  5. Stay Updated on Bot Trends: The tactics used by malicious bots are constantly evolving, so stay informed about new threats and adjust your security measures accordingly.
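As a minimal illustration of tip 1, the sketch below flags a minute whose request count is far above the recent average. The spike factor and window size are arbitrary example values that you would tune to your own traffic.

```python
from collections import deque
from statistics import mean

# Illustrative values -- tune to your own traffic patterns.
SPIKE_FACTOR = 3.0        # alert when traffic triples the recent baseline
BASELINE_WINDOW = 60      # how many past minutes form the baseline

history: deque = deque(maxlen=BASELINE_WINDOW)


def check_for_spike(requests_this_minute: int) -> bool:
    """Return True when the current minute looks like an abnormal spike."""
    is_spike = (
        len(history) == BASELINE_WINDOW
        and requests_this_minute > SPIKE_FACTOR * mean(history)
    )
    history.append(requests_this_minute)
    return is_spike
```

Feed it one requests-per-minute sample at a time (from log tailing or your analytics tool) and alert whenever it returns True. A check this simple catches sudden surges early; slower climbs like the one described in this case call for a longer baseline or a trend comparison.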

By following these best practices, you can better manage bot traffic and protect your website from the negative impacts of malicious crawlers. Our experience in this case underscores the importance of a smart, proactive approach to web security in today’s digital landscape.

Evtim Todorov

Evtim Todorov joined WPX five years ago, beginning his journey as a support agent. His drive led him to the SecOps team in 2020, where he now specializes in network security, speed optimization, and malware removal. Evtim brings extensive expertise to the WPX team, ensuring a secure and seamless experience for clients.
