{"id":2021661,"date":"2025-01-06T09:13:45","date_gmt":"2025-01-06T09:13:45","guid":{"rendered":"https:\/\/wpx.net\/blog\/?p=2021661"},"modified":"2025-01-06T09:13:47","modified_gmt":"2025-01-06T09:13:47","slug":"website-crawler-management-case-study","status":"publish","type":"post","link":"https:\/\/wpx.net\/blog\/website-crawler-management-case-study\/","title":{"rendered":"Separating Good Bots from Bad: Our Smart Solution for Website Crawler Management"},"content":{"rendered":"\n<p><a href=\"https:\/\/www.semrush.com\/blog\/website-crawler\/\" data-type=\"link\" data-id=\"https:\/\/www.semrush.com\/blog\/website-crawler\/\" target=\"_blank\" rel=\"noopener\">Web crawlers<\/a> are essential to the internet ecosystem, powering everything from search engine indexing to web analytics. However, not all bots are beneficial. Malicious crawlers can wreak havoc on websites, consuming bandwidth, reducing performance, and even posing security threats. In this article, we explore how we, a leading managed WordPress hosting provider, tackled a client&#8217;s issue with aggressive and harmful bots, turning a potential crisis into an opportunity to enhance the site&#8217;s performance and security.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Introduction\"><\/span><strong>Introduction<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Web crawlers, often called bots, are automated programs that scan websites for data. On the one hand, beneficial crawlers from search engines like Google help improve site visibility and SEO performance. On the other hand, malicious crawlers, such as those used by scrapers, hackers, and spammers, can degrade website performance, inflate server load, and expose vulnerabilities.<\/p>\n\n\n\n<p>One of our clients, a merch store website, was experiencing significant issues with malicious bot traffic. 
The site&#8217;s performance was deteriorating, leading to slower load times, increased server costs, and a potential risk of security breaches. This situation required immediate and expert intervention.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"The_Challenge\"><\/span><strong>The Challenge<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>We noticed a sharp decline in the client&#8217;s website performance, with pages taking longer to load. The issue was flagged at 12:48 PM (GMT+3). Upon closer inspection, we observed unusually high levels of traffic from a variety of sources, most of which were not legitimate human visitors.<\/p>\n\n\n\n<p>The following graph shows resource usage creeping above normal levels between 6 AM and 12:48 PM (GMT+3):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"538\" src=\"https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-1-1024x538.png\" alt=\"malicious bot activity\" class=\"wp-image-2021665\" srcset=\"https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-1-1024x538.png 1024w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-1-300x158.png 300w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-1-768x403.png 768w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-1-1536x806.png 1536w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-1-2048x1075.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The impact of this malicious bot activity was severe:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Increased Server Load<\/strong>: The client&#8217;s server was overwhelmed by the constant requests from harmful crawlers, leading to slower site performance.<\/li>\n\n\n\n<li><strong>Degraded User 
Experience<\/strong>: The site\u2019s visitors faced frustrating delays, which could drive them away and negatively impact the site\u2019s reputation.<\/li>\n\n\n\n<li><strong>Potential Security Risks<\/strong>: Some of these crawlers were probing for vulnerabilities, putting the site at risk of hacking attempts and data breaches.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Initial_Assessment\"><\/span><strong>Initial Assessment<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Our security and performance team quickly stepped in to assess the situation. Using their monitoring tools, they found that a significant portion of the traffic hitting the client&#8217;s site was coming from malicious bots masquerading as legitimate users or legitimate crawlers.<\/p>\n\n\n\n<p>One of the main challenges in dealing with crawler traffic is distinguishing between beneficial bots, like those from search engines, and harmful ones. Malicious bots often disguise themselves with fake user agents or rotate IP addresses to evade detection, making it difficult to block them without also affecting good bots.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Our_Innovative_Solution\"><\/span><strong>Our Innovative Solution<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Understanding the need for a nuanced approach, our team used a well-established solution tailored to the client&#8217;s needs. 
This approach combined several methods to filter out malicious bot traffic while ensuring that legitimate crawlers could still access the site.<\/p>\n\n\n\n<p>Key components of the solution included:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Advanced Bot Detection<\/strong>: We analysed bot behaviour patterns, user agents, and IP addresses to accurately differentiate between good and bad bots.<\/li>\n\n\n\n<li><strong>Dynamic IP Blacklisting<\/strong>: Rather than relying on static IP blacklists, our system dynamically identified and blocked IP addresses associated with malicious bots, preventing them from accessing the site in real time.<\/li>\n\n\n\n<li><strong>Rate Limiting and Traffic Throttling<\/strong>: To further mitigate the impact of aggressive crawlers, we implemented rate limiting, which restricted the number of requests a bot could make within a certain timeframe, thereby reducing server load.<\/li>\n\n\n\n<li><strong>Bot Whitelisting<\/strong>: Legitimate bots, such as those from search engines, were placed on a whitelist, ensuring they could continue to access and index the site without interruption.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Implementation\"><\/span><strong>Implementation<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>We deployed this solution in a phased approach to minimise disruption to the client\u2019s legitimate traffic. 
The steps included:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Initial Monitoring and Analysis<\/strong>: The team began by monitoring the site\u2019s traffic patterns to establish a baseline and identify the most harmful bots.<\/li>\n\n\n\n<li><strong>Custom Rules Deployment<\/strong>: Based on the analysis, we implemented custom rules in the site\u2019s firewall to block malicious traffic while allowing legitimate bots and human users to continue accessing the site.<\/li>\n\n\n\n<li><strong>Testing and Adjustment<\/strong>: We tested the solution in a controlled environment, making adjustments as needed to ensure that legitimate traffic was not affected.<\/li>\n\n\n\n<li><strong>Full Rollout<\/strong>: Once testing was complete, the solution was fully deployed across the client\u2019s site, with continuous monitoring to ensure effectiveness.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Results_and_Benefits\"><\/span><strong>Results and Benefits<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The results of the intervention were immediate and impressive:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Improved Website Performance<\/strong>: With the malicious bots blocked, the client\u2019s website load times decreased significantly, and the risk of server crashes subsided.<\/li>\n\n\n\n<li><strong>Reduced Server Load<\/strong>: The strain the client\u2019s website placed on the server was reduced, leading to more efficient use of resources.<\/li>\n\n\n\n<li><strong>Enhanced Security<\/strong>: By blocking crawlers that were probing for vulnerabilities, we significantly reduced the risk of potential security breaches.<\/li>\n<\/ul>\n\n\n\n<p>The graph below shows usage gradually returning to normal after the phased rollout:<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-2 
is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2571\" height=\"1395\" data-id=\"2021664\" src=\"https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-2.png\" alt=\"performance improvements\" class=\"wp-image-2021664\" srcset=\"https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-2.png 2571w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-2-300x163.png 300w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-2-1024x556.png 1024w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-2-768x417.png 768w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-2-1536x833.png 1536w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/graph-2-2048x1111.png 2048w\" sizes=\"auto, (max-width: 2571px) 100vw, 2571px\" \/><\/figure>\n<\/figure>\n\n\n\n<p>The client was thrilled with the outcome, noting not only the performance improvements but also the peace of mind that came with knowing their site was better protected against malicious bot traffic.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Ongoing_Management\"><\/span><strong>Ongoing Management<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>We understand that the landscape of web security is always evolving. 
To maintain the effectiveness of the solution, we continue to monitor the client\u2019s website for any new bot threats.<\/p>\n\n\n\n<p>Our approach is designed to be proactive rather than reactive, ensuring that the client\u2019s site remains secure and performs optimally, even as new threats arise.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Client_Success\"><\/span><strong>Client Success<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/image-9-1024x536.png\" alt=\"\" class=\"wp-image-2021666\" srcset=\"https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/image-9-1024x536.png 1024w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/image-9-300x157.png 300w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/image-9-768x402.png 768w, https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/11\/image-9.png 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Best_Practices\"><\/span><strong>Best Practices<\/strong><strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>For other website owners looking to manage crawler traffic effectively, here are some tips:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Monitor Traffic Regularly<\/strong>: Keep an eye on your website\u2019s traffic patterns to quickly identify any unusual spikes that might indicate malicious bot activity.<\/li>\n\n\n\n<li><strong>Use Firewalls and Bot Management Tools<\/strong>: Implement security tools that can help you detect and block harmful bots while allowing beneficial crawlers.<\/li>\n\n\n\n<li><strong>Whitelist Legitimate Bots<\/strong>: Ensure that search engines and other important bots can still access your site 
by creating and maintaining a whitelist.<\/li>\n\n\n\n<li><strong>Implement Rate Limiting<\/strong>: Prevent any single bot from overwhelming your server by limiting the number of requests it can make in a given time period.<\/li>\n\n\n\n<li><strong>Stay Updated on Bot Trends<\/strong>: The tactics used by malicious bots are constantly evolving, so stay informed about new threats and adjust your security measures accordingly.<\/li>\n<\/ol>\n\n\n\n<p>By following these best practices, you can better manage bot traffic and protect your website from the negative impacts of malicious crawlers. Our experience in this case underscores the importance of a smart, <a href=\"https:\/\/wpx.net\/page\/secure\" data-type=\"link\" data-id=\"https:\/\/wpx.net\/page\/secure\">proactive approach to web security<\/a> in today\u2019s digital landscape.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Web crawlers are essential to the internet ecosystem, powering everything from search engine indexing to web analytics. However, not all bots are beneficial. Malicious crawlers can wreak havoc on websites, consuming bandwidth, reducing performance, and even posing security threats. 
In this article, we explore how we, a leading managed WordPress hosting provider, tackled a client&#8217;s [&hellip;]<\/p>\n","protected":false},"author":32,"featured_media":2021668,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"cybocfi_hide_featured_image":"","footnotes":""},"categories":[89,102],"tags":[188,179],"ppma_author":[182],"class_list":["post-2021661","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-security","category-the-wpx-way","tag-bot-traffic","tag-case-study"],"blocksy_meta":[],"authors":[{"term_id":182,"user_id":32,"is_guest":0,"slug":"evtim-todorov","display_name":"Evtim Todorov","avatar_url":{"url":"https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/10\/2848CF4F-0893-4631-A17D-EF1AD1AD8C14.jpeg","url2x":"https:\/\/wpx.net\/blog\/wp-content\/uploads\/2024\/10\/2848CF4F-0893-4631-A17D-EF1AD1AD8C14.jpeg"},"0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/posts\/2021661","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/users\/32"}],"replies":[{"embeddable":true,"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/comments?post=2021661"}],"version-history":[{"count":4,"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/posts\/2021661\/revisions"}],"predecessor-version":[{"id":2021674,"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/posts\/2021661\/revisions\/2021674"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/media\/2021668"}],"wp:attachment":[{"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/media?parent=2021661"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wpx.net\/blog\/wp-json\/
wp\/v2\/categories?post=2021661"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/tags?post=2021661"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/wpx.net\/blog\/wp-json\/wp\/v2\/ppma_author?post=2021661"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}