The bot management module of Web Application Firewall (WAF) provides the scenario-specific configuration feature to protect your business from malicious crawlers. You can configure custom anti-crawler rules based on your business requirements.

Background information

Malicious crawlers come in many types, and their crawling methods keep changing to bypass the anti-crawler rules that website administrators configure. As a result, a fixed set of rules cannot block all malicious crawlers. The methods that block malicious crawlers effectively also vary based on the characteristics of the protected service, and security expertise is often required to deliver optimal protection.

If you need strong protection against malicious crawlers or do not have security experts to configure anti-crawler rules, we recommend that you use the scenario-specific configuration feature that is provided by WAF.

WAF provides IP address libraries of malicious crawlers. Based on the network-wide threat intelligence of Alibaba Cloud, WAF updates the IP address libraries of various public clouds and data centers in real time. You can use the scenario-specific configuration feature to allow normal crawler requests and block malicious crawler requests that are sent from the addresses in the IP address libraries.
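The idea of matching a request against an IP address library can be illustrated with a minimal Python sketch. This is not how WAF implements the feature. The CIDR ranges below are reserved documentation ranges used as hypothetical sample entries, and the names `is_in_ip_library` and `decide` are assumptions made for illustration only.

```python
import ipaddress
from typing import Iterable

# Hypothetical sample entries (reserved documentation ranges).
# The real IP address libraries are maintained and updated by WAF.
MALICIOUS_CRAWLER_RANGES = ["203.0.113.0/24", "2001:db8::/32"]


def is_in_ip_library(source_ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if source_ip falls inside any CIDR range of the library."""
    address = ipaddress.ip_address(source_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version libraries are handled without extra checks.
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def decide(source_ip: str, cidr_ranges: Iterable[str]) -> str:
    """Block requests from library addresses and allow everything else."""
    return "block" if is_in_ip_library(source_ip, cidr_ranges) else "allow"
```

For example, `decide("203.0.113.42", MALICIOUS_CRAWLER_RANGES)` evaluates to `"block"`, while an address outside the sample ranges is allowed.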

Risks and characteristics of malicious crawlers

Normal crawler requests typically contain a keyword in the form of xxspider (for example, Baiduspider) in the User-Agent field and have the following characteristics: low request rate, scattered URLs, and wide time range. To identify the origin of a crawler request, run a reverse nslookup or tracert command on the source IP address of the request. For example, if you run a reverse nslookup command on the IP address of the Baidu crawler, you can obtain information about the origin of the crawler.
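The reverse lookup described above can also be scripted. The following Python sketch is one possible approach, not a feature of WAF. It performs a reverse DNS lookup on the source IP address and checks the returned hostname against a list of trusted domain suffixes. It also adds a forward lookup to confirm that the hostname resolves back to the same IP address, which is an extra step that is not described in this topic. The suffix list that you pass in (for example, `baidu.com`) and the function names are assumptions.

```python
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    """Return the primary hostname for ip (reverse DNS, like nslookup <ip>)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses that hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    trusted_suffixes: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Check that ip reverse-resolves to a trusted domain and back again."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # Match whole domain labels so "evilbaidu.com" does not pass for "baidu.com".
    suffixes = [s.strip(".").lower() for s in trusted_suffixes]
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False

    try:
        # Assumed extra step: the hostname must resolve back to the same IP.
        return ip in forward(hostname)
    except OSError:
        return False
```

The `reverse` and `forward` parameters exist so that the check can be exercised with stub resolvers in tests. In normal use, a call such as `is_verified_crawler(source_ip, ["baidu.com"])` relies on the system DNS resolver.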

Malicious crawlers may send a large number of requests to a specific URL or port of a domain name during a specific period of time. For example, an HTTP flood attack may be disguised as crawler traffic, or requests may be disguised as coming from third parties in order to crawl sensitive information. A large number of malicious requests can cause high CPU utilization, website access failures, and service interruptions.
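The request pattern described above, many requests to one URL within a short period, can be detected with a sliding-window counter. The following Python sketch illustrates the general technique only and does not represent the detection logic of WAF. The class name `UrlRateMonitor`, the window length, and the threshold are hypothetical values that you would tune for your own service.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class UrlRateMonitor:
    """Count requests per (source IP, URL) pair within a sliding time window."""

    def __init__(self, window_seconds: float, max_requests: int) -> None:
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def record(self, ip: str, url: str, timestamp: float) -> bool:
        """Record one request and return True if the pair exceeds the limit.

        Timestamps are expected in seconds and in non-decreasing order.
        """
        hits = self._hits[(ip, url)]
        hits.append(timestamp)
        # Drop requests that have fallen out of the window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

With `UrlRateMonitor(window_seconds=10, max_requests=3)`, the fourth request from the same IP address to the same URL within 10 seconds is flagged. Requests that are spread across many URLs or over a long time range, which is typical of normal crawlers, stay below the limit.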

Prerequisites

A WAF instance that runs the Pro, Business, or Enterprise edition is purchased. The bot management module is enabled for your instance.

Limits

You can add up to 50 scenario-specific configuration rules for each domain name.

References

Configure anti-crawler rules for websites

Configure anti-crawler rules for apps

Examples of using the scenario-specific configuration feature