Bot management can be configured in Smart Mode or Professional Mode. In Smart Mode, you can quickly configure crawler management for your website. In Professional Mode, you can configure more precise crawler rules that suit your website or application.
Precautions
Requests that are blocked by bot management rules do not incur billing or consume quota.
Smart mode
Designed for entry-level users, Smart Mode allows you to manage automated traffic and crawlers. Compared with Professional Mode, which is supported in the Enterprise plan, Smart Mode helps you quickly manage crawlers by selecting an action for each type of crawler.
Procedure
Log on to the ESA console.
In the left-side navigation pane, click Websites.
On the Websites page, find the website that you want to manage, and click the website name or View Details in the Actions column.
In the left-side navigation tree, choose Security > Bots.
On the Bots page, click the Smart Mode tab. Then, click Configure or Enable for different items. For information about how to configure the mode, see Description.
Description
Item | Description |
Definite Bots | Requests in this category are highly likely to come from malicious crawlers. We recommend that you block these requests or initiate a slider CAPTCHA challenge. |
Likely Bots | Requests in this category are relatively low risk but may include malicious crawlers and other automated traffic. We recommend that you monitor these requests, or initiate a slider CAPTCHA challenge if risks arise. |
Verified Bots | Verified bots are usually search engine crawlers, which benefit the SEO of your website. We recommend that you block this traffic only if you do not want any search engine crawlers to access your website. |
Likely Human (Actions are not supported) | The requests are likely from real users. We recommend that you do not take special actions on them. |
Static Resource Protection | By default, the Definite Bots, Likely Bots, and Verified Bots configurations take effect only for dynamic resource requests, which are forwarded to your origin server. After you enable Static Resource Protection, the configurations also take effect for requests that hit the ESA cache, which are typically static files such as images and videos. |
JavaScript Detection | ESA uses lightweight, background JavaScript detection to improve bot management. Note: Only browser traffic can pass JavaScript detection challenges. If your business involves non-browser traffic, such as requests from data centers, disable this feature to avoid false positives. |
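Before you enable JavaScript Detection, it can help to estimate how much of your traffic cannot run JavaScript at all. The following sketch is illustrative only (it is not an ESA API): it classifies requests by User-Agent to approximate the share of non-browser traffic that would fail a JavaScript challenge. The User-Agent markers are common examples, not an official or exhaustive list.

```python
# Illustrative sketch (not an ESA API): estimate the share of non-browser
# traffic in an access log before enabling JavaScript Detection.
# The User-Agent markers below are common examples, not an official list.
BROWSER_MARKERS = ("Mozilla/", "Chrome/", "Safari/", "Firefox/", "Edg/")


def is_browser_ua(user_agent: str) -> bool:
    """Heuristically treat a request as browser traffic by its User-Agent."""
    return any(marker in user_agent for marker in BROWSER_MARKERS)


def non_browser_ratio(user_agents: list[str]) -> float:
    """Fraction of requests that could not pass a JavaScript challenge."""
    if not user_agents:
        return 0.0
    non_browser = sum(1 for ua in user_agents if not is_browser_ua(ua))
    return non_browser / len(user_agents)


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 Chrome/120.0 Safari/537.36",
        "curl/8.4.0",              # CLI client: cannot run JavaScript
        "python-requests/2.31.0",  # API client: cannot run JavaScript
        "Mozilla/5.0 (Macintosh) Firefox/121.0",
    ]
    print(f"non-browser share: {non_browser_ratio(sample):.0%}")
```

If a significant share of legitimate traffic is non-browser (for example, API clients or data center callbacks), keep JavaScript Detection disabled, as the table above recommends.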
Professional mode
You can use Professional Mode to provide anti-crawler protection for web pages accessed from browsers and for native iOS or Android apps. You can create different anti-crawler rules for requests that have different characteristics. You can also use built-in crawler libraries, such as the search engine crawler library, AI protection, the bot threat intelligence library, the data center blacklist, and the fake crawler list. This frees you from manually updating and analyzing crawler characteristics.
Procedure
Log on to the ESA console.
In the left-side navigation pane, click Websites.
On the Websites page, find the website that you want to manage, and click the website name or View Details in the Actions column.
In the left-side navigation tree, choose Security > Bots.
On the Bots page, select Professional Mode and click Create Ruleset.
Configure anti-crawler rules for websites
If your web pages, HTML5 pages, or HTML5 apps are accessible from browsers, you can configure anti-crawler rules for the websites to protect your services from malicious crawlers.
Section | Item | Description |
Global Settings | Rule Set Name | The name of the rule set. The name can contain letters, digits, and underscores (_). |
| Service Type | Select Browsers. This way, the rule set applies to web pages, HTML5 pages, and HTML5 apps. |
| SDK Integration | |
| Cross-origin Request | This parameter is required if you select Automatic Integration (Recommended) and you have multiple websites that can access each other. In this case, you must specify the domain of the other website for this parameter to prevent duplicate JavaScript code. For example, if you log on to Website A from Website B, specify the domain name of Website B for this parameter. |
If requests match... | | Specify the conditions for matching incoming requests. For more information, see WAF. |
Then execute... | Legitimate Bot Management | The search engine crawler whitelist contains the crawler IP addresses of major search engines, including Google, Baidu, Sogou, 360, Bing, and Yandex. The whitelist is dynamically updated. After you select the search engine crawler whitelist, requests sent from the crawler IP addresses of these search engines are allowed, and the bot management module no longer checks the requests. |
| Bot Characteristic Detection | |
| Bot Behavior Detection | After you enable AI protection, the intelligent protection engine analyzes access traffic and performs machine learning. Then, a blacklist or a protection rule is generated based on the analysis results and learned patterns. |
| Custom Throttling | |
| Bot Threat Intelligence Library | The library contains the IP addresses of attackers that have repeatedly crawled content from Alibaba Cloud users over a specific period of time. You can set Action to Monitor or Slider CAPTCHA. |
| Data Center Blacklist | After you enable this feature, the IP addresses in the selected data center IP address libraries are blocked. If you access the website that you want to protect from the source IP addresses of public clouds or data centers, you must add those IP addresses to the whitelist. For example, you must add the callback IP addresses of Alipay or WeChat and the IP addresses of monitoring applications to the whitelist. The data center blacklist supports the following IP address libraries: IP Address Library of Data Center-Alibaba Cloud, IP Address Library of Data Center-21Vianet, IP Address Library of Data Center-Meituan Open Services, IP Address Library of Data Center-Tencent Cloud, and IP Address Library of Data Center-Other. You can set Action to Monitor, Slider CAPTCHA, or Block. |
| Fake Spider Blocking | After you enable this feature, requests whose User-Agent headers claim to be from the search engines specified in the Legitimate Bot Management section are blocked, unless the client IP addresses are verified as valid crawler IP addresses of those search engines. |
Effective Time | | By default, rules take effect immediately and permanently after they are created. You can configure specific time ranges or cycles in which rules take effect. |
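The idea behind Fake Spider Blocking can be illustrated with the standard crawler verification technique: a genuine search engine crawler's IP address reverse-resolves to the engine's published domain, and that hostname forward-resolves back to the same IP address. The following sketch shows this double-lookup check, not ESA's internal implementation; the crawler domain suffixes are examples for Google and Bing, and other engines publish their own.

```python
import socket

# Illustrative sketch of the verification idea behind Fake Spider Blocking
# (not ESA's implementation). The domain suffixes are examples for Google
# and Bing crawlers; other search engines publish their own crawler domains.
CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip: str) -> bool:
    """Return True only if reverse and forward DNS agree on a known crawler domain."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
    except OSError:
        return False
    if not hostname.endswith(CRAWLER_DOMAINS):
        return False
    try:
        # forward-confirm: the hostname must resolve back to the original IP
        return ip in {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return False
```

A request whose User-Agent claims to be Googlebot but whose IP fails this check is exactly the "fake spider" case that the feature blocks.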
Configure anti-crawler rules for apps
You can configure anti-crawler rules for your native iOS or Android apps to protect your services against crawlers. Note that HTML5 apps are not native iOS or Android apps; to protect HTML5 apps, configure anti-crawler rules for websites instead.
Section | Item | Description |
Global Settings | Rule Set Name | The name of the rule set. The name can contain letters, digits, and underscores (_). |
| Service Type | Select Apps to configure anti-crawler rules for native iOS and Android apps. HTML5 apps are not native iOS or Android apps. |
| SDK Integration | To obtain the SDK package, click Obtain and Copy AppKey and then submit a ticket. For more information, see Integrate the Anti-Bot SDK into Android apps and Integrate the Anti-Bot SDK into iOS apps. After the Anti-Bot SDK is integrated, the SDK collects the risk characteristics of clients and adds security signatures to requests. WAF identifies and blocks requests that are identified as unsafe based on the signatures. |
If requests match... | | Specify the conditions for matching incoming requests. For more information, see WAF. |
Then execute... | Bot Characteristic Detection | |
| Bot Throttling | |
| Bot Threat Intelligence Library | The library contains the IP addresses of attackers that have repeatedly crawled content from Alibaba Cloud users over a specific period of time. |
| Data Center Blacklist | After you enable this feature, the IP addresses in the selected data center IP address libraries are blocked. If you access the website that you want to protect from the source IP addresses of public clouds or data centers, you must add those IP addresses to the whitelist. For example, you must add the callback IP addresses of Alipay or WeChat and the IP addresses of monitoring applications to the whitelist. The data center blacklist supports the following IP address libraries: IP Address Library of Data Center-Alibaba Cloud, IP Address Library of Data Center-21Vianet, IP Address Library of Data Center-Meituan Open Services, IP Address Library of Data Center-Tencent Cloud, and IP Address Library of Data Center-Other. |
Effective Time | | By default, rules take effect immediately and permanently after they are created. You can configure specific time ranges or cycles in which rules take effect. |
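The Data Center Blacklist row above notes that whitelisted IP addresses, such as Alipay or WeChat callback addresses, must be exempted from blocking. The following sketch illustrates that evaluation order in general terms; it is not ESA's implementation, and the CIDR ranges are reserved documentation ranges, not real data center ranges.

```python
import ipaddress

# Illustrative sketch (not ESA's implementation) of a data center blacklist
# with a whitelist override. The CIDR ranges are reserved documentation
# ranges (RFC 5737), not real data center address libraries.
BLACKLIST = [ipaddress.ip_network(c) for c in ("198.51.100.0/24", "203.0.113.0/24")]
WHITELIST = [ipaddress.ip_network(c) for c in ("198.51.100.8/32",)]  # e.g. a callback IP


def action_for(ip: str) -> str:
    """Whitelist entries take precedence over blacklist ranges."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in WHITELIST):
        return "allow"
    if any(addr in net for net in BLACKLIST):
        return "block"
    return "allow"
```

Checking the whitelist first is what makes the exemption work: a callback IP that falls inside a blacklisted data center range is still allowed.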
Feature availability
Smart mode
Item | Entrance | Pro | Premium | Enterprise |
Definite Bots | Yes (Actions only support Monitor and Allow) | Yes (Actions only support Monitor and Allow) | Yes | Yes |
Likely Bots | Yes (Actions only support Monitor and Allow) | Yes (Actions only support Monitor and Allow) | Yes | Yes |
Verified Bots | Not supported | Not supported | Yes | Yes |
Static Resource Protection | Not supported | Not supported | Not supported | Yes |
JavaScript Detection | Not supported | Not supported | Not supported | Yes |
Professional mode
Feature | Entrance | Pro | Premium | Enterprise |
Number of Bot management rule sets | Not supported | Not supported | Not supported | 10 |