Bots

Last updated: Sep 24, 2024

Bot management rules can be used to protect your websites or native iOS and Android apps against crawlers. To use the anti-crawler feature on your native iOS and Android apps, you must integrate the Anti-Bot SDK. You can create different anti-crawler rules for requests that have different characteristics. You can also use built-in crawler libraries and detection features, such as the search engine crawler library, AI protection, the bot threat intelligence library, the data center blacklist, and the fake spider list. These are maintained for you, so you do not need to manually update or analyze crawler characteristics.

Create a bot management rule set

  1. Log on to the ESA console.

  2. In the left-side navigation pane, click Websites.

  3. On the Websites page, find the website that you want to manage, and click the website name or View Details in the Actions column.

  4. In the left-side navigation tree, choose Security > Bots.

  5. On the Bots page, click Create Rule Set.

  6. Select Browsers or Apps for the Service Type parameter, configure other parameters as needed, and then click OK. For more information, see Configure anti-crawler rules for websites and Configure anti-crawler rules for apps.

Configure anti-crawler rules for websites

If your web pages, HTML5 pages, or HTML5 apps are accessible from browsers, you can configure anti-crawler rules for the websites to protect your services from malicious crawlers.

Section

Parameter

Description

Global Settings

Rule Set Name

The name of the rule set. The name can contain letters, digits, and underscores (_).

Service Type

Select Browsers to apply the rule set to web pages, HTML5 pages, and HTML5 apps.

SDK Integration

  • Automatic Integration (Recommended):

    WAF automatically references the SDK in the HTML pages of the website by embedding JavaScript code. The SDK then collects information such as browser information, probe signatures, and malicious behaviors. Sensitive information is not collected. WAF detects and blocks malicious crawlers based on the collected information.

  • Manual Integration

    If automatic integration is not supported, use manual integration: copy the JavaScript code displayed in the console and add it to the HTML code of your web pages.

Cross-origin Request

If you select Automatic Integration (Recommended) and you protect multiple websites that can access each other, you must specify a cross-origin domain name for this parameter to prevent the JavaScript code from being embedded repeatedly. For example, if users log on to Website A from Website B, specify the domain name of Website B for this parameter.

If requests match...

Specify the conditions for matching incoming requests. For more information, see WAF.

Then execute...

Legitimate Bot Management

The search engine crawler whitelist contains the crawler IP addresses of major search engines, including Google, Baidu, Sogou, 360, Bing, and Yandex. The whitelist is dynamically updated.

After you select the search engine crawler whitelist, requests from the crawler IP addresses of the listed search engines are allowed and are no longer checked by the bot management module.

Bot Characteristic Detection

  • Script-based Bot Block (JavaScript): If you turn on this switch, WAF performs JavaScript validation on clients. To prevent simple script-based attacks, traffic from non-browser tools that cannot run JavaScript is blocked.

  • Advanced Bot Defense (Dynamic Token-based Authentication): If you turn on this switch, the signature of each request is verified. Requests that fail the verification are blocked. SDK Signature Verification is selected by default and cannot be cleared. Requests that do not contain signatures or requests that contain invalid signatures are detected. You can also select Signature Timestamp Exception and WebDriver Attack.
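The signature checks described above (missing signatures, invalid signatures, and the optional Signature Timestamp Exception) can be illustrated with a minimal sketch. The real signing scheme used by the SDK is not documented here, so this example assumes an HMAC-SHA256 signature over a hypothetical `path|timestamp` message, a hypothetical shared secret, and an assumed 300-second tolerance. It only shows how the three failure cases can be told apart.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret and tolerance; the real SDK signing scheme is not documented here.
SECRET = b"example-secret"
MAX_SKEW_SECONDS = 300  # assumed window for the Signature Timestamp Exception check


def sign(path: str, timestamp: int, secret: bytes = SECRET) -> str:
    """Return a hex HMAC-SHA256 signature over the request path and timestamp."""
    message = f"{path}|{timestamp}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(path: str, timestamp: Optional[int], signature: Optional[str], now: int) -> str:
    """Classify a request as ok, missing_signature, invalid_signature, or timestamp_exception."""
    if timestamp is None or signature is None:
        return "missing_signature"
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sign(path, timestamp), signature):
        return "invalid_signature"
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return "timestamp_exception"
    return "ok"


now = int(time.time())
stale = now - 3600
print(verify("/api/items", now, sign("/api/items", now), now))      # ok
print(verify("/api/items", now, None, now))                         # missing_signature
print(verify("/api/items", now, "deadbeef", now))                   # invalid_signature
print(verify("/api/items", stale, sign("/api/items", stale), now))  # timestamp_exception
```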

Bot Behavior Detection

After you enable AI Protection, the intelligent protection engine analyzes access traffic and applies machine learning. Based on the analysis results and learned patterns, it generates a blacklist or a protection rule. You can set the action to one of the following values:

  • Monitor: The anti-crawler rule allows traffic that matches the rule and records the traffic in security reports.

  • Slider CAPTCHA: Clients must pass slider CAPTCHA verification before the clients can access the website.

Custom Throttling

  • IP Address Throttling (Default): You can configure throttling conditions for IP addresses. If the number of requests from the same IP address within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), the system performs the specified action on subsequent requests. The action can be specified by selecting Slider CAPTCHA, Block, or Monitor from the Action drop-down list. You can add up to three conditions. The conditions are in an OR relation.

  • Custom Session Throttling: You can configure throttling conditions for sessions. Use the Session Type parameter to specify the session type. If the number of requests from the same session within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), WAF performs the specified action on subsequent requests. The action can be specified by selecting Slider CAPTCHA, Block, or Monitor from the Action drop-down list. You can add up to three conditions. The conditions are in an OR relation.
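The throttling conditions above (a Statistical Interval, a Threshold, an Action, up to three conditions combined with OR) can be modeled with a small in-memory counter. This is a sketch under assumptions, not the product's implementation: the document does not say whether the interval is a fixed or sliding window, so a sliding window is assumed, and conditions are assumed to be checked in list order with the first violated condition's action applied. The same structure applies to sessions if the key is a session identifier instead of an IP address.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass(frozen=True)
class Condition:
    interval_seconds: float  # Statistical Interval (Seconds)
    threshold: int           # Threshold (Times)
    action: str              # "Slider CAPTCHA", "Block", or "Monitor"


class IpThrottler:
    """In-memory per-IP sliding-window counters; up to three conditions combined with OR."""

    def __init__(self, conditions: List[Condition]):
        if not 1 <= len(conditions) <= 3:
            raise ValueError("configure between one and three conditions")
        self.conditions = conditions
        self.max_interval = max(c.interval_seconds for c in conditions)
        self.history: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str, now: float) -> Optional[str]:
        """Record one request and return the action to apply, or None to let it pass."""
        times = self.history[ip]
        times.append(now)
        # Timestamps older than the longest interval can no longer affect any condition.
        while times and times[0] <= now - self.max_interval:
            times.popleft()
        for cond in self.conditions:
            count = sum(1 for t in times if t > now - cond.interval_seconds)
            if count > cond.threshold:  # the count "exceeds" the threshold
                return cond.action
        return None


throttler = IpThrottler([Condition(10, 3, "Slider CAPTCHA"), Condition(60, 20, "Block")])
print([throttler.check("203.0.113.7", now=t) for t in range(5)])
# [None, None, None, 'Slider CAPTCHA', 'Slider CAPTCHA']
```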

Bot Threat Intelligence Library

The library contains the IP addresses of attackers that have sent multiple requests to crawl content from Alibaba Cloud users over a specific period of time.

You can set Action to Monitor or Slider CAPTCHA.

Data Center Blacklist

After you enable this feature, the IP addresses in the selected IP address libraries of data centers are blocked. If you use the source IP addresses of public clouds or data centers to access the website that you want to protect, you must add the IP addresses to the whitelist. For example, you must add the callback IP addresses of Alipay or WeChat and the IP addresses of monitoring applications to the whitelist. The data center blacklist supports the following IP address libraries: IP Address Library of Data Center-Alibaba Cloud, IP Address Library of Data Center-21Vianet, IP Address Library of Data Center-Meituan Open Services, IP Address Library of Data Center-Tencent Cloud, and IP Address Library of Data Center-Other.

You can set the Action parameter to Monitor, Slider CAPTCHA, or Block.
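The interaction between the data center blacklist and the whitelist can be sketched with Python's standard `ipaddress` module. The real IP address libraries are maintained by the service and are not published here, so the CIDR blocks below are hypothetical stand-ins taken from reserved documentation ranges. The sketch assumes that a whitelist match takes precedence, which matches the advice above to whitelist payment callback and monitoring IP addresses.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical stand-ins for the data center IP address libraries; the real lists are
# maintained by the service. The CIDRs below are reserved documentation ranges.
DATA_CENTER_LIBRARIES: Dict[str, List[str]] = {
    "IP Address Library of Data Center-Alibaba Cloud": ["198.51.100.0/24"],
    "IP Address Library of Data Center-Other": ["203.0.113.0/24"],
}


def _contains(cidrs: List[str], addr) -> bool:
    return any(addr in ipaddress.ip_network(c) for c in cidrs)


def data_center_action(ip: str, selected: List[str], whitelist: List[str],
                       action: str = "Block") -> Optional[str]:
    """Return the configured action for blacklisted data center IPs, or None."""
    addr = ipaddress.ip_address(ip)
    if _contains(whitelist, addr):
        return None  # e.g. payment callbacks or monitoring probes you have whitelisted
    for name in selected:
        if _contains(DATA_CENTER_LIBRARIES.get(name, []), addr):
            return action
    return None


selected = ["IP Address Library of Data Center-Alibaba Cloud"]
print(data_center_action("198.51.100.20", selected, whitelist=[]))                    # Block
print(data_center_action("198.51.100.20", selected, whitelist=["198.51.100.20/32"]))  # None
print(data_center_action("203.0.113.5", selected, whitelist=[]))                      # None
```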

Fake Spider Blocking

After you enable this feature, WAF blocks requests whose User-Agent header claims to come from any of the search engines specified in the Legitimate Bot Management section. If the client IP address is verified to belong to that search engine, WAF allows the request.
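The document does not say how WAF verifies that a client IP address is valid. One widely used technique, which Google documents for verifying Googlebot, is a reverse DNS lookup on the client IP followed by a forward lookup that must resolve back to the same IP. The sketch below assumes that technique; the User-Agent tokens and host name suffixes in the hypothetical `CRAWLER_DOMAINS` table are illustrative, and the resolver functions can be replaced with stubs so the logic runs offline.

```python
import socket
from typing import Callable, Dict, List, Optional, Tuple

# Illustrative mapping: User-Agent token -> host name suffixes its crawler IPs resolve to.
CRAWLER_DOMAINS: Dict[str, Tuple[str, ...]] = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def reverse_lookup(ip: str) -> Optional[str]:
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


def forward_lookup(host: str) -> List[str]:
    try:
        return socket.gethostbyname_ex(host)[2]  # IPv4 addresses only
    except OSError:
        return []


def is_fake_spider(user_agent: str, ip: str,
                   reverse: Callable[[str], Optional[str]] = reverse_lookup,
                   forward: Callable[[str], List[str]] = forward_lookup) -> bool:
    """True if the request claims to be a listed crawler but its IP does not verify."""
    ua = user_agent.lower()
    for token, suffixes in CRAWLER_DOMAINS.items():
        if token in ua:
            host = reverse(ip)
            if host is None or not host.lower().endswith(suffixes):
                return True
            return ip not in forward(host)  # forward lookup must confirm the same IP
    return False  # not claiming to be a listed search engine; other rules apply


# Offline example with stubbed resolvers (no real DNS queries are made).
fake_reverse = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com"}.get


def fake_forward(host: str) -> List[str]:
    return ["192.0.2.10"] if host == "crawl-192-0-2-10.googlebot.com" else []


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(is_fake_spider(ua, "192.0.2.10", fake_reverse, fake_forward))   # False
print(is_fake_spider(ua, "203.0.113.9", fake_reverse, fake_forward))  # True
```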

Effective Time

By default, rules take effect immediately and permanently after they are created. You can configure specific time ranges or cycles in which rules take effect.
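The document does not define how time ranges and cycles are expressed, so the following is only a sketch under assumptions: an optional absolute range (no start means immediately, no end means permanently) combined with an optional weekly cycle and daily hours, all in one time zone using naive `datetime` values. The `EffectiveTime` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class EffectiveTime:
    start: Optional[datetime] = None            # None: effective immediately
    end: Optional[datetime] = None              # None: effective permanently
    weekdays: Optional[FrozenSet[int]] = None   # weekly cycle, 0 = Monday ... 6 = Sunday
    daily_from: Optional[time] = None           # daily window start (inclusive)
    daily_to: Optional[time] = None             # daily window end (exclusive)

    def is_active(self, now: datetime) -> bool:
        if self.start is not None and now < self.start:
            return False
        if self.end is not None and now >= self.end:
            return False
        if self.weekdays is not None and now.weekday() not in self.weekdays:
            return False
        if self.daily_from is not None and self.daily_to is not None:
            t = now.time()
            if self.daily_from <= self.daily_to:
                return self.daily_from <= t < self.daily_to
            return t >= self.daily_from or t < self.daily_to  # window crosses midnight
        return True


always = EffectiveTime()
office_hours = EffectiveTime(weekdays=frozenset(range(5)), daily_from=time(9), daily_to=time(18))
print(always.is_active(datetime(2024, 9, 24, 3, 0)))         # True
print(office_hours.is_active(datetime(2024, 9, 24, 10, 0)))  # True (a Tuesday)
print(office_hours.is_active(datetime(2024, 9, 28, 10, 0)))  # False (a Saturday)
```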

Configure anti-crawler rules for apps

You can configure anti-crawler rules for your native iOS or Android apps to protect your services against crawlers. HTML5 apps are not native iOS or Android apps.

Section

Parameter

Description

Global Settings

Rule Set Name

The name of the rule set. The name can contain letters, digits, and underscores (_).

Service Type

Select Apps to configure anti-crawler rules for native iOS and Android apps. HTML5 apps are not native iOS or Android apps.

SDK Integration

To obtain the SDK package, click Obtain and Copy AppKey and then submit a ticket. For more information, see Integrate the Anti-Bot SDK into Android apps and Integrate the Anti-Bot SDK into iOS apps. After the Anti-Bot SDK is integrated, it collects the risk characteristics of clients and adds security signatures to requests. WAF uses these signatures to identify and block unsafe requests.

If requests match...

Specify the conditions for matching incoming requests. For more information, see WAF.

Then execute...

Bot Characteristic Detection

  • Abnormal Device Behavior: After you enable this feature, the anti-crawler rule detects and controls the requests from the devices that have abnormal characteristics. The following behaviors are considered abnormal:

    • Expired Signature: The signature expires. This behavior is selected by default.

    • Using Simulator: A simulator is used.

    • Using Proxy: A proxy is used.

    • Rooted Device: A rooted device is used.

    • Debugging Mode: The debugging mode is used.

    • Hooking: Hooking techniques are used.

    • Multiboxing: Multiple protected app processes run on the device at the same time.

    • Simulated Execution: User behavior simulation techniques are used.

    • Using Script Tool: An automatic script is used.

  • Custom Signature Field: Turn on this switch and select header, parameter, or cookie from the Field Name drop-down list.

    If the custom signature is empty, contains special characters, or exceeds the length limit, you can hash it or process it in another way and enter the result in the Value field.

  • Action: You can set this parameter to Monitor or Block.

    • Monitor: triggers alerts and does not block requests.

    • Block: blocks attack requests.

  • Secondary Packaging Detection: Requests that are sent from apps whose package names or signatures are not in the whitelists are considered secondary packaging requests. You can specify valid application packages.

    • Valid Package Name: Enter the valid application package name. Example: example.aliyundoc.com.

    • Signature: Contact Alibaba Cloud technical support to obtain the package signature. This parameter is optional if the package signature does not need to be verified. In this case, the system verifies only the package name.

      Note

      The value of Signature is not the signature of the application certificate.
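The following sketch illustrates two of the app-side options above. `normalize_signature` shows one way to process a custom signature value before entering it in the Value field; SHA-256 is an assumption, because the document names neither a hash algorithm nor the length limit. `secondary_packaging_action` models the Secondary Packaging Detection check against a hypothetical whitelist in which each valid package name maps to an optional package signature (`None` means only the package name is verified). Both functions are hypothetical and are not part of the Anti-Bot SDK.

```python
import hashlib
from typing import Dict, Optional


def normalize_signature(raw: str) -> str:
    """Hash a raw value into a 64-character lowercase hex string (SHA-256 is assumed)."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Hypothetical whitelist: valid package name -> package signature from technical support,
# or None when only the package name is verified.
VALID_PACKAGES: Dict[str, Optional[str]] = {
    "example.aliyundoc.com": None,
}


def secondary_packaging_action(package_name: str, package_signature: Optional[str],
                               action: str = "Block") -> Optional[str]:
    """Return the configured action (Monitor or Block) for repackaged apps, else None."""
    if package_name not in VALID_PACKAGES:
        return action
    expected = VALID_PACKAGES[package_name]
    if expected is not None and package_signature != expected:
        return action
    return None


print(len(normalize_signature("value with spaces & symbols!")))   # 64
print(secondary_packaging_action("example.aliyundoc.com", None))  # None
print(secondary_packaging_action("com.example.repackaged", None)) # Block
```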

Bot Throttling

  • IP Address Throttling (Default): You can configure throttling conditions for IP addresses. If the number of requests from the same IP address within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), the system performs the specified action on subsequent requests. The action can be specified by selecting Block or Monitor from the Action drop-down list.

  • Device Throttling: You can configure throttling conditions for devices. If the number of requests from the same device within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), the system performs the specified action on subsequent requests. The action can be specified by selecting Block or Monitor from the Action drop-down list.

  • Custom Session Throttling: You can configure throttling conditions for sessions. You can configure the Session Type parameter to specify the session type. If the number of requests from the same session within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), the system performs the specified action on subsequent requests. The action can be specified by selecting Block or Monitor from the Action drop-down list. You can also configure the Throttling Interval (Seconds) parameter, which specifies the period during which the specified action is performed.
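Custom Session Throttling for apps adds the Throttling Interval (Seconds) parameter, which keeps the action in force for a period after the threshold is exceeded. The sketch below assumes that the session identifier has already been extracted according to Session Type (the `session_id` input is hypothetical), uses a sliding window for Statistical Interval, and keeps all state in memory. While a session is inside its throttling interval, every request receives the action without being counted.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SessionThrottler:
    """Per-session sliding-window throttling with a Throttling Interval (penalty period)."""

    def __init__(self, interval_seconds: float, threshold: int,
                 throttling_interval_seconds: float, action: str = "Block"):
        self.interval = interval_seconds             # Statistical Interval (Seconds)
        self.threshold = threshold                   # Threshold (Times)
        self.penalty = throttling_interval_seconds   # Throttling Interval (Seconds)
        self.action = action                         # "Block" or "Monitor"
        self.history: Dict[str, Deque[float]] = defaultdict(deque)
        self.penalized_until: Dict[str, float] = {}

    def check(self, session_id: str, now: float) -> Optional[str]:
        """Record one request for the session and return the action to apply, or None."""
        # Inside the throttling interval the action applies without further counting.
        if now < self.penalized_until.get(session_id, float("-inf")):
            return self.action
        times = self.history[session_id]
        times.append(now)
        while times and times[0] <= now - self.interval:
            times.popleft()
        if len(times) > self.threshold:
            self.penalized_until[session_id] = now + self.penalty
            times.clear()  # start counting afresh once the penalty period ends
            return self.action
        return None


throttler = SessionThrottler(interval_seconds=10, threshold=2, throttling_interval_seconds=30)
print([throttler.check("sess-abc", t) for t in (0, 1, 2, 20, 40)])
# [None, None, 'Block', 'Block', None]
```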

Bot Threat Intelligence Library

The library contains the IP addresses of attackers that have sent multiple requests to crawl content from Alibaba Cloud users over a specific period of time.

Data Center Blacklist

After you enable this feature, the IP addresses in the selected IP address libraries of data centers are blocked. If you use the source IP addresses of public clouds or data centers to access the website that you want to protect, you must add the IP addresses to the whitelist. For example, you must add the callback IP addresses of Alipay or WeChat and the IP addresses of monitoring applications to the whitelist. The data center blacklist supports the following IP address libraries: IP Address Library of Data Center-Alibaba Cloud, IP Address Library of Data Center-21Vianet, IP Address Library of Data Center-Meituan Open Services, IP Address Library of Data Center-Tencent Cloud, and IP Address Library of Data Center-Other.

Effective Time

By default, rules take effect immediately and permanently after they are created. You can configure specific time ranges or cycles in which rules take effect.

Feature availability

Feature | Basic | Standard | Advanced | Enterprise
Bot management rule sets | No | No | No | 10