
Elasticsearch: Use X-Pack Watcher to monitor CCR-related metrics and report alerts for exceptions

Last Updated: Mar 26, 2026

Cross-cluster replication (CCR) issues, such as follower indexes falling behind their leader indexes or read requests stalling, can be hard to catch without automated monitoring. X-Pack Watcher on Alibaba Cloud Elasticsearch lets you define watches that query CCR metrics on a schedule and send DingTalk notifications when anomalies are detected. This topic walks you through setting up a watch that alerts on two CCR health signals: the time since the follower index last sent a read request to the leader index, and the checkpoint lag between the leader and follower indexes.

How it works

A watch consists of five components, which are evaluated in sequence:

Component | Role
--- | ---
trigger | Defines when the watch runs (for example, every 10 seconds).
input | Queries the data to evaluate. In this topic, the input reads CCR stats from the .monitoring-es* indexes.
condition | Decides whether to run the actions. The watch alerts only when anomalies are detected.
transform | Preprocesses the payload before the actions run. In this topic, it extracts the names of the affected indexes.
actions | Defines what happens when the condition is met. In this topic, the watch writes to an index and sends a DingTalk notification.
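The order in which these components run can be modeled as a short Python sketch. The function and parameter names (run_watch, input_fn, and so on) are hypothetical and exist only for illustration: X-Pack Watcher is configured through JSON, not through an API like this, and the trigger is represented here by whatever scheduler calls the function.

```python
from typing import Any, Callable, Dict, List, Optional

Payload = Dict[str, Any]


def run_watch(
    input_fn: Callable[[], Payload],
    condition_fn: Callable[[Payload], bool],
    transform_fn: Callable[[Payload], Payload],
    actions: List[Callable[[Payload], None]],
) -> Optional[Payload]:
    """Model one watch execution. The trigger is the scheduler that calls this function."""
    payload = input_fn()               # input: load the data to evaluate
    if not condition_fn(payload):      # condition: skip the actions if nothing is wrong
        return None
    payload = transform_fn(payload)    # transform: reshape the payload for the actions
    for action in actions:             # actions: every action receives the transformed payload
        action(payload)
    return payload


# Example run with hard-coded data in place of a real search.
notifications: List[Payload] = []
result = run_watch(
    input_fn=lambda: {"hits": {"total": 2}, "indices": ["follower-a", "follower-b"]},
    condition_fn=lambda p: p["hits"]["total"] > 0,
    transform_fn=lambda p: {"delay_indices": "  ".join(p["indices"])},
    actions=[notifications.append],
)
```

Note that the actions see the transformed payload, not the raw search result. This is why the actions in the watch later in this topic can refer to the affected index names directly.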

Because X-Pack Watcher cannot access the internet directly, this topic uses an NGINX proxy on an Elastic Compute Service (ECS) instance to forward webhook requests to DingTalk.

Prerequisites

Before you begin, make sure you have:

Step 1: Configure a DingTalk chatbot

  1. Create a DingTalk group to receive alert notifications.

  2. Click the Settings icon in the upper-right corner of the chat window. In the Group Settings panel, click Bot.

  3. In the Robot Management panel, click Add Robot.

  4. In the Robot dialog box, click Add Robot.

  5. Click the Custom card. In the Robot details dialog box, click Add.

  6. In the Add Robot dialog box, set Security Settings to Custom Keywords and enter one or more keywords.

    Important

    The keywords you enter must appear in the alert message body configured in Step 3. If they do not match, DingTalk will not deliver the notification.

  7. Read and agree to the terms of service, then click Finished.

  8. Click Copy next to the Webhook URL and save it for later use.

    Important

    Keep the webhook URL confidential. If it is leaked, unauthorized parties can send messages to your DingTalk group.
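Because DingTalk silently drops messages that do not contain a configured keyword, it can help to check a message body before you put it into the watch. The following Python sketch assumes the rule that at least one keyword must appear in the text content. The helper names are hypothetical, and the case-sensitive comparison is a conservative assumption.

```python
import json
from typing import Iterable


def build_text_message(content: str) -> str:
    """Build a DingTalk text message body in the same shape as the watch's webhook body."""
    return json.dumps({"msgtype": "text", "text": {"content": content}})


def contains_keyword(message_body: str, keywords: Iterable[str]) -> bool:
    """Return True if the message content contains at least one configured keyword.

    The comparison is case-sensitive, which is a conservative assumption.
    """
    content = json.loads(message_body)["text"]["content"]
    return any(keyword in content for keyword in keywords)
```

For example, contains_keyword(build_text_message("Please note: follower-a"), ["note"]) returns True, while the same message checked against ["alert"] returns False.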

Step 2: Set up the NGINX proxy and security group rule

Configure NGINX on the ECS instance

X-Pack Watcher sends alert notifications to the NGINX proxy, which forwards them to DingTalk.

  1. Install NGINX on the ECS instance.

  2. In nginx.conf, replace the server block with the following configuration:

    server
    {
        listen 8080;                               # The listening port.
        server_name localhost;                     # The domain name.
        index index.html index.htm index.php;
        root /usr/local/webserver/nginx/html;      # The website directory.

        location ~ .*\.(php|php5)?$
        {
            #fastcgi_pass unix:/tmp/php-cgi.sock;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            include fastcgi.conf;
        }

        location ~ .*\.(gif|jpg|jpeg|png|bmp|swf|ico)$
        {
            expires 30d;
            # access_log off;
        }

        location / {
            # Forward alert requests from X-Pack Watcher to the DingTalk chatbot.
            proxy_pass <Webhook URL of the DingTalk chatbot>;
        }

        location ~ .*\.(js|css)?$
        {
            expires 15d;
            # access_log off;
        }

        access_log off;
    }

    Replace <Webhook URL of the DingTalk chatbot> with the webhook URL you copied in Step 1.

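  3. Check the configuration syntax, then reload NGINX so that the new configuration takes effect:

    /usr/local/webserver/nginx/sbin/nginx -t                   # Check the configuration file for syntax errors.
    /usr/local/webserver/nginx/sbin/nginx -s reload            # Reload the NGINX configuration file.
    /usr/local/webserver/nginx/sbin/nginx -s reopen            # Reopen the log files (optional).

    If NGINX is not running yet, start it by running /usr/local/webserver/nginx/sbin/nginx instead of reloading it.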
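To check the proxy before you create the watch, you can send it a test message directly. The following Python sketch only builds the request with the standard library. The helper name is hypothetical, the default port 8080 assumes you kept listen 8080, and the message content assumes that note is one of the keywords configured in Step 1. The request targets /, which the location / block forwards to the webhook URL, and it sets a JSON Content-Type header.

```python
import json
import urllib.request


def build_proxy_test_request(
    proxy_host: str, port: int = 8080, keyword: str = "note"
) -> urllib.request.Request:
    """Build a POST request that sends a DingTalk text message through the NGINX proxy."""
    body = json.dumps(
        {"msgtype": "text", "text": {"content": f"[{keyword}] NGINX proxy test"}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{proxy_host}:{port}/",   # "/" matches the location / block
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

From a machine that can reach the ECS instance, you could send it with urllib.request.urlopen(build_proxy_test_request("10.0.0.5"), timeout=10), where 10.0.0.5 is a placeholder for the instance address, and then confirm that the message appears in the DingTalk group.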

Add an inbound security group rule

Add a security group rule so the ECS instance can accept requests from the Elasticsearch cluster nodes.

  1. Log on to the ECS console.

  2. In the left-side navigation pane, choose Instances & Images > Instances.

  3. On the Instances page, click the name of the ECS instance.

  4. Click the Security Groups tab, then click the name of the security group.

  5. On the Inbound tab of the Access Rule section, click Add Rule.

  6. Configure the following parameters:

    Parameter | Value
    --- | ---
    Action | Allow
    Priority | Default
    Protocol type | Custom TCP
    Port range | 8080 (or the port that you configured in NGINX)
    Authorization object | The IP addresses of all nodes in the Elasticsearch cluster. For more information, see View the basic information of nodes.
    Description | (Optional) A description of the rule.
  7. Click Save.
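If the cluster has several nodes, it is easy to miss one when you fill in Authorization object. The following Python sketch checks a list of node IP addresses against the entries you plan to authorize, which it assumes are single IP addresses or CIDR blocks. The helper name and the addresses in the example are hypothetical; take the real node addresses from the basic information of your cluster's nodes.

```python
import ipaddress
from typing import Iterable, List


def uncovered_nodes(node_ips: Iterable[str], authorized: Iterable[str]) -> List[str]:
    """Return the node IPs that no authorization object (IP address or CIDR block) covers."""
    # strict=False accepts entries such as "192.168.1.5/24" that have host bits set.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in authorized]
    missing = []
    for ip in node_ips:
        address = ipaddress.ip_address(ip)
        if not any(address in network for network in networks):
            missing.append(ip)
    return missing
```

For example, uncovered_nodes(["192.168.1.10", "192.168.2.5"], ["192.168.1.0/24"]) returns ["192.168.2.5"], which means the rule would block the second node.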

Step 3: Create a watch

  1. Log on to the Kibana console of your Elasticsearch cluster. See Log on to the Kibana console.

    Note

    This example uses an Elasticsearch V6.7.0 cluster. Steps may vary for other versions.

  2. In the left-side navigation pane, click Dev Tools.

  3. On the Console tab, run the following command:

    PUT _watcher/watch/ccr_watcher
    {
      "trigger": {
        "schedule": {
          "interval": "10s"
        }
      },
      "input": {
        "search": {
          "request": {
            "indices": [
              ".monitoring-es*"
            ],
            "body": {
              "size": 0,
              "sort": [
                {
                  "timestamp": {
                    "order": "desc"
                  }
                }
              ],
              "query": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "timestamp": {
                          "gte": "now-10m"
                        }
                      }
                    },
                    {
                      "term": {
                        "type": {
                          "value": "ccr_stats"
                        }
                      }
                    },
                    {
                      "bool": {
                        "should": [
                          {
                            "range": {
                              "ccr_stats.time_since_last_read_millis": {
                                "gte": 600000
                              }
                            }
                          },
                          {
                            "script": {
                              "script": "long gap = doc['ccr_stats.leader_global_checkpoint'].value - doc['ccr_stats.follower_global_checkpoint'].value; return gap > 1000;"
                            }
                          }
                        ]
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "NAME": {
                  "terms": {
                    "field": "ccr_stats.follower_index",
                    "size": 1000
                  }
                }
              }
            }
          }
        }
      },
      "condition": {
        "compare": {
          "ctx.payload.hits.total": {
            "gt": 0
          }
        }
      },
      "transform": {
        "script": """
          StringBuilder message = new StringBuilder();
          for (def bucket : ctx.payload.aggregations.NAME.buckets) {
            message.append(bucket.key).append('  ');
          }
          return ['delay_indices': message.toString().trim()];
        """
      },
      "actions": {
        "add_index": {
          "index": {
            "index": "ccr_delay_indices",
            "doc_type": "doc"
          }
        },
        "my_webhook": {
          "webhook": {
            "method": "POST",
            "url": "http://<yourAddress>:8080",
            "headers": {
              "Content-Type": "application/json"
            },
            "body": "{\"msgtype\": \"text\", \"text\": { \"content\": \"Please note: {{ctx.payload}}\"}}"
          }
        }
      }
    }

    The following table describes the key parameters.

    Parameter | Description
    --- | ---
    trigger | How often the watch runs. This example runs the watch every 10 seconds. Adjust the interval based on your alerting requirements.
    input.search.request.indices | The indexes to query. The .monitoring-es* indexes store the monitoring metrics of the Elasticsearch cluster, including CCR stats.
    input.search.request.body | Queries the CCR stats collected in the past 10 minutes (now-10m) and matches a document if either of the following conditions is met: (1) ccr_stats.time_since_last_read_millis >= 600000 (10 minutes), which means the follower index has not sent a read request to the leader index for more than 10 minutes; (2) ccr_stats.leader_global_checkpoint - ccr_stats.follower_global_checkpoint > 1000, which means the global checkpoint of the follower index trails that of the leader index by more than 1,000. Adjust these thresholds based on your requirements.
    condition | The actions run only when the query matches at least one document (ctx.payload.hits.total > 0).
    transform | Loops through the aggregation buckets and collects the names of the affected follower indexes, separated by spaces. The triple-quoted string syntax ("""...""") is supported only in the Kibana Console.
    actions.add_index | Writes the transformed payload to the ccr_delay_indices index. You can query this index to debug the watch.
    actions.my_webhook | Sends a POST request to the proxy address.
    <yourAddress> | The address that receives alert notifications from X-Pack Watcher. If the cluster uses the new network architecture, use the domain name of the private connection endpoint. For more information, see Configure a private connection for an Elasticsearch cluster. If the cluster uses the original network architecture, use the IP address of the NGINX proxy, or set url to the DingTalk webhook URL directly.
    body | Must contain a keyword configured in the Security Settings of the DingTalk chatbot. In this example, the keyword note appears in "Please note: {{ctx.payload}}".
    Note

    If the command returns an error similar to No handler found for uri [/_xpack/watcher/watch/log_error_watch_2] and method [PUT], X-Pack Watcher is disabled. Enable X-Pack Watcher and run the command again. For more information, see Configure the YML file.
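The anomaly logic of the search query and the transform can be mirrored in plain Python, which is handy for checking thresholds against sample ccr_stats documents before you change the watch. This is a sketch, not part of Watcher: the function names are hypothetical, the documents are assumed to be already filtered to type ccr_stats and the last 10 minutes, and the bucket order only approximates a default terms aggregation (document count descending, then key ascending).

```python
from collections import Counter
from typing import Any, Dict, Iterable

MAX_MILLIS_SINCE_LAST_READ = 600_000  # ccr_stats.time_since_last_read_millis threshold (10 minutes)
MAX_CHECKPOINT_GAP = 1_000            # leader/follower global checkpoint gap threshold


def is_delayed(ccr_stats: Dict[str, Any]) -> bool:
    """Mirror the watch's bool.should clause: either signal marks the follower as delayed."""
    stale_read = ccr_stats["time_since_last_read_millis"] >= MAX_MILLIS_SINCE_LAST_READ
    gap = ccr_stats["leader_global_checkpoint"] - ccr_stats["follower_global_checkpoint"]
    return stale_read or gap > MAX_CHECKPOINT_GAP


def delay_indices(docs: Iterable[Dict[str, Any]]) -> str:
    """Mirror the terms aggregation plus the transform script.

    Buckets are ordered by document count (descending), then by index name,
    and joined with two spaces, as the Painless script does.
    """
    counts = Counter(
        doc["ccr_stats"]["follower_index"] for doc in docs if is_delayed(doc["ccr_stats"])
    )
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return "  ".join(ordered)
```

Note the boundaries: a gap of exactly 1,000 does not match, while exactly 600,000 ms does. Also, because each run looks back 10 minutes, documents recorded during an anomaly stay in the query window for up to 10 minutes, so the watch can keep alerting for a while after the lag clears.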

Step 4: View alert notifications

When the conditions in Step 3 are met, X-Pack Watcher sends an alert notification to the DingTalk group.

Figure: View alert results

To delete the watch when it is no longer needed, run:

DELETE _xpack/watcher/watch/ccr_watcher
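While testing, you may not want to wait for real replication lag to trigger the watch. The following Python sketch builds requests for the Execute Watch API, which runs the watch once on demand, and for deleting the watch. It assumes the 6.x /_xpack/watcher path prefix, which matches the DELETE command above; Elasticsearch 7.x and later use /_watcher instead. The helper name, the endpoint URL, and the credentials are placeholders.

```python
import base64
import urllib.request


def watcher_request(
    es_url: str,
    watch_id: str,
    operation: str,
    username: str,
    password: str,
    api_prefix: str = "_xpack/watcher",
) -> urllib.request.Request:
    """Build an Execute Watch ("execute") or Delete Watch ("delete") request with basic auth."""
    routes = {
        "execute": ("POST", f"/{api_prefix}/watch/{watch_id}/_execute"),
        "delete": ("DELETE", f"/{api_prefix}/watch/{watch_id}"),
    }
    method, path = routes[operation]  # raises KeyError for unsupported operations
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=es_url.rstrip("/") + path,
        headers={"Authorization": f"Basic {token}"},
        method=method,
    )
```

After an execute request, you can search the ccr_delay_indices index to see what the watch wrote, which separates problems in the query and transform from problems in the webhook delivery.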