Cross-cluster replication (CCR) issues — such as follower indexes falling behind or read requests stalling — can be hard to catch without automated monitoring. X-Pack Watcher on Alibaba Cloud Elasticsearch lets you define watches that query CCR metrics on a schedule and send DingTalk notifications when anomalies are detected. This topic walks you through setting up a watch that alerts on two CCR health signals: read-request latency and leader-follower checkpoint lag.
How it works
A watch runs five components in sequence:
| Component | Role |
|---|---|
| trigger | Defines when the watch runs (for example, every 10 seconds) |
| input | Queries data to evaluate — in this case, CCR stats from .monitoring-es* indexes |
| condition | Decides whether to fire actions — alerts only when anomalies are detected |
| transform | Preprocesses the payload before actions run — extracts affected index names |
| actions | What happens when the condition is met — writes to an index and sends a DingTalk notification |
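The sequence in the table can be pictured with a minimal Python sketch. This is not how Watcher is implemented; run_watch and the stub callables are hypothetical names used only to show the order in which the components run each time the trigger fires.

```python
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]


def run_watch(
    run_input: Callable[[], Payload],
    condition: Callable[[Payload], bool],
    transform: Callable[[Payload], Payload],
    actions: List[Callable[[Payload], None]],
) -> bool:
    """Run one trigger firing: input -> condition -> transform -> actions.

    Returns True if the actions ran, False if the condition was not met.
    """
    payload = run_input()            # input: load the data to evaluate
    if not condition(payload):       # condition: stop here if nothing is wrong
        return False
    payload = transform(payload)     # transform: reshape the payload for the actions
    for action in actions:           # actions: each one receives the transformed payload
        action(payload)
    return True


# Stub components standing in for the search input, compare condition,
# script transform, and index/webhook actions used later in this topic.
sent: List[Payload] = []
fired = run_watch(
    run_input=lambda: {"hits": {"total": 2}, "indices": ["follower-1", "follower-2"]},
    condition=lambda p: p["hits"]["total"] > 0,
    transform=lambda p: {"delay_indices": " ".join(p["indices"])},
    actions=[sent.append],
)
```

When the condition returns False, neither the transform nor the actions run, which is why a healthy cluster produces no notifications.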
Because X-Pack Watcher cannot access the internet directly, this topic uses an NGINX proxy on an Elastic Compute Service (ECS) instance to forward webhook requests to DingTalk.
Prerequisites
Before you begin, make sure you have:
An Alibaba Cloud Elasticsearch cluster. See Create an Alibaba Cloud Elasticsearch cluster.
Note: Network architecture affects how X-Pack Watcher routes traffic:
- Original network architecture: X-Pack Watcher is available only for single-zone Elasticsearch clusters.
- New network architecture: Configure a private connection for the Elasticsearch cluster. See Configure a private connection for an Elasticsearch cluster.

For details on the network architecture differences, see [Notice] Network architecture adjustment.
X-Pack Watcher enabled on the Elasticsearch cluster. It is disabled by default. See Configure the YML file.
An ECS instance in your virtual private cloud (VPC). See Create an instance by using the wizard.
Note: X-Pack Watcher cannot access the internet directly; it must route through the internal endpoint of the Elasticsearch cluster. To forward requests to DingTalk, use the ECS instance as a proxy by enabling source network address translation (SNAT) or associating an elastic IP address (EIP) with it. See Associate an EIP or Configure SNAT.
Step 1: Configure a DingTalk chatbot
Create a DingTalk group to receive alert notifications.
Click the group settings icon in the upper-right corner of the chat window. In the Group Settings panel, click Bot.
In the Robot Management panel, click Add Robot.
In the Robot dialog box, click Add Robot.
Click the Custom card. In the Robot details dialog box, click Add.
In the Add Robot dialog box, select Custom Keywords for Security Settings, then enter one or more keywords.
Important: The keywords you enter must appear in the alert message body configured in Step 3. If they do not match, DingTalk will not deliver the notification.
Read and agree to the terms of service, then click Finished.
Click Copy next to the Webhook URL and save it for later use.
Important: Keep the webhook URL confidential. If it is leaked, unauthorized parties can send messages to your DingTalk group.
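Because DingTalk does not deliver messages that lack a configured keyword, it can help to check the message text before you put it in a watch. The following Python sketch builds a text-message payload in the same shape as the webhook body used in Step 3 and verifies that it contains at least one keyword. The function names and the sample keyword list are illustrative assumptions, and so is the case-sensitive substring match; confirm how DingTalk matches your keywords.

```python
import json
from typing import Iterable


def build_text_message(content: str) -> str:
    """Return a JSON body in the text-message shape used by the webhook action."""
    return json.dumps({"msgtype": "text", "text": {"content": content}})


def contains_keyword(content: str, keywords: Iterable[str]) -> bool:
    """Return True if any configured keyword occurs in the message content."""
    return any(keyword in content for keyword in keywords)


keywords = ["note"]  # hypothetical: the keyword entered in Security Settings
content = "Please note: follower-1 is lagging"
body = build_text_message(content)
ok = contains_keyword(content, keywords)
```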
Step 2: Set up the NGINX proxy and security group rule
Configure NGINX on the ECS instance
X-Pack Watcher sends alert notifications to the NGINX proxy, which forwards them to DingTalk.
Install NGINX on the ECS instance.
In nginx.conf, replace the server block with the following configuration:

    server {
        listen 8080;                              # The listening port.
        server_name localhost;                    # The domain name.
        index index.html index.htm index.php;
        root /usr/local/webserver/nginx/html;     # The website directory.

        location ~ .*\.(php|php5)?$ {
            #fastcgi_pass unix:/tmp/php-cgi.sock;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            include fastcgi.conf;
        }

        location ~ .*\.(gif|jpg|jpeg|png|bmp|swf|ico)$ {
            expires 30d;
            # access_log off;
        }

        # Forward all other requests to the DingTalk webhook.
        location / {
            proxy_pass <Webhook URL of the DingTalk chatbot>;
        }

        location ~ .*\.(js|css)?$ {
            expires 15d;
            # access_log off;
        }

        access_log off;
    }

Replace <Webhook URL of the DingTalk chatbot> with the webhook URL you copied in Step 1.
Reload the NGINX configuration, then reopen the log files. The reload signal applies the new configuration without a full restart:

    /usr/local/webserver/nginx/sbin/nginx -s reload   # Reload the NGINX configuration file.
    /usr/local/webserver/nginx/sbin/nginx -s reopen   # Reopen the log files.
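To confirm that the proxy forwards requests before you create the watch, you could post a test message to it from a machine in the same VPC. The sketch below uses only the Python standard library. The address 192.0.2.10 is a placeholder, and the message text assumes that note is one of your chatbot keywords. The send call is commented out so that nothing is posted until you substitute your own address.

```python
import json
import urllib.request


def build_proxy_request(proxy_url: str, content: str) -> urllib.request.Request:
    """Build a POST request carrying a DingTalk text message for the NGINX proxy."""
    body = json.dumps({"msgtype": "text", "text": {"content": content}}).encode("utf-8")
    return urllib.request.Request(
        proxy_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address: replace with the private IP address of your ECS instance.
request = build_proxy_request("http://192.0.2.10:8080", "Please note: proxy test")

# Uncomment to send the test message through the proxy:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status, response.read().decode("utf-8"))
```

If the test message reaches the DingTalk group, the same URL can be used as the webhook address in the watch.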
Add an inbound security group rule
Add a security group rule so the ECS instance can accept requests from the Elasticsearch cluster nodes.
Log on to the ECS console.
In the left-side navigation pane, choose Instances & Images > Instances.
On the Instances page, click the name of the ECS instance.
Click the Security Groups tab, then click the name of the security group.
On the Inbound tab of the Access Rule section, click Add Rule.
Configure the following parameters:
| Parameter | Value |
|---|---|
| Action | Allow |
| Priority | Default |
| Protocol type | Custom TCP |
| Port range | 8080 (or the port you configured in NGINX) |
| Authorization object | IP addresses of all Elasticsearch cluster nodes. See View the basic information of nodes. |
| Description | Optional description |

Click Save.
Step 3: Create a watch
Log on to the Kibana console of your Elasticsearch cluster. See Log on to the Kibana console.
Note: This example uses an Elasticsearch V6.7.0 cluster. Steps may vary for other versions.
In the left-side navigation pane, click Dev Tools.
On the Console tab, run the following command:
    PUT _xpack/watcher/watch/ccr_watcher
    {
      "trigger": {
        "schedule": {
          "interval": "10s"
        }
      },
      "input": {
        "search": {
          "request": {
            "indices": [
              ".monitoring-es*"
            ],
            "body": {
              "size": 0,
              "sort": [
                { "timestamp": { "order": "desc" } }
              ],
              "query": {
                "bool": {
                  "must": [
                    { "range": { "timestamp": { "gte": "now-10m" } } },
                    { "term": { "type": { "value": "ccr_stats" } } },
                    {
                      "bool": {
                        "should": [
                          { "range": { "ccr_stats.time_since_last_read_millis": { "gte": 600000 } } },
                          {
                            "script": {
                              "script": "long gap = doc['ccr_stats.leader_global_checkpoint'].value - doc['ccr_stats.follower_global_checkpoint'].value;\n return gap>1000;"
                            }
                          }
                        ]
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "NAME": {
                  "terms": {
                    "field": "ccr_stats.follower_index",
                    "size": 1000
                  }
                }
              }
            }
          }
        }
      },
      "condition": {
        "compare": {
          "ctx.payload.hits.total": { "gt": 0 }
        }
      },
      "transform": {
        "script": """
          StringBuilder message = new StringBuilder();
          for (def bucket : ctx.payload.aggregations.NAME.buckets) {
            message.append(bucket.key).append(' ');
          }
          return [ 'delay_indices' : message.toString().trim() ];
        """
      },
      "actions": {
        "add_index": {
          "index": {
            "index": "ccr_delay_indices",
            "doc_type": "doc"
          }
        },
        "my_webhook": {
          "webhook": {
            "method": "POST",
            "url": "http://<yourAddress>:8080",
            "body": "{\"msgtype\": \"text\", \"text\": { \"content\": \"Please note: {{ctx.payload}}\"}}"
          }
        }
      }
    }

The following table describes the key parameters.
| Parameter | Description |
|---|---|
| trigger | How often the watch runs. This example checks every 10 seconds. Adjust based on your alerting requirements. |
| input.search.request.indices | The indexes to query. .monitoring-es* indexes store all Elasticsearch cluster metrics, including CCR stats. |
| input.search.request.body | Queries CCR stats from the past 10 minutes (now-10m). An alert fires if either condition is met: (1) ccr_stats.time_since_last_read_millis >= 600,000 ms (10 minutes), meaning the follower has not sent a read request to the leader in over 10 minutes; or (2) ccr_stats.leader_global_checkpoint - ccr_stats.follower_global_checkpoint > 1,000, meaning the checkpoint lag between the leader and follower exceeds 1,000. Adjust these thresholds based on your requirements. |
| condition | Actions fire only when the query returns at least one document (ctx.payload.hits.total > 0). |
| transform | Loops through the aggregation buckets and collects the names of affected follower indexes, separated by spaces. |
| actions.add_index | Writes the result to the ccr_delay_indices index. Use this index to debug your watch configuration. |
| actions.my_webhook | Sends a POST request to the proxy address. |
| <yourAddress> | The host that receives alert notifications from X-Pack Watcher. If the cluster uses the new network architecture, use the domain name of the private connection endpoint (see Configure a private connection for an Elasticsearch cluster). If the cluster uses the original network architecture, use the IP address of the NGINX proxy or the DingTalk webhook URL directly. |
| body | Must contain the keywords configured in the DingTalk chatbot's Security Settings. In this example, the keyword note appears in "Please note: {{ctx.payload}}". |

Note: If the command returns an error similar to No handler found for uri [/_xpack/watcher/watch/log_error_watch_2] and method [PUT] (the URI reflects the watch ID in your request), X-Pack Watcher is disabled. Enable it and run the command again. See Configure the YML file.
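The two alert criteria and the transform can be mirrored outside Watcher to sanity-check thresholds against sample values. This Python sketch is an illustration, not the watch itself: the function names and sample documents are made up, while the field names and thresholds follow the watch in this step.

```python
from typing import Any, Dict, List

READ_IDLE_THRESHOLD_MS = 600_000   # matches the range clause in the watch
CHECKPOINT_GAP_THRESHOLD = 1_000   # matches the script clause in the watch


def is_anomalous(ccr_stats: Dict[str, Any]) -> bool:
    """Return True if either should clause of the watch query would match."""
    idle_too_long = ccr_stats["time_since_last_read_millis"] >= READ_IDLE_THRESHOLD_MS
    gap = ccr_stats["leader_global_checkpoint"] - ccr_stats["follower_global_checkpoint"]
    return idle_too_long or gap > CHECKPOINT_GAP_THRESHOLD


def delay_indices(docs: List[Dict[str, Any]]) -> str:
    """Collect the distinct follower index names of anomalous docs, space-separated.

    Names are sorted here only to make the output deterministic; the terms
    aggregation in the watch orders buckets by document count instead.
    """
    names = sorted({d["follower_index"] for d in docs if is_anomalous(d)})
    return " ".join(names)


# Made-up sample documents for illustration only.
samples = [
    {"follower_index": "logs-copy", "time_since_last_read_millis": 1_200,
     "leader_global_checkpoint": 5_000, "follower_global_checkpoint": 4_990},
    {"follower_index": "orders-copy", "time_since_last_read_millis": 900,
     "leader_global_checkpoint": 9_000, "follower_global_checkpoint": 7_500},
    {"follower_index": "users-copy", "time_since_last_read_millis": 700_000,
     "leader_global_checkpoint": 300, "follower_global_checkpoint": 300},
]
result = delay_indices(samples)
```

In this sample, orders-copy is flagged for its checkpoint gap of 1,500 and users-copy for its idle time, while logs-copy stays below both thresholds.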
Step 4: View alert notifications
When the conditions in Step 3 are met, X-Pack Watcher sends an alert notification to the DingTalk group.

To delete the watch when it is no longer needed, run:
DELETE _xpack/watcher/watch/ccr_watcher