Enable automatic creation of snapshots and store the snapshots to an OSS repository

Alibaba Cloud Elasticsearch allows you to use snapshots to automate data backup for Elasticsearch clusters. For an Elasticsearch cluster of V7.6 or later, you can configure a snapshot lifecycle management (SLM) policy to enable automatic creation of snapshots for the cluster. For an Elasticsearch cluster of a version earlier than V7.6, you must create a scheduled task on the client that is used to access the cluster to enable automatic creation of snapshots for the cluster. This topic describes how to enable automatic creation of snapshots for an Alibaba Cloud Elasticsearch cluster and store the snapshots to an Alibaba Cloud Object Storage Service (OSS) repository.

Background information

For more information about SLM, see Snapshot Lifecycle Management.
The data backup and restoration of Alibaba Cloud Elasticsearch clusters depend on the elasticsearch-repository-oss plug-in. The plug-in is installed on Alibaba Cloud Elasticsearch clusters by default and cannot be removed. For more information about the plug-in, see elasticsearch-repository-oss.

Prerequisites

OSS is activated, and an OSS bucket is created. For more information, see Activate OSS and Create a bucket.
Important
The storage class of the OSS bucket must be Standard, and the access control list (ACL) of the bucket must be Public Read. Elasticsearch does not support OSS buckets of the Archive storage class. In addition, the OSS bucket must reside in the same region as the Elasticsearch cluster.
A repository that is used to store the created automatic snapshots is created. For more information, see Create a repository.
Important
Before you restore data from a snapshot to a cluster, you must create a repository in the cluster and map the repository to the same OSS endpoint as the snapshot.

Procedure

Elasticsearch cluster of V7.6 or later

Configure an SLM policy for the Elasticsearch cluster.

PUT _slm/policy/auto-snapshots
{
  "schedule": "0 0 0/12 * * ?",
  "name": "<auto-snap-{now/d}>",
  "repository": "my_auto_backup",
  "config": {
    "indices": "*",
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}

Parameter	Description
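schedule	The schedule on which snapshots are automatically created. The value is a cron expression similar to those used in the Linux operating system. The fields in the value correspond, in order, to the second, minute, hour, day of the month, month, day of the week, and year. The year is optional. For example, the value `"0 0 0/12 * * ?"` indicates that automatic snapshots are created every 12 hours, starting at 00:00.
name	The naming format of an automatic snapshot. The value supports date math expressions. For example, `<auto-snap-{now/d}>` generates names that start with auto-snap- followed by the current date. Elasticsearch appends a unique suffix to each name to prevent naming conflicts.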
repository	The name of the repository that is used to store the automatic snapshots. For information about how to obtain the name of a repository, see Query repository information.
config	The configuration of the automatic snapshots. indices: the indexes for which automatic snapshots are created. An asterisk (`*`) indicates that automatic snapshots are created for all indexes in the Elasticsearch cluster. include_global_state: specifies whether the automatic snapshots include the global state information of the Elasticsearch cluster and its features. Valid values: true and false. The value true indicates that the information is included. The value false indicates that the information is not included.
retention	The retention policy for the automatic snapshots. The configurations in the preceding code indicate that snapshots expire after 30 days, at least 5 snapshots are always retained, and at most 50 snapshots are retained. expire_after: the period after which an automatic snapshot expires and can be deleted. min_count: the minimum number of automatic snapshots to retain, even if the snapshots have expired. The min_count parameter has a higher priority than the expire_after parameter. max_count: the maximum number of automatic snapshots to retain, even if the snapshots have not expired. The max_count parameter has a higher priority than the expire_after parameter.
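
To illustrate how expire_after, min_count, and max_count interact, the following shell function is a simplified model of the retention decision. It is a sketch for reasoning about the parameters, not the actual Elasticsearch implementation. The function name retention_plan and its arguments are hypothetical. The sketch assumes that snapshot ages are given in whole days, listed from newest to oldest, and that min_count is not greater than max_count.

```shell
# retention_plan EXPIRE_DAYS MIN_COUNT MAX_COUNT AGE...
# Hypothetical helper: prints "keep <age>" or "delete <age>" for each snapshot.
# AGE values are snapshot ages in days, listed from newest to oldest.
retention_plan() {
  expire_days=$1
  min_count=$2
  max_count=$3
  shift 3
  position=0
  for age in "$@"; do
    position=$((position + 1))
    if [ "$position" -le "$min_count" ]; then
      # The newest min_count snapshots are kept even if they have expired.
      echo "keep $age"
    elif [ "$position" -gt "$max_count" ]; then
      # Snapshots beyond max_count are deleted even if they have not expired.
      echo "delete $age"
    elif [ "$age" -gt "$expire_days" ]; then
      # Remaining snapshots are deleted once they are older than expire_after.
      echo "delete $age"
    else
      echo "keep $age"
    fi
  done
}
```

For example, `retention_plan 30 2 3 1 40 50 60` keeps the snapshots that are 1 and 40 days old (the second one has expired but is protected by min_count) and deletes the snapshots that are 50 and 60 days old.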

Run the following command to execute the SLM policy immediately:
```
POST _slm/policy/auto-snapshots/_execute
```
This command creates a snapshot immediately instead of waiting for the next scheduled time. Alibaba Cloud Elasticsearch then continues to create snapshots automatically every 12 hours, as defined by the schedule parameter of the policy. You can also run the following command to immediately apply the retention policy that you configured:
```
POST _slm/_execute_retention
```
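
To confirm that the policy is registered and to see its most recent and next executions, you can query the policy. The following sketch does this from a client with cURL. The helper names slm_policy_url and show_slm_policy and the ES_USER and ES_PASSWORD variables are assumptions made for illustration. Replace the cluster address with the endpoint of your own cluster.

```shell
# Hypothetical helper: build the URL of the "get SLM policy" request.
# $1 = base URL of the cluster, $2 = policy ID
slm_policy_url() {
  # ${1%/} removes a trailing slash from the base URL, if present.
  printf '%s/_slm/policy/%s?human' "${1%/}" "$2"
}

# Hypothetical helper: fetch the policy definition and its execution details.
# Assumes ES_USER and ES_PASSWORD are set in the environment.
show_slm_policy() {
  curl --fail --silent --show-error \
    -u "$ES_USER:$ES_PASSWORD" \
    "$(slm_policy_url "$1" "$2")"
}
```

Calling `show_slm_policy <cluster URL> auto-snapshots` should return the policy definition together with details such as the last successful execution and the next scheduled execution.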

Elasticsearch cluster of a version earlier than V7.6

Configure a client that you want to use to access the Elasticsearch cluster.
For more information, see Use a client to access an Alibaba Cloud Elasticsearch cluster.
Create and run a scheduled task for the Elasticsearch cluster.
In this example, a crontab scheduled task is created and the following operations are performed on the client:
1. Create a script.
```
vi /root/snapshot.sh
```
2. Add the cURL command that creates a snapshot in the repository of the Elasticsearch cluster to the script. Then, save the script. The snapshot name ends with the current UNIX timestamp so that each run creates a snapshot with a unique name.
```
#!/bin/bash
curl -u elastic:***** -X PUT "http://es-*****.public.elasticsearch.aliyuncs.com:9200/_snapshot/my_auto_backup_crontab/snapshot_$(date +%s)"
```
3. Grant the execute permissions to the script.
```
chmod +x /root/snapshot.sh
```
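4. Run the crontab -e command to edit the crontab file, and add an entry that specifies the time at which the script is run. In this example, the script is run at 02:00 every day.
```
# Open the crontab file of the current user for editing.
crontab -e
# Add the following entry to the file. Then, save the file and exit.
0 2 * * * /bin/bash /root/snapshot.sh
```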
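5. Restart the cron service. The name of the service depends on the operating system. For example, the service is named cron on Debian and Ubuntu and crond on CentOS.
```
sudo service cron restart
```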
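
The script in the preceding steps does not report failures. The following variant is a minimal sketch that adds error handling. The function names snapshot_url and create_snapshot and the ES_URL, ES_REPOSITORY, ES_USER, and ES_PASSWORD variables are assumptions made for illustration and are not part of the original procedure.

```shell
#!/bin/bash
# Sketch of a more defensive snapshot script. Set ES_URL, ES_REPOSITORY,
# ES_USER, and ES_PASSWORD (hypothetical names) before calling create_snapshot.

# Build the snapshot URL: <base>/_snapshot/<repository>/snapshot_<timestamp>
snapshot_url() {
  printf '%s/_snapshot/%s/snapshot_%s' "${1%/}" "$2" "$3"
}

# Send the create-snapshot request and report the result.
create_snapshot() {
  url=$(snapshot_url "$ES_URL" "$ES_REPOSITORY" "$(date +%s)")
  # --fail makes curl return a non-zero exit code on HTTP error responses.
  if curl --fail --silent --show-error \
      -u "$ES_USER:$ES_PASSWORD" -X PUT "$url"; then
    echo "$(date) snapshot request sent: $url"
  else
    echo "$(date) snapshot request FAILED: $url" >&2
    return 1
  fi
}
```

To use the sketch, set the four variables at the top of the script and call `create_snapshot` on the last line. In the crontab entry, you can then redirect the output to a log file of your choice so that failures are recorded.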

After the automatic snapshots are created and stored to a repository, you can perform operations such as deleting the snapshots, restoring data from the snapshots, and querying restoration information. For more information, see Create manual snapshots and restore data from manual snapshots.