Use the aliyun-timestream plug-in

Alibaba Cloud Elasticsearch provides a plug-in named aliyun-timestream for storage and usage enhancement of time series data. This plug-in allows you to use APIs to create, delete, modify, and query time series indexes, and write data to and query data in time series indexes. This topic describes how to use APIs supported by aliyun-timestream to perform the preceding operations.

Background information

aliyun-timestream is a plug-in developed by the Alibaba Cloud Elasticsearch team based on the features of time series products that are provided by the Elastic community. This plug-in enhances the storage and usage performance of time series data. aliyun-timestream lets you query stored metric data by using Prometheus Query Language (PromQL) statements instead of domain-specific language (DSL) statements. This simplifies query operations and improves query efficiency. aliyun-timestream also reduces storage costs. For more information, see Overview of aliyun-timestream.

Prerequisites

An Elasticsearch cluster that meets one of the following version requirements is created: a V7.10 cluster whose kernel version is V1.8.0 or later, or a cluster of V7.16 or later whose kernel version is V1.7.0 or later. For information about how to create an Elasticsearch cluster, see Create an Alibaba Cloud Elasticsearch cluster.

Create a time series index

Request syntax

No content specified in the request body
```
PUT _time_stream/{name}
```

Custom template uploaded to the request body

```
PUT _time_stream/{name}
{
  --- index template ---
}
```

Usage notes

When you create a time series index, you do not need to configure an index pattern. However, you must specify an exact name for the index. Wildcards are not supported in the name.

You can leave the request body empty or upload a custom template to the request body. For information about the format of the request body, see Index templates in the documentation for open source Elasticsearch.
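The rules above can be sketched in Python as a small request builder. This is only an illustration: `build_create_request` is a hypothetical helper, not part of the plug-in or of any client library, and it only prepares the method, path, and body that you would send with your own HTTP client.

```python
import json


def build_create_request(name, template=None):
    """Hypothetical helper: return (method, path, body) for the create API."""
    # Wildcards are not supported in the name of a time series index.
    if not name or "*" in name:
        raise ValueError("a specific index name without wildcards is required")
    # The request body may be empty or carry a custom index template.
    body = "" if template is None else json.dumps(template, indent=2)
    return "PUT", f"_time_stream/{name}", body


method, path, body = build_create_request(
    "test_stream",
    {"template": {"settings": {"index.number_of_shards": "10"}}},
)
print(method, path)
print(body)
```

Rejecting a wildcard name on the client side gives an earlier, clearer error than waiting for the cluster to refuse the request.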

Examples

Sample request

```
PUT _time_stream/test_stream
{
  "template": {
    "settings": {
      "index.number_of_shards": "10"
    }
  }
}
```

Sample response
```
{
  "acknowledged" : true
}
```

Update the configurations of a time series index

Request syntax

No content specified in the request body
```
POST _time_stream/{name}/_update
```

Custom template uploaded to the request body

```
POST _time_stream/{name}/_update
{
  --- index template ---
}
```

Usage notes

The request body of the API used to update the configurations of a time series index is the same as the request body of the API used to create a time series index. For more information, see the Create a time series index section in this topic.

After you update the configurations of a time series index, the new configurations do not immediately take effect on the index. You must roll over the time series index for the new configurations to take effect.
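The update-then-rollover order can be sketched as follows. The sketch assumes that the time series index can be rolled over with the rollover API of open source Elasticsearch (`POST {name}/_rollover`); `update_and_rollover_requests` is a hypothetical helper that only lists the requests in the order in which you would send them.

```python
def update_and_rollover_requests(name):
    """Hypothetical helper: list the requests to send, in order."""
    return [
        # Step 1: store the new configuration for the time series index.
        ("POST", f"_time_stream/{name}/_update"),
        # Step 2 (assumption): roll over so that the next backing index
        # is created with the updated configuration.
        ("POST", f"{name}/_rollover"),
    ]


for method, path in update_and_rollover_requests("test_stream"):
    print(method, path)
```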

Examples

Sample request

```
POST _time_stream/test_stream/_update
{
  "template": {
    "settings": {
      "index.number_of_shards": "10"
    }
  }
}
```

Sample response
```
{
  "acknowledged" : true
}
```

Delete a time series index

Request syntax

```
DELETE _time_stream/{name}
```

Usage notes

You can perform a fuzzy match to find multiple time series indexes and delete them at the same time. You can also specify the names of the time series indexes that you want to delete, separated by commas (,), to delete them in a single request.

Warning

After you delete a time series index, the data that is stored in the index is also deleted. Before you perform the deletion operation, make sure that the operation does not affect your business.

Examples

Sample request
```
DELETE _time_stream/test_stream
```
Sample response
```
{
  "acknowledged" : true
}
```

Query time series indexes

Request syntax

Query all time series indexes
```
GET _time_stream
```
Query specific time series indexes
```
GET _time_stream/{name}
```

Usage notes

You can perform a fuzzy match to search for time series indexes that you want to query. You can also specify the names of time series indexes that you want to query and separate the names with commas (,) to search for the indexes.
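As a rough sketch, the request path for one or more names can be assembled as shown below. `time_stream_path` is a hypothetical helper, and the example assumes that a fuzzy match is written with the `*` wildcard, as in other Elasticsearch index expressions.

```python
def time_stream_path(*names):
    """Hypothetical helper: build the path for querying time series indexes."""
    if not names:
        # No name given: query all time series indexes.
        return "_time_stream"
    # Several names (or wildcard expressions) are separated with commas.
    return "_time_stream/" + ",".join(names)


print(time_stream_path())                             # all indexes
print(time_stream_path("test_stream", "prom_index"))  # explicit names
print(time_stream_path("test_*"))                     # assumed wildcard form
```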

Examples

Sample request
```
GET _time_stream
```

Sample response

```
{
  "time_streams" : {
    "test_stream" : {
      "name" : "test_stream",
      "datastream_name" : "test_stream",
      "template_name" : ".timestream_test_stream",
      "template" : {
        "index_patterns" : [
          "test_stream"
        ],
        "template" : {
          "settings" : {
            "index" : {
              "number_of_shards" : "10"
            }
          }
        },
        "composed_of" : [
          ".system.timestream.template"
        ],
        "data_stream" : {
          "hidden" : true
        }
      }
    }
  }
}
```

Query metrics of time series indexes

Request syntax

Query metrics of all time series indexes
```
GET _time_stream/_stats
```
Query metrics of specific time series indexes
```
GET _time_stream/{name}/_stats
```

Usage notes

You can call the API that is used to query the metrics of time series indexes to obtain information about the metrics such as time_stream_count. The value of the time_stream_count metric indicates the number of time series.

Description of the time_stream_count metric:

Calculation method
1. The time_stream_count metric collects the number of time series of each primary shard for an index. Each primary shard has different time series. The total number of time series of an index is the sum of the numbers of time series of all primary shards for the index.
2. The time_stream_count metric returns the name of the index that has the largest number of time series.
Precautions
The time_stream_count metric collects the number of time series of each primary shard from the doc values of the _tsid field, which stores the ID of a time series. This collection is expensive to run. To reduce the cost, Elasticsearch allows you to configure a caching policy. After you configure such a policy for a read-only index, the number of time series of each primary shard for the index is collected only once. For other types of indexes, the system refreshes the cache every 5 minutes by default. You can configure the index.time_series.stats.refresh_interval parameter for these indexes to change the interval. The minimum interval is 1 minute.
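The calculation and caching behavior described above can be modeled with a short Python sketch. It is a simplified illustration, not the plug-in's implementation: `TimeSeriesCountCache`, its parameters, and the injected clock are all hypothetical. Only the summing of per-shard counts, the 5-minute default, and the 1-minute minimum come from the description.

```python
import time

DEFAULT_REFRESH_SECONDS = 5 * 60  # default refresh interval from the text
MIN_REFRESH_SECONDS = 60          # minimum refresh interval from the text


class TimeSeriesCountCache:
    """Illustrative model of the cached time series count of one index."""

    def __init__(self, count_shards, read_only=False,
                 refresh_seconds=DEFAULT_REFRESH_SECONDS, clock=time.monotonic):
        if refresh_seconds < MIN_REFRESH_SECONDS:
            raise ValueError("refresh interval must be at least 1 minute")
        self._count_shards = count_shards  # callable returning per-shard counts
        self._read_only = read_only
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        # A read-only index is counted only once; other indexes are
        # recounted after the refresh interval has elapsed.
        stale = self._value is None or (
            not self._read_only
            and now - self._loaded_at >= self._refresh_seconds
        )
        if stale:
            # Each primary shard holds different time series, so the index
            # total is the sum of the per-shard counts.
            self._value = sum(self._count_shards())
            self._loaded_at = now
        return self._value


cache = TimeSeriesCountCache(lambda: [120, 97, 100])
print(cache.get())
```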

Examples

Sample request
```
GET _time_stream/_stats
```

Sample response

```
{
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "failed" : 0
  },
  "time_stream_count" : 2,
  "indices_count" : 2,
  "total_store_size_bytes" : 1278822,
  "time_streams" : [
    {
      "time_stream" : "test_stream",
      "indices_count" : 1,
      "store_size_bytes" : 31235,
      "tsidCount" : 1
    },
    {
      "time_stream" : "prom_index",
      "indices_count" : 1,
      "store_size_bytes" : 1247587,
      "tsidCount" : 317
    }
  ]
}
```
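A response of this shape can be summarized with a few lines of Python. The sketch below parses an abbreviated copy of the sample response; `summarize` is a hypothetical helper, and the field names are taken from the sample rather than from a formal schema.

```python
import json

# Abbreviated copy of the sample response.
SAMPLE = json.loads("""
{
  "time_stream_count": 2,
  "indices_count": 2,
  "total_store_size_bytes": 1278822,
  "time_streams": [
    {"time_stream": "test_stream", "indices_count": 1,
     "store_size_bytes": 31235, "tsidCount": 1},
    {"time_stream": "prom_index", "indices_count": 1,
     "store_size_bytes": 1247587, "tsidCount": 317}
  ]
}
""")


def summarize(stats):
    """Hypothetical helper: per-index series counts and total store size."""
    streams = stats["time_streams"]
    return {
        "series_per_stream": {s["time_stream"]: s["tsidCount"] for s in streams},
        "store_size_bytes": sum(s["store_size_bytes"] for s in streams),
    }


summary = summarize(SAMPLE)
print(summary)
```

In the sample, the per-index store sizes add up to the reported total_store_size_bytes value.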

Write time series data to a time series index

Request syntax

The aliyun-timestream plug-in uses the APIs provided by open source Elasticsearch, such as the bulk API and index API, to write data to a time series index.

Important

When you use these APIs to write data to a time series index, the plug-in supports only append operations. You cannot overwrite, update, or delete existing data.
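For batch writes, the body of a bulk request can be assembled as newline-delimited JSON. The sketch below assumes that a time series index behaves like a data stream in open source Elasticsearch, where bulk writes use the `create` action, which matches the append-only restriction. `build_bulk_body` is a hypothetical helper, and the resulting string would be sent to `POST test_stream/_bulk` with your own HTTP client.

```python
import json


def build_bulk_body(docs):
    """Hypothetical helper: build an NDJSON bulk body that only appends."""
    lines = []
    for doc in docs:
        # Assumption: "create" is the append-only bulk action to use here.
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(doc))
    # A bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


points = [
    {"labels": {"nodeId": "node-1"}, "metrics": {"cpu.idle": 10.0},
     "@timestamp": 1624873606000},
    {"labels": {"nodeId": "node-2"}, "metrics": {"cpu.idle": 42.5},
     "@timestamp": 1624873606000},
]
print(build_bulk_body(points), end="")
```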

Data write model

When you use the aliyun-timestream plug-in to write data to a time series index, you must make sure that the data meets the requirements of the time series data model. The time series data model contains the default fields that are described in the following table.

| Field | Description |
| --- | --- |
| labels | The properties that are related to metrics. The field uniquely marks the metadata of a data record that is written. The ID of a time series can be generated based on this field. |
| metrics | The metrics. The values of the field must be of the LONG or DOUBLE data type. |
| @timestamp | The time when the metric data is collected. The default value of the field is a timestamp in milliseconds. |

Sample code:

```
{
  "labels": {
    "namespace": "cn-hangzhou",
    "clusterId": "cn-xxx-xxxxxx",
    "nodeId": "node-xxx",
    "label": "test-cluster",
    "disk_type": "cloud_ssd",
    "cluster_type": "normal"
  },
  "metrics": {
    "cpu.idle": 10.0,
    "mem.free": 100.1,
    "disk_ioutil": 5.2
  },
  "@timestamp": 1624873606000
}
```
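Before you send a document, you can check it against the data model on the client side. The following Python sketch is one possible check, not a validator provided by the plug-in. `validate_point` is a hypothetical helper that assumes the default field names and a millisecond epoch value for @timestamp.

```python
def validate_point(doc):
    """Hypothetical helper: return a list of data model problems in doc."""
    errors = []

    labels = doc.get("labels")
    if not isinstance(labels, dict) or not labels:
        errors.append("labels must be a non-empty object")

    metrics = doc.get("metrics")
    if not isinstance(metrics, dict) or not metrics:
        errors.append("metrics must be a non-empty object")
    else:
        for name, value in metrics.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"metric {name!r} must be LONG or DOUBLE")

    timestamp = doc.get("@timestamp")
    if isinstance(timestamp, bool) or not isinstance(timestamp, int):
        errors.append("@timestamp must be an integer number of milliseconds")

    return errors


good = {
    "labels": {"nodeId": "node-xxx", "disk_type": "cloud_ssd"},
    "metrics": {"cpu.idle": 10.0, "mem.free": 100.1},
    "@timestamp": 1624873606000,
}
bad = {"labels": {}, "metrics": {"cpu.idle": "high"}}
print(validate_point(good))
print(validate_point(bad))
```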

Examples

Sample request

```
POST test_stream/_doc
{
  "labels": {
    "namespace": "cn-hangzhou",
    "clusterId": "cn-xxx-xxxxxx",
    "nodeId": "node-xxx",
    "label": "test-cluster",
    "disk_type": "cloud_ssd",
    "cluster_type": "normal"
  },
  "metrics": {
    "cpu.idle": 10,
    "mem.free": 100.1,
    "disk_ioutil": 5.2
  },
  "@timestamp": 1624873606000
}
```

Sample response

```
{
  "_index" : ".ds-test_stream-2021.09.03-000001",
  "_id" : "suF_qnsBGKH6s8C_OuFS",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
```

Configure fields for the time series data model

When you create a time series index, you can upload one or more custom dimension fields and metric fields. The aliyun-timestream plug-in automatically creates dynamic mappings for dimension fields and metric fields and configures the time_series_dimension parameter for dimension fields. Elasticsearch automatically generates time series IDs based on dimension fields. By default, metric fields store only doc values. When you configure dimension fields and metric fields, you can use a wildcard (*) to perform a fuzzy match. The following examples show how to configure dimension fields and metric fields:

Upload a single custom dimension field or metric field

```
PUT _time_stream/{name}
{
  --- index template ---
  "time_stream": {
    "labels_fields": "@label.*",
    "metrics_fields": "@metrics.*"
  }
}
```

Upload multiple custom dimension fields or metric fields

```
PUT _time_stream/{name}
{
  --- index template ---
  "time_stream": {
    "labels_fields": ["label.*", "dim*"],
    "metrics_fields": ["@metrics.*", "metrics.*"]
  }
}
```

| Parameter | Description |
| --- | --- |
| labels_fields | Optional. Default value: label.*. |
| metrics_fields | Optional. Default value: metrics.*. |
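To get a feel for which field names such patterns select, you can approximate the matching locally. The sketch below uses `fnmatch.fnmatchcase` from the Python standard library as a stand-in for the plug-in's wildcard matching, which is an assumption: the two may differ for patterns that contain characters other than `*`. `classify_field` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def _as_list(patterns):
    # The parameters accept either a single pattern or a list of patterns.
    return [patterns] if isinstance(patterns, str) else list(patterns)


def classify_field(field, labels_fields, metrics_fields):
    """Hypothetical helper: classify a field name as dimension or metric."""
    if any(fnmatchcase(field, p) for p in _as_list(labels_fields)):
        return "dimension"
    if any(fnmatchcase(field, p) for p in _as_list(metrics_fields)):
        return "metric"
    return "other"


labels_fields = ["label.*", "dim*"]
metrics_fields = ["@metrics.*", "metrics.*"]
for name in ["label.host", "dimension_x", "@metrics.cpu", "metrics.cpu.idle", "message"]:
    print(name, "->", classify_field(name, labels_fields, metrics_fields))
```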

Query data in a time series index

Request syntax

The aliyun-timestream plug-in uses the APIs provided by open source Elasticsearch, such as the search APIs and get API, to query data in a time series index.

Examples

Sample request
```
GET test_stream/_search
```

Sample response

```
{
  "took" : 172,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".ds-test_stream-2021.09.03-000001",
        "_id" : "suF_qnsBGKH6s8C_OuFS",
        "_score" : 1.0
      }
    ]
  }
}
```

Usage notes for downsampling

Downsampling is commonly used in time series scenarios. When you create a time series index, you can configure a downsampling rule for the index. After that, you only need to read data from and write data to the index as usual, and the index automatically downsamples its data. When you query data in the time series index, the index automatically determines which downsampled data to query based on the value of the interval parameter configured for the aggregation.

When you configure a downsampling rule for a time series index, you need to configure only the interval parameter. The time series index automatically performs downsampling on data based on the configurations of the labels and metrics fields. After the downsampling, the data type of the values of the metrics field is changed to aggregate_metric_double, and the system generates the following sub-fields for the metrics field: max, min, sum, and count.
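The effect of downsampling on a single metric of a single time series can be illustrated with a small Python sketch. It only models the idea of rolling raw points up into max, min, sum, and count per interval. It is not the plug-in's implementation, and `downsample` is a hypothetical function.

```python
def downsample(points, interval_ms):
    """Roll up (timestamp_ms, value) pairs of one time series and one metric.

    Returns {bucket_start_ms: {"min", "max", "sum", "count"}}, which mirrors
    the sub-fields that are generated for the metrics field.
    """
    buckets = {}
    for timestamp, value in points:
        # Align each point to the start of its interval.
        start = timestamp - timestamp % interval_ms
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"min": value, "max": value, "sum": value, "count": 1}
        else:
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
            agg["sum"] += value
            agg["count"] += 1
    return dict(sorted(buckets.items()))


raw = [(0, 1.0), (30_000, 3.0), (60_000, 5.0)]
for start, agg in downsample(raw, 60_000).items():
    print(start, agg)
```

Keeping sum and count rather than a precomputed average allows averages to be recomputed correctly when buckets are merged into coarser intervals.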

Downsampling rules are triggered during the rollover stage. After the downsampling rules are triggered, downsampling is performed on the indexes to which data is no longer written. The system generates a downsampling index for each original index based on the downsampling rules. By default, each downsampling index inherits the settings of the related original index. If you want to customize the settings of a downsampling index, you can configure the settings in the related downsampling rule. For example, if you want to reduce the capacity of a downsampling index, you can reduce the number of primary shards for the index. If you want a downsampling index to be stored for a longer period of time, you can configure an index lifecycle management (ILM) policy for the index.

The following example shows how to configure a downsampling rule:

```
PUT _time_stream/{name}
{
  "time_stream": {
    "downsample": [
      {
        "interval": "1m",
        "settings": {
           "index.lifecycle.name": "my-rollup-ilm-policy_60m",
           "index.number_of_shards": "1"
        }
      },
      {
        "interval": "10m"
      }
    ]
  }
}
```

You can add the downsample parameter to the configurations of the time_stream parameter. Then, you can configure the required parameters in downsample. The following table describes the parameters that can be configured in downsample.

| Parameter | Required | Description |
| --- | --- | --- |
| interval | Yes | The interval at which downsampling is performed. During downsampling, data is rolled up at the interval specified by this parameter. You can specify a maximum of five intervals. If you specify more than one interval, make sure that each interval is a multiple of the preceding one. For example, you can specify 1m, 10m, and 60m. |
| settings | No | The settings of a downsampling index, such as the settings related to the lifecycle and the number of primary shards. |
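The constraints on interval can be checked before you send the request. The following Python sketch is one way to do that under stated assumptions: intervals use the unit suffixes s, m, h, or d, and "multiples" is read as "each interval is a multiple of the one before it". Both `parse_interval` and `validate_downsample_intervals` are hypothetical helpers.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text):
    """Convert an interval such as '10m' to seconds (assumed unit suffixes)."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if not match or int(match.group(1)) == 0:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def validate_downsample_intervals(intervals):
    """Check the documented limits and return the intervals in seconds."""
    if not 1 <= len(intervals) <= 5:
        raise ValueError("specify between one and five intervals")
    seconds = [parse_interval(i) for i in intervals]
    for previous, current in zip(seconds, seconds[1:]):
        # Assumption: each interval must be a larger multiple of the previous one.
        if current <= previous or current % previous != 0:
            raise ValueError("each interval must be a multiple of the previous one")
    return seconds


print(validate_downsample_intervals(["1m", "10m", "60m"]))
```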