Alibaba Cloud Elasticsearch provides a plug-in named aliyun-timestream for storage and usage enhancement of time series data. This plug-in allows you to use APIs to create, delete, modify, and query time series indexes, and write data to and query data in time series indexes. This topic describes how to use APIs supported by aliyun-timestream to perform the preceding operations.
Background information
aliyun-timestream is a plug-in developed by the Alibaba Cloud Elasticsearch team based on the features of time series products that are provided by the Elastic community. This plug-in enhances the storage and usage performance of time series data. aliyun-timestream lets you query stored metric data with Prometheus Query Language (PromQL) statements instead of domain-specific language (DSL) statements, which simplifies query operations and improves query efficiency. aliyun-timestream also reduces storage costs. For more information, see Overview of aliyun-timestream.
Prerequisites
An Alibaba Cloud Elasticsearch cluster of the Standard Edition is created, and the cluster meets one of the following version requirements: the cluster version is V7.16 or later and the kernel version is V1.7.0 or later, or the cluster version is V7.10 and the kernel version is V1.8.0 or later. For information about how to create an Elasticsearch cluster, see Create an Alibaba Cloud Elasticsearch cluster.
Create a time series index
Request syntax
- No content specified in the request body
PUT _time_stream/{name}
- Custom template uploaded to the request body
PUT _time_stream/{name}
{
  --- index template ---
}
Usage notes
When you create a time series index, you do not need to configure an index pattern. However, you must specify an exact name for the index. Wildcards are not supported in the name.
You can leave the request body empty or upload a custom template to the request body. For information about the format of the request body, see Index templates in the documentation for open source Elasticsearch.
Examples
- Sample request
PUT _time_stream/test_stream
{
  "template": {
    "settings": {
      "index.number_of_shards": "10"
    }
  }
}
- Sample response
{ "acknowledged" : true }
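The same request can also be prepared from a script. The following Python sketch only builds the request with the standard library and does not send it. The endpoint http://localhost:9200 and the helper name build_create_request are assumptions for illustration, and a real cluster would also require authentication.

```python
import json
import urllib.request

ES_ENDPOINT = "http://localhost:9200"  # assumed endpoint; replace with your cluster address


def build_create_request(name, template=None):
    """Build (but do not send) a PUT _time_stream/{name} request."""
    if "*" in name:
        # The usage notes state that wildcards are not supported in the name.
        raise ValueError("a time series index needs one exact name")
    # An empty request body is allowed, so the template is optional.
    body = json.dumps(template).encode("utf-8") if template else None
    return urllib.request.Request(
        url=f"{ES_ENDPOINT}/_time_stream/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_create_request(
    "test_stream",
    {"template": {"settings": {"index.number_of_shards": "10"}}},
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. Keeping the build step separate makes the request easy to inspect before it reaches the cluster.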
Update the configurations of a time series index
Request syntax
- No content specified in the request body
POST _time_stream/{name}/_update
- Custom template uploaded to the request body
POST _time_stream/{name}/_update
{
  --- index template ---
}
Usage notes
The request body that is passed for the API used to update the configurations of a time series index is the same as the request body that is passed for the API used to create a time series index. For more information, see the Create a time series index section in this topic.
After you update the configurations of a time series index, the new configurations do not immediately take effect on the index. You must roll over the time series index for the new configurations to take effect.
Examples
- Sample request
POST _time_stream/test_stream/_update
{
  "template": {
    "settings": {
      "index.number_of_shards": "10"
    }
  }
}
- Sample response
{ "acknowledged" : true }
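Because updated configurations take effect only after a rollover, an update is usually followed by a rollover request. The sketch below lists the two requests in order as plain tuples. It assumes that the time series index can be rolled over with the standard Elasticsearch rollover API addressed by the index name. The helper name update_then_rollover is hypothetical.

```python
def update_then_rollover(name, template):
    """Return the requests to run, in order, as (method, path, body) tuples."""
    return [
        ("POST", f"_time_stream/{name}/_update", template),
        # Assumption: the standard rollover API applies to the time series
        # index, addressed by its name. Verify this against your cluster.
        ("POST", f"{name}/_rollover", None),
    ]


steps = update_then_rollover(
    "test_stream",
    {"template": {"settings": {"index.number_of_shards": "10"}}},
)
for method, path, body in steps:
    print(method, path, body if body is not None else "")
```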
Delete a time series index
Request syntax
DELETE _time_stream/{name}
Usage notes
Examples
- Sample request
DELETE _time_stream/test_stream
- Sample response
{ "acknowledged" : true }
Query time series indexes
Request syntax
- Query all time series indexes
GET _time_stream
- Query specific time series indexes
GET _time_stream/{name}
Usage notes
You can use a fuzzy match in {name} to find the time series indexes that you want to query. You can also specify multiple index names and separate them with commas (,).
Examples
- Sample request
GET _time_stream
- Sample response
{
  "time_streams" : {
    "test_stream" : {
      "name" : "test_stream",
      "datastream_name" : "test_stream",
      "template_name" : ".timestream_test_stream",
      "template" : {
        "index_patterns" : [ "test_stream" ],
        "template" : {
          "settings" : {
            "index" : {
              "number_of_shards" : "10"
            }
          }
        },
        "composed_of" : [ ".system.timestream.template" ],
        "data_stream" : {
          "hidden" : true
        }
      }
    }
  }
}
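As a rough illustration of how a client might build the query path and read this response, the following Python sketch joins index names with commas and extracts the configured number of primary shards per index. The helper names are hypothetical, and the embedded JSON is a trimmed copy of the sample response above.

```python
import json


def time_stream_path(*names):
    """Return the path for all time series indexes, or for the given names."""
    return "_time_stream" if not names else "_time_stream/" + ",".join(names)


def shards_by_stream(response):
    """Map each time series index name to its configured number_of_shards, if any."""
    result = {}
    for name, info in response["time_streams"].items():
        settings = info["template"].get("template", {}).get("settings", {})
        result[name] = settings.get("index", {}).get("number_of_shards")
    return result


sample = json.loads("""
{
  "time_streams": {
    "test_stream": {
      "name": "test_stream",
      "datastream_name": "test_stream",
      "template": {
        "template": {"settings": {"index": {"number_of_shards": "10"}}}
      }
    }
  }
}
""")

print(time_stream_path("test_stream", "prom_index"))
print(shards_by_stream(sample))
```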
Query metrics of time series indexes
Request syntax
- Query metrics of all time series indexes
GET _time_stream/_stats
- Query metrics of specific time series indexes
GET _time_stream/{name}/_stats
Usage notes
You can call the API that is used to query the metrics of time series indexes to obtain information about the metrics such as time_stream_count. The value of the time_stream_count metric indicates the number of time series.
- Calculation method
- The time_stream_count metric collects the number of time series of each primary shard for an index. Each primary shard has different time series. The total number of time series of an index is the sum of the numbers of time series of all primary shards for the index.
- The time_stream_count metric returns the name of the index that has the largest number of time series.
- Precautions
The time_stream_count metric collects the number of time series of each primary shard from the doc values of the _tsid field, which stores the ID of a time series. This collection is expensive. To reduce the cost, you can configure a caching policy. With such a policy, the number of time series of each primary shard is collected only once for a read-only index. For indexes that are not read-only, the system refreshes the cache every 5 minutes by default. You can configure the index.time_series.stats.refresh_interval parameter for these indexes to change the interval. The minimum interval is 1 minute.
Examples
- Sample request
GET _time_stream/_stats
- Sample response
{
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "failed" : 0
  },
  "time_stream_count" : 2,
  "indices_count" : 2,
  "total_store_size_bytes" : 1278822,
  "time_streams" : [
    {
      "time_stream" : "test_stream",
      "indices_count" : 1,
      "store_size_bytes" : 31235,
      "tsidCount" : 1
    },
    {
      "time_stream" : "prom_index",
      "indices_count" : 1,
      "store_size_bytes" : 1247587,
      "tsidCount" : 317
    }
  ]
}
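A short Python sketch can summarize such a response on the client side. It assumes that tsidCount is the per-index number of time series, which the sample suggests but this topic does not state explicitly. The embedded JSON is a trimmed copy of the sample response.

```python
import json

stats = json.loads("""
{
  "time_streams": [
    {"time_stream": "test_stream", "store_size_bytes": 31235, "tsidCount": 1},
    {"time_stream": "prom_index", "store_size_bytes": 1247587, "tsidCount": 317}
  ]
}
""")

# Assumption: tsidCount holds the number of time series of each time series index.
per_stream = {s["time_stream"]: s["tsidCount"] for s in stats["time_streams"]}
total_series = sum(per_stream.values())
largest = max(per_stream, key=per_stream.get)

print(per_stream)
print("total:", total_series, "largest:", largest)
```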
Write time series data to a time series index
Request syntax
Data write model
Field | Description |
labels | The properties (dimensions) that are related to the metrics. The values of this field uniquely identify the metadata of a written data record, and the ID of a time series is generated from them. |
metrics | The metrics. The values of this field must be of the LONG or DOUBLE data type. |
@timestamp | The time when the metric data is collected. By default, the value is a timestamp in milliseconds. |
{
  "labels": {
    "namespce": "cn-hanzhou",
    "clusterId": "cn-xxx-xxxxxx",
    "nodeId": "node-xxx",
    "label": "test-cluster",
    "disk_type": "cloud_ssd",
    "cluster_type": "normal"
  },
  "metrics": {
    "cpu.idle": 10.0,
    "mem.free": 100.1,
    "disk_ioutil": 5.2
  },
  "@timestamp": 1624873606000
}
Examples
- Sample request
POST test_stream/_doc
{
  "labels": {
    "namespce": "cn-hanzhou",
    "clusterId": "cn-xxx-xxxxxx",
    "nodeId": "node-xxx",
    "label": "test-cluster",
    "disk_type": "cloud_ssd",
    "cluster_type": "normal"
  },
  "metrics": {
    "cpu.idle": 10,
    "mem.free": 100.1,
    "disk_ioutil": 5.2
  },
  "@timestamp": 1624873606000
}
- Sample response
{
  "_index" : ".ds-test_stream-2021.09.03-000001",
  "_id" : "suF_qnsBGKH6s8C_OuFS",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
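Before data is written, a client can check that a document follows the data write model. The following Python sketch builds one document, rejects non-numeric metric values, and fills in a millisecond timestamp when none is given. The helper name build_point is hypothetical, and the check is a client-side convenience, not part of the plug-in.

```python
import time


def build_point(labels, metrics, timestamp_ms=None):
    """Build one document in the labels / metrics / @timestamp shape."""
    for name, value in metrics.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"metric {name!r} must be numeric, got {type(value).__name__}")
    if timestamp_ms is None:
        # The data model expects a timestamp in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    return {"labels": dict(labels), "metrics": dict(metrics), "@timestamp": timestamp_ms}


point = build_point(
    {"nodeId": "node-xxx", "disk_type": "cloud_ssd"},
    {"cpu.idle": 10, "mem.free": 100.1},
    timestamp_ms=1624873606000,
)
print(point)
```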
Configure fields for the time series data model
When you configure dimension fields and metric fields, you can use an asterisk (*) to perform a fuzzy match. The following code provides examples on how to configure dimension fields and metric fields:
- Upload a single custom dimension field or metric field
PUT _time_stream/{name}
{
  --- index template ---
  "time_stream": {
    "labels_fields": "@label.*",
    "metrics_fields": "@metrics.*"
  }
}
- Upload multiple custom dimension fields or metric fields
PUT _time_stream/{name}
{
  --- index template ---
  "time_stream": {
    "labels_fields": ["label.*", "dim*"],
    "metrics_fields": ["@metrics.*", "metrics.*"]
  }
}
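Parameter | Description |
labels_fields | Optional. The field name pattern or patterns that identify dimension fields. Default value: labels.*. |
metrics_fields | Optional. The field name pattern or patterns that identify metric fields. Default value: metrics.*. |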
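To get a feel for which fields a set of patterns would select, the following Python sketch flattens a document into dotted field paths and matches them against the patterns with fnmatch from the standard library. This is only an approximation: it assumes that the plug-in's asterisk matching behaves like shell-style globbing, which this topic does not confirm. The function names and the sample document are hypothetical.

```python
from fnmatch import fnmatchcase


def flatten(doc, prefix=""):
    """Yield a dotted field path for every leaf value in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, prefix=f"{path}.")
        else:
            yield path


def classify(doc, labels_fields, metrics_fields):
    """Group field paths by whether they match dimension or metric patterns."""
    result = {"labels": [], "metrics": [], "other": []}
    for path in flatten(doc):
        if any(fnmatchcase(path, p) for p in labels_fields):
            result["labels"].append(path)
        elif any(fnmatchcase(path, p) for p in metrics_fields):
            result["metrics"].append(path)
        else:
            result["other"].append(path)
    return result


doc = {
    "label": {"host": "a"},
    "dim_region": "cn",
    "metrics": {"cpu": 1.0},
    "@timestamp": 1624873606000,
}
groups = classify(doc, ["label.*", "dim*"], ["@metrics.*", "metrics.*"])
print(groups)
```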
Query data in a time series index
Request syntax
The aliyun-timestream plug-in uses the APIs provided by open source Elasticsearch, such as the search APIs and get API, to query data in a time series index.
Examples
- Sample request
GET test_stream/_search
- Sample response
{
  "took" : 172,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".ds-test_stream-2021.09.03-000001",
        "_id" : "suF_qnsBGKH6s8C_OuFS",
        "_score" : 1.0
      }
    ]
  }
}
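Because queries go through the standard search API, a request body can combine a label filter, a time range, and a date_histogram aggregation. The following Python sketch builds such a body for GET test_stream/_search. The field names labels.nodeId and metrics.cpu.idle come from the sample document in this topic. The assumption that labels.nodeId supports term queries, and the helper name build_search_body, are for illustration only.

```python
import json


def build_search_body(node_id, start_ms, end_ms, interval="1m"):
    """Build a search body: filter one node, then average cpu.idle per interval."""
    return {
        "size": 0,  # return only the aggregation, not individual hits
        "query": {
            "bool": {
                "filter": [
                    # Assumption: labels.nodeId is indexed so that term queries work.
                    {"term": {"labels.nodeId": node_id}},
                    {
                        "range": {
                            "@timestamp": {
                                "gte": start_ms,
                                "lt": end_ms,
                                "format": "epoch_millis",
                            }
                        }
                    },
                ]
            }
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                "aggs": {"avg_cpu_idle": {"avg": {"field": "metrics.cpu.idle"}}},
            }
        },
    }


body = build_search_body("node-xxx", 1624873600000, 1624877200000, interval="10m")
print(json.dumps(body, indent=2))
```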
Usage notes for downsampling
Downsampling is a feature that is commonly used in time series scenarios. When you create a time series index, you can configure a downsampling rule for the index. After you configure the rule, you only need to read data from and write data to the index as usual, and the index automatically performs downsampling on its data. When you query data in the time series index, the index automatically determines which downsampled data to query based on the value of the interval parameter configured for the aggregation.
When you configure a downsampling rule for a time series index, you need to configure only the interval parameter. The time series index automatically performs downsampling on data based on the configurations of the labels and metrics fields. After the downsampling, the data type of the values of the metrics field is changed to aggregate_metric_double, and the system generates the following sub-fields for the metrics field: max, min, sum, and count.
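The roll-up into max, min, sum, and count can be illustrated with a few lines of Python. This is a conceptual sketch for a single time series and a single metric, not the plug-in's implementation. It assumes that buckets are aligned to multiples of the interval counted from the epoch, and the function name downsample is hypothetical.

```python
from collections import defaultdict


def downsample(points, interval_ms):
    """Roll up (timestamp_ms, value) pairs into per-interval summaries."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of its interval.
        buckets[ts - ts % interval_ms].append(value)
    return {
        start: {"max": max(v), "min": min(v), "sum": sum(v), "count": len(v)}
        for start, v in sorted(buckets.items())
    }


points = [(0, 10.0), (20_000, 20.0), (70_000, 30.0)]
rolled = downsample(points, 60_000)  # 1m interval
print(rolled)
```

The first two points fall into the bucket that starts at 0, and the third falls into the bucket that starts at 60000. An average for a bucket can still be derived later as sum divided by count.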
Downsampling rules are triggered during the rollover stage. After the downsampling rules are triggered, downsampling is performed on the indexes to which data is no longer written. The system generates a downsampling index for each original index based on the downsampling rules. By default, each downsampling index inherits the settings of the related original index. If you want to customize the settings of a downsampling index, you can configure the settings in the related downsampling rule. For example, if you want to reduce the capacity of a downsampling index, you can reduce the number of primary shards for the index. If you want a downsampling index to be stored for a longer period of time, you can configure an index lifecycle management (ILM) policy for the index.
PUT _time_stream/{name}
{
  "time_stream": {
    "downsample": [
      {
        "interval": "1m",
        "settings": {
          "index.lifecycle.name": "my-rollup-ilm-policy_60m",
          "index.number_of_shards": "1"
        }
      },
      {
        "interval": "10m"
      }
    ]
  }
}
Parameter | Required | Description |
interval | Yes | The interval at which downsampling is performed. During the downsampling, data is rolled up at the interval specified by this parameter. You can specify a maximum of five intervals. If you specify more than one interval, each larger interval must be a multiple of the smaller ones. For example, you can specify 1m, 10m, and 60m. |
settings | No | The settings of a downsampling index, such as the settings related to the lifecycle and the number of primary shards. |
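The constraints on interval can be checked on the client before a rule is submitted. The following Python sketch validates that at most five intervals are given and that each larger interval is a multiple of the next smaller one. It assumes that intervals are written as an integer followed by s, m, h, or d. The plug-in may accept other units, and the function names are hypothetical.

```python
import re

# Assumed unit suffixes; the plug-in may support a different set.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(interval):
    """Convert an interval such as '10m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", interval)
    if not match:
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def validate_downsample_intervals(intervals):
    """Return the intervals in seconds, sorted, or raise ValueError."""
    if not 1 <= len(intervals) <= 5:
        raise ValueError("specify between one and five intervals")
    seconds = sorted(to_seconds(i) for i in intervals)
    for smaller, larger in zip(seconds, seconds[1:]):
        if larger == smaller or larger % smaller != 0:
            raise ValueError(f"{larger}s is not a larger multiple of {smaller}s")
    return seconds


print(validate_downsample_intervals(["1m", "10m", "60m"]))
```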