The Pressure Test of Prometheus on Alibaba Cloud ECS Service Discovery Feature

By Xingji and Yusheng Guo

Abstract

This article tests the Prometheus service discovery feature under ECS scaling scenarios: normal scaling, alternating scaling, discovery with and without filter tags, and multiple filter tags configured at the same time.

The Alibaba Cloud ECS service discovery feature of Prometheus supports discovering instances with or without filter tags, and it follows dynamic changes in the number of ECS instances. The tests confirmed that both capabilities work as expected.

The stress test exercised the Prometheus service discovery mechanism for Alibaba Cloud ECS and verified that scaling behavior and ECS tag filtering work correctly. In terms of resource consumption and load, in an extreme scenario simulating a cluster of about 1,000 ECS instances, with tag filtering enabled and after alternating scaling operations, Prometheus itself consumed about 0.2 cores (vCPUs) and 1.4 GiB of memory.

Procedure

1. Environment Benchmark

1.  Deploy and Configure Prometheus

Use the Prometheus binary executable built for the Linux x86_64 platform:

https://github.com/AliyunContainerService/prometheus/releases/tag/v2.55.0-aliyun-ecs-sd

Create an ECS instance to serve as the running environment for Prometheus and open port 9091 in the security group.

Each ECS instance: 4 cores (vCPUs) and 16 GiB of memory. OS: Alibaba Cloud Linux 3.2104 LTS 64-bit.

Download the binary to your local machine and copy it to the remote ECS instance with the scp command.

Create and edit the configuration file prometheus.yml.

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 30s
scrape_configs:
  - job_name: _aliyun-prom/ecs-sd
    honor_timestamps: true
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    ecs_sd_configs:
      - port: 9101
        refresh_interval: 30s
        region_id: cn-qingdao               # Set the region for obtaining ECS instances.
        access_key: <access_key>        # The AccessKey ID of the Alibaba Cloud account.
        access_key_secret: <access_key_secret> # The AccessKey secret of the Alibaba Cloud account.
#        tag_filters:
#          - key: 'testK'
#            values: ['*', 'testV*']
        limit: -1     # The maximum number of instances obtained from the API is limited to 100 by default; when less than zero, all instances are retrieved.

Then run the Prometheus program, which will by default use the configuration file named prometheus.yml in the current directory.

./prometheus
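
Once Prometheus is running, the number of discovered targets can be checked at each scaling stage through the standard Prometheus HTTP API endpoint /api/v1/targets. The following is a minimal sketch rather than the procedure used in this test: the helper names count_targets and fetch_targets and the sample payload are made up for illustration, and the base URL (host and port) depends on how your Prometheus instance is started.

```python
import json
import urllib.request
from collections import Counter


def fetch_targets(base_url: str) -> dict:
    """Fetch the target list from a running Prometheus (base_url is deployment-specific)."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def count_targets(payload: dict, job: str) -> Counter:
    """Count the active targets of one job, grouped by health state."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected API status: {payload.get('status')!r}")
    counts = Counter()
    for target in payload["data"]["activeTargets"]:
        if target.get("labels", {}).get("job") == job:
            counts[target.get("health", "unknown")] += 1
    return counts


# Hand-written sample shaped like an /api/v1/targets response (not real test data).
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "_aliyun-prom/ecs-sd", "instance": "10.0.0.1:9101"}, "health": "up"},
            {"labels": {"job": "_aliyun-prom/ecs-sd", "instance": "10.0.0.2:9101"}, "health": "up"},
            {"labels": {"job": "_aliyun-prom/ecs-sd", "instance": "10.0.0.3:9101"}, "health": "down"},
            {"labels": {"job": "other", "instance": "10.0.0.9:9100"}, "health": "up"},
        ],
        "droppedTargets": [],
    },
}

if __name__ == "__main__":
    print(dict(count_targets(SAMPLE, "_aliyun-prom/ecs-sd")))
```

Against a live server, count_targets(fetch_targets(base_url), "_aliyun-prom/ecs-sd") would give the same kind of summary, which can then be compared with the expected ECS count after each scale-out or scale-in.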

ECS scaling can be performed and managed in two ways: ACK and ESS.

2.  Manage and Scale a Group of ECS Nodes by Using an ACK Kubernetes Cluster

a) Create an ACK Kubernetes cluster and use a node pool to scale and manage a group of ECS nodes.

More information about creating an ACK cluster.

b) Deploy the node-exporter DaemonSet with hostNetwork: true to serve as the test exporter on each ECS node.

You can apply the node-exporter manifests from the kube-prometheus community project (ServiceAccount, ClusterRole, ClusterRoleBinding, and DaemonSet):

https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/nodeExporter-serviceAccount.yaml
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/nodeExporter-clusterRole.yaml
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/nodeExporter-clusterRoleBinding.yaml
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/nodeExporter-daemonset.yaml

During our load testing, we used the node-exporter that was already deployed in the Alibaba Cloud ACK cluster and opened port 9101.

3.  Change the listening address of node-exporter DaemonSet from 127.0.0.1:9101 to 0.0.0.0:9101 to allow LAN access.

4.  Control the number of ECS instances by scaling the node pools.

More information about ACK.

Note: Use top -p <pid> to observe CPU and memory usage.

2. Pressure Test with No Filter Tags

a) Normal Scaling

ECS instance count: 5 -> 55 -> 155 -> 55 -> 5

● ECS 5

● ECS 55

● ECS 155

● ECS 55

● ECS 5

CPU and memory changes

ECS count   5     55    155   55    5
%CPU        0.0   0.0   1.0   0.3   0.0
%MEM        0.8   1.2   2.1   1.8   2.0

b) Alternating Scaling

ECS instance count: 5 -> 55 -> 45 -> 145 -> 105

● ECS 5

● ECS 55

● ECS 45

● ECS 145

● ECS 105

CPU and memory changes

ECS count   5     55    45    145   105
%CPU        0.0   0.0   0.3   0.7   0.7
%MEM        2.0   2.2   2.1   2.7   2.8

c) Large-scale ECS Scaling

ECS instance count: 105 -> 605 -> 1105 -> 605 -> 1

● ECS 105

● ECS 605

● ECS 998

● ECS 605

● ECS 1

CPU and memory changes

ECS count   105   605   998   605   1
%CPU        0.7   3.7   6.0   3.3   0.0
%MEM        2.8   6.5   10.8  10.2  11.2

3. Pressure Test with Filter Tags

Set tag filtering

tag_filters:
  - key: 'testK'
    values: ['testV', '*']

Add tags to ECS instances

More information about ECS tags.

a. Tag Filtering

ECS instance count: ECS (testK: testV) 5 -> ECS (testK: testV) 5 + ECS 5 -> ECS (testK: testV) 5 + ECS 5 + ECS (testK: abc) 5

● ECS (testK: testV) 5

● ECS (testK: testV) 5 + ECS 5

ECS instances without tags cannot be discovered.

● ECS (testK: testV) 5 + ECS 5 + ECS (testK: abc) 5

ECS instances with tag value abc match the wildcard * and are discovered.

b. Normal Scaling

ECS instance count: ECS (testK:testV) 5 + ECS 5 -> ECS (testK:testV) 55 + ECS 5 -> ECS (testK:testV) 155 + ECS 5 -> ECS (testK:testV) 55 + ECS 5 -> ECS (testK:testV) 5 + ECS 5

● ECS (testK:testV) 5 + ECS 5

● ECS (testK:testV) 55 + ECS 5

● ECS (testK:testV) 155 + ECS 5

● ECS (testK:testV) 55 + ECS 5

● ECS (testK:testV) 5 + ECS 5

CPU and memory changes

ECS count   5+5     55+5    155+5   55+5    5+5
%CPU        0.0     0.0     1.3     0.3     0.0
%MEM        0.7     1.1     1.8     1.6     1.8

c. Alternating Scaling

ECS instance count: ECS (testK:testV) 5 + ECS 5 -> ECS (testK:testV) 55 + ECS 5 -> ECS (testK:testV) 45 + ECS 5 -> ECS (testK:testV) 145 + ECS 5 -> ECS (testK:testV) 105 + ECS 5

● ECS (testK:testV) 5 + ECS 5

● ECS (testK:testV) 55 + ECS 5

● ECS (testK:testV) 45 + ECS 5

● ECS (testK:testV) 145 + ECS 5

● ECS (testK:testV) 105 + ECS 5

CPU and memory changes

ECS count   5+5     55+5    45+5    145+5   105+5
%CPU        0.0     0.7     0.3     1.0     0.7
%MEM        1.8     2.0     1.8     2.4     2.9

d. Large-scale ECS Scaling

ECS instance count: ECS (testK:testV) 105 + ECS 5 -> ECS (testK:testV) 605 + ECS 5 -> ECS (testK:testV) 994 + ECS 5 -> ECS (testK:testV) 605 + ECS 5 -> ECS (testK:testV) 105 + ECS 5

● ECS (testK:testV) 105 + ECS 5

● ECS (testK:testV) 605 + ECS 5

● ECS (testK:testV) 994 + ECS 5

● ECS (testK:testV) 605 + ECS 5

● ECS (testK:testV) 105 + ECS 5

CPU and memory changes

ECS count   105+5   605+5   994+5   605+5   105+5
%CPU        0.7     2.7     5.0     3.3     0.7
%MEM        1.3     4.8     8.5     7.5     6.8
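
The peak row of this table (994+5 instances: 5.0 %CPU, 8.5 %MEM) appears to be the source of the "about 0.2 cores and 1.4 GiB" figure quoted in the abstract. The conversion can be sketched as below, under two assumptions that the article does not state explicitly: that the %CPU reported by top is relative to all 4 vCPUs of the machine, and that %MEM is relative to the nominal 16 GiB. If top was instead reporting per-core percentages, 5.0 would correspond to 0.05 cores. The function name top_to_absolute is made up for illustration.

```python
def top_to_absolute(cpu_pct: float, mem_pct: float,
                    vcpus: int = 4, mem_gib: float = 16.0) -> tuple:
    """Convert top's %CPU and %MEM into cores and GiB.

    Assumes %CPU is a share of the whole machine (all vCPUs) and %MEM a
    share of the nominal memory size; both are assumptions, see above.
    """
    cores = cpu_pct / 100.0 * vcpus
    gib = mem_pct / 100.0 * mem_gib
    return cores, gib


if __name__ == "__main__":
    # Peak row with filter tags: 994+5 instances.
    cores, gib = top_to_absolute(5.0, 8.5)
    print(f"{cores:.2f} cores, {gib:.2f} GiB")  # 0.20 cores, 1.36 GiB
```

Under the same assumptions, the peak without filter tags (998 instances: 6.0 %CPU, 10.8 %MEM) works out to roughly 0.24 cores and 1.7 GiB.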

e. Multiple Tags for Filtering

tag_filters:
  - key: 'testK1'
    values: ['testV1', '*']
  - key: 'testK2'
    values: ['testV2', '*']

ECS instance count: ECS (testK1:testV1, testK2:testV2) 5 -> ECS (testK1:testV1, testK2:testV2) 5 + ECS (testK1:testV1) 5 -> ECS (testK1:testV1, testK2:testV2) 55 + ECS (testK1:testV1) 5 -> ECS (testK1:testV1, testK2:testV2) 5 + ECS (testK1:testV1) 5

● ECS (testK1:testV1, testK2:testV2) 5

● ECS (testK1:testV1, testK2:testV2) 5 + ECS (testK1:testV1) 5

ECS instances with only the testK1 tag cannot be discovered.

● ECS (testK1:testV1, testK2:testV2) 55 + ECS (testK1:testV1) 5

● ECS (testK1:testV1, testK2:testV2) 5 + ECS (testK1:testV1) 5

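Values listed under the same key are combined with OR, and different keys are combined with AND: an instance is discovered only if, for every configured key, its tag value matches at least one of that key's values.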
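
The matching rule observed in the tests above can be modeled in a few lines. This is an illustrative sketch and not the actual implementation in the ECS service discovery code: the function name matches_filters is made up, and glob-style matching via Python's fnmatch is an assumption, since the article only demonstrates the * wildcard.

```python
from fnmatch import fnmatchcase


def matches_filters(instance_tags: dict, tag_filters: list) -> bool:
    """Return True if an instance passes every tag filter.

    Different keys are combined with AND; the values of one key with OR.
    Glob semantics for patterns are an assumption made for this sketch.
    """
    for tag_filter in tag_filters:
        value = instance_tags.get(tag_filter["key"])
        if value is None:
            # The instance does not carry this key at all.
            return False
        if not any(fnmatchcase(value, pattern) for pattern in tag_filter["values"]):
            return False
    return True


SINGLE = [{"key": "testK", "values": ["testV", "*"]}]
MULTI = [
    {"key": "testK1", "values": ["testV1", "*"]},
    {"key": "testK2", "values": ["testV2", "*"]},
]

if __name__ == "__main__":
    print(matches_filters({"testK": "testV"}, SINGLE))   # tagged instance
    print(matches_filters({}, SINGLE))                   # untagged instance
    print(matches_filters({"testK": "abc"}, SINGLE))     # matches the wildcard
    print(matches_filters({"testK1": "testV1"}, MULTI))  # only one of two keys
```

The four calls mirror the cases in the test: tagged instances and instances whose value matches the wildcard are discovered, while untagged instances and instances carrying only testK1 are not.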

4. Performance Profiling Analysis

Collect performance data using promtool when the ECS instance count is 100.

a. CPU dump

b. Goroutine

c. MEM dump
