Provide resource configuration suggestions for containers based on resource profiling - Container Service for Kubernetes

Container Service for Kubernetes (ACK) can profile resources for Kubernetes-native workloads and provide resource configuration suggestions for containers based on the historical data of resource usage. This greatly simplifies the configuration of resource requests and limits for containers. This topic describes how to use resource profiling in the ACK console and how to use resource profiling with the CLI.

Prerequisites and usage notes

Only ACK Pro clusters that meet the following requirements support resource profiling:
- ack-koordinator (FKA ack-slo-manager) version 0.7.1 or later is installed. For more information, see ack-koordinator (FKA ack-slo-manager).
- metrics-server version 0.3.8 or later is installed.
- If your cluster uses containerd as the container runtime and the cluster nodes were added before 14:00 (UTC+8) on January 19, 2022, you must remove the cluster nodes and re-add them to the cluster, or update the Kubernetes version of your cluster to the latest version. For more information, see Add existing ECS instances to an ACK cluster and Manually update ACK clusters.
Resource profiling is available for public preview in the Cost Suite module. You can directly access and use resource profiling.
To ensure the accuracy of resource profiling, we recommend that you wait at least one day after you enable resource profiling for the system to collect data.

Billing rules

No fee is charged when you install and use the ack-koordinator component. However, fees may be charged in the following scenarios:

ack-koordinator is a non-managed component that occupies worker node resources after it is installed. You can specify the amount of resources requested by each module when you install the component.
By default, ack-koordinator exposes the monitoring metrics of features such as resource profiling and fine-grained scheduling as Prometheus metrics. If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, these metrics are considered custom metrics and fees are charged for them. The fee depends on factors such as the size of your cluster and the number of applications. Before you enable Prometheus metrics, we recommend that you read the Billing overview topic of Managed Service for Prometheus to learn about the free quota and billing rules of custom metrics. For more information about how to monitor and manage resource usage, see Resource usage and bills.

Introduction to resource profiling

Kubernetes allows you to describe the resource requests of containers to manage container resources. After you specify the resource request for a container, the scheduler matches the resource request with the allocatable resources of each node to determine the node to which the container is scheduled. You can refer to the historical resource utilization and stress test results of a container when you manually specify the resource request. You can also adjust the resource request after the container is created based on the performance of the container.

However, you may encounter the following issues:

To ensure application stability, you need to reserve a specific amount of resources as a buffer to handle the fluctuations of the upstream and downstream workloads. As a result, the amount of resources in the resource requests that you specify for containers may greatly exceed the actual amount of resources used by the containers. This causes low resource utilization and resource waste in the cluster.
If your cluster hosts a large number of pods, you can decrease the resource request for individual containers to increase resource utilization in the cluster. This allows you to deploy more containers on a node. However, application stability is adversely affected when traffic spikes.

To resolve this issue, ack-koordinator provides resource profiles for workloads. You can obtain resource configuration suggestions for individual containers based on resource profiles. This simplifies the work of configuring resource requests and limits for containers. The ACK console has integrated the resource profiling feature to allow you to quickly check whether the resource requests of your pods are configured properly and adjust the resource configuration on demand. You can also use the CLI to create CustomResourceDefinitions (CRDs) to manage resource profiles.

Use resource profiling in the ACK console

Step 1: Install the resource profiling component

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Cost Suite > Cost Optimization.
On the Cost Optimization page, click the Resource Profiling tab, and follow the instructions in the Resource Profiling section to enable this feature.
- Follow the instructions on the page to install or update the ack-koordinator component. If this is the first time you use resource profiling, you need to install the ack-koordinator component.
  Note
  If an ack-koordinator version earlier than 0.7.0 is used, you need to perform a migration and update. For more information, see Migrate ack-koordinator from the Marketplace page to the Add-ons page.
- If this is the first time you use resource profiling, after the component is installed or updated, we recommend that you select Default Settings to enable resource profiling for all workloads. You can click Profiling Configuration to modify the applicable scope of resource profiling later.
Click Enable Resource Profiling to go to the Resource Profiling tab.

Step 2: Manage resource profiling policies

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Cost Suite > Cost Optimization.

On the Cost Optimization page, click the Resource Profiling tab, and then click Profiling Configuration.

You can choose Global Configuration or Custom Configuration. The default settings that you selected when you installed the resource profiling component belong to the global configuration. You can choose Global Configuration, modify the settings, and then click OK to apply the modifications.

Global configuration mode (recommended)

In global configuration mode, resource profiling is enabled for workloads other than those in the arms-prom and kube-system namespaces by default.

Parameter	Description	Valid value
Excluded Namespace	The namespaces for which you want to disable resource profiling. In most cases, resource profiling is disabled for the namespaces of system components. After you modify the global configuration, resource profiling is enabled only for workloads of the specified types that do not belong to the excluded namespaces.	You can specify one or more existing namespaces in the cluster. By default, the kube-system and arms-prom namespaces are specified.
Workload Type	The types of workloads for which resource profiling is enabled. After you modify the global configuration, resource profiling is enabled only for workloads of the specified types that do not belong to the excluded namespaces.	The following Kubernetes workload types are supported: Deployment, StatefulSet, and DaemonSet. You can select one or more workload types.
CPU Redundancy Rate/Memory Redundancy Rate	The redundancy rate that is specified in the resource profiling policy. For more information, see the following section.	The redundancy rate must be 0 or a positive value. The system also provides three commonly used redundancy rates: 70%, 50%, and 30%.

Custom configuration mode

In custom configuration mode, resource profiling is enabled only for the workloads that you select. If your cluster is large (more than 1,000 nodes) or you want to enable resource profiling for only some workloads, choose the custom configuration mode.

Parameter	Description	Valid value
Namespace	The namespaces for which you want to enable resource profiling. After you modify the custom configuration, resource profiling is enabled for workloads of the specified types that belong to the selected namespaces.	You can select one or more existing namespaces in the cluster.
Workload Type	The workload types for which you want to enable resource profiling. After you modify the custom configuration, resource profiling is enabled for workloads of the specified types that belong to the selected namespaces.	The following Kubernetes workload types are supported: Deployment, StatefulSet, and DaemonSet. You can select one or more workload types.
CPU Redundancy Rate/Memory Redundancy Rate	The redundancy rate that is specified in the resource profiling policy. For more information, see the following section.	The redundancy rate must be 0 or a positive value. The system also provides three commonly used redundancy rates: 70%, 50%, and 30%.

Resource redundancy: When an administrator assesses the workloads of an application, such as the QPS of the application, the administrator usually assumes that the workloads will not occupy 100% physical resources. This is because even technologies such as hyper-threading have limits on physical resources and the application also needs to reserve resources to handle traffic spikes during peak hours. If the difference between the suggested resource request and the original resource request exceeds the specified redundancy rate, the system suggests that you decrease the resource request. For more information about the resource profiling algorithm, see the Overview of application profiles section.

Step 3: View resource profiles

After you configure the resource profiling policy, you can view the resource profiles of the workloads on the Resource Profiling page.

To ensure the accuracy of resource profiles, if this is the first time you use resource profiling, you need to wait at least 24 hours for the system to collect data.

This page displays the aggregated resource profile data and the resource profile of each workload.

Note

In the following table, a hyphen (-) indicates N/A.

Column	Description	Valid value	Filter
Workload Name	The name of the workload.	-	Supported. You can filter resource profiles by workload name.
Namespaces	The namespace to which the workload belongs.	-	Supported. You can filter resource profiles by namespace. By default, the kube-system namespace is excluded from filter conditions.
Workload Type	The type of workload.	Valid values: Deployment, DaemonSet, and StatefulSet.	Supported. You can filter resource profiles by workload type. By default, all workload types are selected as filter conditions.
CPU Request	The number of CPU cores that are requested by the pod of the workload.	-	Not supported.
Memory Request	The memory size that is requested by the pod of the workload.	-	Not supported.
Profile Data Status	The status of the resource profile.	Collecting: The resource profiling component is collecting historical data and generating the profiling result. To view the resource profile of a workload, we recommend that you wait at least one day after you enable resource profiling and make sure that the workload experiences traffic fluctuations within the day. Normal: The resource profile is generated. Workload Deleted: The workload is deleted. The resource profile of the workload will be deleted after a period of time.	Not supported.
CPU Profile/Memory Profile	The CPU profile and memory profile provide suggestions on how to modify the original CPU request and memory request. The values are generated based on the suggested resource request, the original resource request, and the resource redundancy rate. For more information, see the following section.	Valid values: Increase, Decrease, and Maintain. The percentage value that indicates the degree of difference between the original resource request and the suggested resource request. The degree of difference is calculated based on the following formula: Abs(Suggested resource request - Original resource request)/Original resource request.	Supported. By default, Increase and Decrease are selected as filter conditions.
Creation Time	The time when the resource profile was created.	-	Not supported.
Change Resource Configuration	After you check the resource profiles and suggestions, you can click Change Resource Configuration to modify the resource configurations. For more information, see Step 5: Modify resource configurations.	-	Not supported.

The resource profiling feature of ACK provides a suggested resource request for each container of the workload. The feature also provides suggestions on whether to increase or decrease the resource request of the workload based on the suggested resource request, original resource request, and resource redundancy rate. If the workload has multiple containers, ACK provides suggestions for the container whose original resource request has the highest degree of difference compared with the suggested resource request. The following content describes how ACK calculates the degree of difference between the suggested resource request and the original resource request.

If the suggested resource request is greater than the original resource request, the resource usage of the container is higher than the resource request of the container. In this case, ACK suggests that you increase the resource request of the container.

If the suggested resource request is lower than the original resource request, the resource usage of the container is lower than the resource request of the container. In this case, ACK suggests that you decrease the resource request of the container to avoid resource waste. ACK calculates the degree of difference between the suggested resource request and the original resource request in the following way:
1. ACK calculates the target resource request based on the following formula: Target resource request = Suggested resource request × (1 + Resource redundancy rate).
2. ACK calculates the degree of difference between the target resource request and the original resource request based on the following formula: Degree = 1 - Original resource request/Target resource request.
3. ACK generates suggestions on adjusting CPU and memory requests based on the degree of difference between the target resource request and the original resource request. If the degree value exceeds 0.1, ACK suggests that you decrease the resource request.
In other cases, Maintain is displayed in the CPU Profile or Memory Profile column, which means that you do not need to adjust the resource request.
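
The target formula in step 1 and the percentage shown in the CPU Profile/Memory Profile column can be reproduced by hand. The following shell sketch is illustrative only and is not ACK's implementation: the function names `target_request` and `difference_pct` are hypothetical, the 30% redundancy rate is an assumed setting, and the inputs are the cpu-load-gen figures from the CLI example in this topic (8 requested cores, 4.742 suggested cores).

```shell
#!/bin/sh
# Target resource request = suggested request x (1 + redundancy rate).
target_request() {
  awk -v s="$1" -v r="$2" 'BEGIN { printf "%.3f\n", s * (1 + r) }'
}

# Percentage shown in the profile column:
# Abs(suggested - original) / original, expressed as a percentage.
difference_pct() {
  awk -v s="$1" -v o="$2" \
    'BEGIN { d = s - o; if (d < 0) d = -d; printf "%.1f\n", d / o * 100 }'
}

target_request 4.742 0.30   # suggested 4.742 cores, 30% redundancy
difference_pct 4.742 8      # suggested 4.742 cores vs. 8 requested cores
```

With these inputs the target is about 6.165 cores and the displayed difference is about 40.7%.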

Step 4: View the details of a resource profile

On the Resource Profiling tab, click the name of a workload to go to the profile details page.

The details page displays the basic information of the workload, the resource curves of each container, and the resource configuration that you can modify.

The preceding figure shows the CPU curves of a workload.

Curve	Description
cpu limit	The CPU limit curve of the container.
cpu request	The CPU request curve of the container.
cpu recommend	The suggested CPU request curve of the container.
cpu usage (average)	The curve of the average CPU usage of the container.
cpu usage (max)	The curve of the peak CPU usage of the container.

Step 5: Modify resource configurations

In the Change Resource Configuration section at the bottom of the profile details page, you can modify the resource configuration based on the suggested values generated by resource profiling.

The following table describes the columns.

Parameter	Description
Resource Request	The original resource request of the container.
Resource Limit	The original resource limit of the container.
Profile Value	The resource request that is suggested by ACK.
Resource Redundancy Rate	The resource redundancy rate that is specified in the resource profiling policy. You can specify the new resource request based on the redundancy rate and the suggested resource request. In the preceding figure, the new CPU request is calculated based on the following formula: 4.28 CPU cores × (1 + 30%) = 5.564 CPU cores, which is rounded to 5.6 CPU cores.
New Resource Request	The new resource request that you want to use.
New Resource Limit	The new resource limit that you want to use. If topology-aware CPU scheduling is enabled for the workload, the CPU limit must be an integer.

After you set the parameters, click Submit. The system starts to update the resource configuration of the workload. You are redirected to the details page of the workload.
After the resource configuration is updated, the controller performs a rolling update for the workload and recreates the pods.

Use resource profiling with the CLI

Step 1: Enable resource profiling

Use the following YAML template to create a file named recommendation-profile.yaml and enable resource profiling for your workloads.

You can use the RecommendationProfile CRD to generate resource profiles for your workloads and obtain resource configuration suggestions. You can specify the namespaces and workload types to which a RecommendationProfile CRD is applied.

apiVersion: autoscaling.alibabacloud.com/v1alpha1
kind: RecommendationProfile
metadata:
  # The name of the RecommendationProfile CRD. If you want to create a non-namespaced RecommendationProfile CRD, do not specify a namespace. 
  name: profile-demo
spec:
  # The workload types for which you want to enable resource profiling. 
  controllerKind:
  - Deployment
  # The namespaces for which you want to enable resource profiling. 
  enabledNamespaces:
  - default

The following table describes the parameters in the YAML template.

Parameter	Type	Description
`metadata.name`	String	The name of the resource object. If you want to create a non-namespaced RecommendationProfile CRD, do not specify a namespace.
`spec.controllerKind`	String	The workload types for which you want to enable resource profiling. Valid values: Deployment, StatefulSet, and DaemonSet.
`spec.enabledNamespaces`	String	The namespaces for which you want to enable resource profiling.

Run the following command to apply recommendation-profile.yaml and enable resource profiling:
```
kubectl apply -f recommendation-profile.yaml
```
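
Because `spec.controllerKind` and `spec.enabledNamespaces` are both lists, one RecommendationProfile can cover several workload types and namespaces. The following sketch writes such a profile to a file. The object name `profile-multi`, the file name, and the second namespace are placeholders chosen for illustration; only the fields shown in the template above are used.

```shell
# Write a profile that covers two workload types in two namespaces.
# The object name, file name, and second namespace are placeholders.
cat > recommendation-profile-multi.yaml <<'EOF'
apiVersion: autoscaling.alibabacloud.com/v1alpha1
kind: RecommendationProfile
metadata:
  name: profile-multi
spec:
  controllerKind:
  - Deployment
  - StatefulSet
  enabledNamespaces:
  - default
  - recommender-demo
EOF

# Apply it the same way as the single-namespace profile:
# kubectl apply -f recommendation-profile-multi.yaml
```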

Create a file named cpu-load-gen.yaml and copy the following content to the file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-load-gen
  labels:
    app: cpu-load-gen
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cpu-load-gen-selector
  template:
    metadata:
      labels:
        app: cpu-load-gen-selector
    spec:
      containers:
      - name: cpu-load-gen
        image: registry.cn-zhangjiakou.aliyuncs.com/acs/slo-test-cpu-load-gen:v0.1
        command: ["cpu_load_gen.sh"]
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 8 # Request eight CPU cores for the application. 
            memory: "1Gi"
          limits:
            cpu: 12
            memory: "2Gi"

Run the following command to apply cpu-load-gen.yaml and deploy the cpu-load-gen application:
```
kubectl apply -f cpu-load-gen.yaml
```

Run the following command to obtain resource configuration suggestions for the application that you created:

kubectl get recommendations -l \
  "alpha.alibabacloud.com/recommendation-workload-apiVersion=apps-v1, \
  alpha.alibabacloud.com/recommendation-workload-kind=Deployment, \
  alpha.alibabacloud.com/recommendation-workload-name=cpu-load-gen" -o yaml

Note

To generate accurate resource configuration suggestions, we recommend that you wait at least one day after you enable resource profiling for the system to collect data.

After you enable resource profiling for your workloads, ack-koordinator provides resource configuration suggestions for your workloads. The suggestions are stored in the Recommendation CRD. The following code block shows the resource profile of the cpu-load-gen workload.

apiVersion: autoscaling.alibabacloud.com/v1alpha1
kind: Recommendation
metadata:
  labels:
    alpha.alibabacloud.com/recommendation-workload-apiVersion: apps-v1
    alpha.alibabacloud.com/recommendation-workload-kind: Deployment
    alpha.alibabacloud.com/recommendation-workload-name: cpu-load-gen
  name: f20ac0b3-dc7f-4f47-b3d9-bd91f906****
  namespace: recommender-demo
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-load-gen
status:
  recommendResources:
    containerRecommendations:
    - containerName: cpu-load-gen
      target:
        cpu: 4742m
        memory: 262144k
      originalTarget: # The intermediate result generated by the resource profiling algorithm. We recommend that you do not use the intermediate result.
        # ...

To facilitate data retrieval, the Recommendation CRD is generated in the same namespace as the workload. In addition, the Recommendation CRD saves the API version, type, and name of the workload in the labels described in the following table.

Label Key	Description	Example
alpha.alibabacloud.com/recommendation-workload-apiVersion	The API version of the workload. The value of the label conforms to the Kubernetes specifications. Forward slashes (/) are replaced by hyphens (-).	apps-v1 (Original form: apps/v1)
alpha.alibabacloud.com/recommendation-workload-kind	The type of the workload, for example, Deployment or StatefulSet.	Deployment
alpha.alibabacloud.com/recommendation-workload-name	The name of the workload. The value of the label conforms to the Kubernetes specifications and cannot exceed 63 characters in length.	cpu-load-gen

The resource profiling result of each container is saved in status.recommendResources.containerRecommendations. The following table describes the parameters.

Parameter	Description	Format	Example
`containerName`	The name of the container.	string	cpu-load-gen
`target`	The resource profiling result, including the suggested CPU request and memory request.	map[ResourceName]resource.Quantity	cpu: 4742m, memory: 262144k
`originalTarget`	The intermediate result generated by the resource profiling algorithm. We recommend that you do not use the intermediate result. If you have any requirements, submit a ticket.	-	-
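
To read only the `target` values instead of the full YAML, you can combine the workload-name label with the JSONPath output of kubectl. The following is a minimal sketch: the function name `print_targets` is hypothetical, and the field path follows the sample Recommendation shown above, so verify it against the objects in your own cluster.

```shell
# Print "<container> <cpu target> <memory target>" for every container of
# one workload. Arguments: $1 = namespace, $2 = workload name.
print_targets() {
  kubectl get recommendations -n "$1" \
    -l "alpha.alibabacloud.com/recommendation-workload-name=$2" \
    -o jsonpath='{range .items[*].status.recommendResources.containerRecommendations[*]}{.containerName}{"\t"}{.target.cpu}{"\t"}{.target.memory}{"\n"}{end}'
}

# Example: print_targets default cpu-load-gen
```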

Note

The suggested minimum amount of CPU resources is 0.025 CPU cores. The suggested minimum amount of memory resources is 250 MB.

Compare the resource configurations requested by the cpu-load-gen application with the suggested resource configurations in this step. The requested CPU resources are greater than the suggested CPU resources. You can reduce the CPU request of the application to save resources.

Resource	Requested amount	Suggested amount
CPU	8 vCPUs	4.742 vCPUs
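
The values in the Recommendation use Kubernetes quantity suffixes: `m` means millicores (one thousandth of a core) and `k` means 1,000 bytes. The following sketch converts the sample values so that they can be compared with the original request. The helper names are hypothetical, and the helpers handle only the forms that appear in this topic, not every Kubernetes quantity format.

```shell
# Convert the quantity forms that appear in the sample Recommendation.
# "m" means millicores (1/1000 of a core); "k" means 1000 bytes.
cpu_to_cores() {
  case "$1" in
    *m) awk -v v="${1%m}" 'BEGIN { printf "%.3f\n", v / 1000 }' ;;
    *)  awk -v v="$1" 'BEGIN { printf "%.3f\n", v }' ;;
  esac
}

mem_k_to_mib() {
  awk -v v="${1%k}" 'BEGIN { printf "%.0f\n", v * 1000 / (1024 * 1024) }'
}

cpu_to_cores 4742m     # suggested CPU request, in cores
cpu_to_cores 8         # original CPU request, in cores
mem_k_to_mib 262144k   # suggested memory request, in MiB
```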

Step 2. (Optional): Verify the profiling results in Managed Service for Prometheus

ack-koordinator allows you to verify resource profiles in Managed Service for Prometheus from the ACK console.

If this is the first time you use the dashboards provided by Managed Service for Prometheus, make sure that the Resource Profile dashboard is updated to the latest version. For more information about how to update the dashboard, see Related operations.
To view details about the collected resource profiles on the Prometheus Monitoring page in the ACK console, perform the following steps:
1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Operations > Prometheus Monitoring.
3. On the Prometheus Monitoring page, choose Cost Analysis/Resource Optimization > Resource Profile.
  On the Resource Profile tab, you can view details about the collected resource profiles. The details include the resource requests, resource usage, and suggested resource configuration for containers. For more information, see Managed Service for Prometheus.
If you use a self-managed Prometheus monitoring system, you can use the following metrics to configure dashboards:
```
# Specify resource as CPU for profiling. 
koord_manager_recommender_recommendation_workload_target{exported_namespace="$namespace", workload_name="$workload", container_name="$container", resource="cpu"}
# Specify resource as memory for profiling. 
koord_manager_recommender_recommendation_workload_target{exported_namespace="$namespace", workload_name="$workload", container_name="$container", resource="memory"}
```
Important
The monitoring metric of resource profiling provided by ack-koordinator was renamed to koord_manager_recommender_recommendation_workload_target in v1.5.0-ack1.14. However, the metric name slo_manager_recommender_recommendation_workload_target in earlier versions remains compatible. If you use a self-managed Prometheus monitoring system, we recommend that you switch the monitoring metric name to koord_manager_recommender_recommendation_workload_target after upgrading ack-koordinator to v1.5.0-ack1.14.
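
With a self-managed Prometheus server, the same selector can also be evaluated from the command line through the Prometheus HTTP API (`/api/v1/query`). The following is a sketch under stated assumptions: `PROM_URL` is a placeholder for the address of your own server, the function name `query_cpu_target` is hypothetical, and the label names are the ones used in the metric examples above.

```shell
# Query the suggested CPU request for one container through the
# Prometheus HTTP API. PROM_URL is a placeholder for your own server.
PROM_URL="${PROM_URL:-http://localhost:9090}"

query_cpu_target() {
  # $1 = namespace, $2 = workload name, $3 = container name
  curl -s -G "${PROM_URL}/api/v1/query" \
    --data-urlencode "query=koord_manager_recommender_recommendation_workload_target{exported_namespace=\"$1\",workload_name=\"$2\",container_name=\"$3\",resource=\"cpu\"}"
}

# Example: query_cpu_target default cpu-load-gen cpu-load-gen
```

If your ack-koordinator version is earlier than v1.5.0-ack1.14, replace the metric name with slo_manager_recommender_recommendation_workload_target.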

FAQ

How does the resource profiling algorithm work?

The resource profiling algorithm uses a multi-dimensional data model and has the following characteristics:

The resource profiling algorithm continuously collects resource metrics of containers and generates suggestions based on the aggregate values of CPU metrics and memory metrics.
When the resource profiling algorithm calculates aggregate values, the most recently collected metrics have the highest weights.
The resource profiling algorithm takes container events into consideration, such as out of memory (OOM) errors. This increases the accuracy of the suggestions.
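
To make the second characteristic concrete, the following sketch computes a decay-weighted mean in which a sample that is one half-life old counts half as much as a fresh one. It is not the algorithm that ACK uses: the exponential weighting, the 24-hour half-life, and the function name `weighted_mean` are all assumptions chosen only to illustrate how recent samples can dominate an aggregate.

```shell
# Decay-weighted mean of usage samples: a sample that is one half-life old
# counts half as much as a fresh one. Input lines are "<age_hours> <value>".
weighted_mean() {
  awk -v h="$1" '{ w = exp(-log(2) * $1 / h); sw += w; swv += w * $2 }
                 END { if (sw > 0) printf "%.3f\n", swv / sw }'
}

# Two samples: 2.0 cores observed now and 4.0 cores observed 24 hours ago,
# with a hypothetical 24-hour half-life.
printf '0 2.0\n24 4.0\n' | weighted_mean 24
```

The result is about 2.667 cores, closer to the recent sample than the plain average of 3.0 cores.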

What types of applications are suitable for resource profiling?

Resource profiling is suitable for online applications.

In most cases, the resource configurations suggested by the resource profiling feature can meet the resource requirements of your applications. Offline applications use batch processing and require high throughput. Offline applications allow resource contention so as to improve resource utilization. If you enable resource profiling for offline applications, resource waste may occur. In most cases, key system components are deployed in active/standby mode and have multiple replicas. The resources that are allocated to standby replicas are idle. As a result, the resource profiling algorithm generates inaccurate results. In the preceding cases, we recommend that you do not use the resource configurations suggested by resource profiling. ACK will provide updates on how to specify resource configurations based on the suggestions provided by resource profiling in these cases.

Can I directly use the resource configurations suggested by resource profiling when I specify the resource request and resource limit of a container?

Resource profiling generates resource configuration suggestions based on the current resource demand of an application. Administrators need to take business characteristics into consideration and modify the suggested values accordingly. For example, you may need to reserve resources to handle traffic spikes or reserve resources for zone-disaster recovery. You may also need to increase the suggested values to ensure that resource-thirsty applications can run stably when the loads of the host are high.

How do I view resource profiles if I use a self-managed Prometheus monitoring system?

The Koordinator Manager component of ack-koordinator exposes the monitoring metrics of resource profiles as Prometheus metrics using HTTP API operations. Run the following command to query the IP addresses of the pods and view metrics data:

# Run the following command to query the IP addresses of the pods
$ kubectl get pod -A -o wide | grep koord-manager
# Expected output, may vary based on your environment
kube-system   ack-koord-manager-5479f85d5f-7xd5k                         1/1     Running            0                  19d   192.168.12.242   cn-beijing.192.168.xx.xxx   <none>           <none>
kube-system   ack-koord-manager-5479f85d5f-ftblj                         1/1     Running            0                  19d   192.168.12.244   cn-beijing.192.168.xx.xxx   <none>           <none>

# Run the following command to view metrics data (Koordinator Manager is in the master-replica architecture, metrics data is provided in the primary replica pod)
# See the Deployment configuration of Koordinator Manager for the IP address and port
# Make sure that the host that runs commands and the container are interconnected before accessing
$ curl -s http://192.168.12.244:9326/metrics | grep slo_manager_recommender_recommendation_workload_target
# Expected output, may vary based on your environment
# HELP slo_manager_recommender_recommendation_workload_target Recommendation of workload resource request.
# TYPE slo_manager_recommender_recommendation_workload_target gauge
slo_manager_recommender_recommendation_workload_target{container_name="xxx",namespace="xxx",recommendation_name="xxx",resource="cpu",workload_api_version="apps/v1",workload_kind="Deployment",workload_name="xxx"} 2.406
slo_manager_recommender_recommendation_workload_target{container_name="xxx",namespace="xxx",recommendation_name="xxx",resource="memory",workload_api_version="apps/v1",workload_kind="Deployment",workload_name="xxx"} 3.861631195e+09

A Service and a Service Monitor are automatically created and associated with the pod after ack-koordinator is installed. If you use Managed Service for Prometheus, the relevant metrics are collected and displayed in the Grafana dashboard.

For more information about how to configure custom metric collection configurations for a self-managed Prometheus monitoring system, see the official Prometheus documentation. You can configure Grafana dashboards to view resource profiles. For more information, see Step 2. (Optional): Verify the profiling results in Managed Service for Prometheus.

How do I delete resource profiles and resource profiling policies?

Resource profiles are stored in the Recommendation CRD. Resource profiling policies are stored in the RecommendationProfile CRD. You can run the following commands to delete all resource profiles and resource profiling policies:

# Delete all resource profiles. 
kubectl delete recommendation -A --all

# Delete all resource profiling policies. 
kubectl delete recommendationprofile -A --all

How do I authorize a RAM user to use the resource profiling feature?

The authorization system of ACK consists of Resource Access Management (RAM) authorization and role-based access control (RBAC) authorization. RAM authorization is used to grant permissions on cloud resources. RBAC authorization is used to grant permissions on Kubernetes resources within a cluster. For more information about the authorization system of ACK, see Best practices of authorization. To authorize a RAM user to use the resource profiling feature, we recommend that you perform the following steps, which follow those best practices:

RAM Authorization
Log on to the RAM console with an Alibaba Cloud account and grant the predefined AliyunCSFullAccess permission to the RAM user. For more information, see Authorization overview.
RBAC Authorization
After you perform RAM authorization, you must perform RBAC authorization of the Developer role or higher on this RAM user in the target cluster. For more information, see Grant RBAC permissions to RAM users or RAM roles.

Note

The predefined RBAC roles at the Developer level and higher have read and write permissions for all resources in a Kubernetes cluster. If you require fine-grained access control, you can create or edit a custom ClusterRole. For more information, see Customize an RBAC role.

To use the resource profiling feature, add the following content to the ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: recommendation-clusterrole
rules:
- apiGroups:
  - autoscaling.alibabacloud.com
  resources:
  - '*'
  verbs:
  - '*'
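
If the wildcard rule is broader than you need, a narrower rule is one option for users who only view resource profiles. The following sketch writes a read-only ClusterRole to a file. It rests on assumptions that you should verify: the role name and file name are placeholders, the plural resource names `recommendations` and `recommendationprofiles` are assumed (you can list the actual names with `kubectl api-resources --api-group=autoscaling.alibabacloud.com`), and read-only access may not be sufficient for every operation in the console.

```shell
# Write a read-only ClusterRole for the resource profiling API group.
# The role name and the plural resource names are assumptions.
cat > recommendation-readonly-clusterrole.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: recommendation-readonly
rules:
- apiGroups:
  - autoscaling.alibabacloud.com
  resources:
  - recommendations
  - recommendationprofiles
  verbs:
  - get
  - list
  - watch
EOF

# Review the file, then apply it:
# kubectl apply -f recommendation-readonly-clusterrole.yaml
```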