By Koordinator Community
Koordinator is a QoS-based scheduling system for hybrid orchestration workloads on Kubernetes. Its goal is to improve the runtime efficiency and reliability of both latency-sensitive workloads and batch jobs, simplify the complexity of resource-related configuration tuning, and increase pod deployment density to improve resource utilization.
Since its initial release in April 2022, Koordinator has shipped nine versions. Over more than half a year of development, the community has attracted many excellent engineers who have helped the project mature.
We are pleased to announce the official release of Koordinator v1.1, which includes load-aware scheduling/rescheduling, cgroup v2 support, interference detection metrics collection, and other optimizations. In this article, we take an in-depth look at these new features.
Koordinator v1.0 and earlier versions provide load-aware scheduling with a basic utilization threshold to prevent nodes with high load levels from deteriorating further and affecting the runtime quality of workloads, and they use a prediction mechanism to resolve the overload of cold nodes. Existing load-aware scheduling solves many common scenarios, but as an optimization method it still needs improvement in several others.
The current load-aware scheduling mainly balances load at the whole-machine level within the cluster, but special cases can arise: a node may run many offline (batch) Pods that raise its overall utilization while the utilization of its online application workloads remains low. If a new online Pod must then be scheduled and resources in the cluster are tight, problems can occur.
In Koordinator v1.1, the koord-scheduler is aware of workload types and can apply different watermarks and policies during scheduling.
In the Filter phase:
A new prodUsageThresholds parameter is added to the threshold configuration to indicate the safety threshold for online applications; it is empty by default. If the Pod to be scheduled is of the Prod type, the koord-scheduler sums the utilization of all online applications from the NodeMetric of the candidate node and filters the node out if the sum exceeds prodUsageThresholds. If the Pod is offline, or prodUsageThresholds is not configured, the original whole-machine utilization logic is used.
In the Score phase:
The scoreAccordingProdUsage switch controls whether to score according to Prod Pod utilization; it is disabled by default. If the switch is enabled and the current Pod is of the Prod type, the koord-scheduler processes only Prod Pods in the prediction algorithm, sums the current utilization of the other online application Pods in NodeMetrics that are not handled by the prediction algorithm, and uses the summed value for the final score. If scoreAccordingProdUsage is disabled, or the Pod is offline, the original whole-machine utilization logic is used.
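To make the Filter behavior above concrete, here is a minimal, self-contained Go sketch. It uses hypothetical types rather than the actual koord-scheduler API: a node is filtered out when the summed utilization of its Prod Pods exceeds prodUsageThresholds.

// A minimal sketch with hypothetical types (not the actual koord-scheduler API)
// of the Prod-aware Filter logic described above.
package main

import "fmt"

// nodeMetricInfo is a simplified stand-in for the utilization data reported in NodeMetric.
type nodeMetricInfo struct {
    nodeCPUCapacityMilli  int64 // allocatable CPU of the node, in millicores
    prodPodsCPUUsageMilli int64 // summed CPU usage of all Prod (online) Pods on the node
}

// filterByProdUsage returns false (node filtered out) when the total Prod utilization
// exceeds prodUsageThresholdPercent.
func filterByProdUsage(m nodeMetricInfo, prodUsageThresholdPercent int64) bool {
    if prodUsageThresholdPercent <= 0 {
        return true // threshold not configured: fall back to the whole-machine logic
    }
    usagePercent := m.prodPodsCPUUsageMilli * 100 / m.nodeCPUCapacityMilli
    return usagePercent <= prodUsageThresholdPercent
}

func main() {
    node := nodeMetricInfo{nodeCPUCapacityMilli: 8000, prodPodsCPUUsageMilli: 4800}
    fmt.Println(filterByProdUsage(node, 55)) // 60% > 55%, so the node is filtered out: false
}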
Koordinator v1.0 and earlier versions filter and score based on the average utilization data reported by koordlet. However, the average hides a relatively large amount of information. Therefore, in Koordinator v1.1, koordlet supports aggregating data by utilization percentiles, and the scheduler has been adapted accordingly.
Change the configuration of the scheduler's LoadAware plugin. The aggregated field indicates that filtering and scoring are based on percentile statistics:
- aggregated.usageThresholds indicates the utilization thresholds for filtering.
- aggregated.usageAggregationType indicates the percentile type of the machine's utilization used when filtering, including avg, p50, p90, p95, and p99.
- aggregated.usageAggregatedDuration indicates the statistical period of the percentile of the machine's utilization used when filtering. If this field is not set, the scheduler uses the data of the maximum period in NodeMetrics by default.
- aggregated.scoreAggregationType indicates the percentile type of the machine's utilization used when scoring.
- aggregated.scoreAggregatedDuration indicates the statistical period of the percentile of Prod Pod's utilization used when scoring. If this field is not set, the scheduler uses the data of the maximum period in NodeMetrics by default.
In the Filter phase: if aggregated.usageThresholds and the corresponding aggregation type are configured, the scheduler filters nodes based on the percentile statistics.
In the Score phase: if aggregated.scoreAggregationType is configured, the scheduler scores nodes based on the percentile statistics. Currently, percentile filtering is not supported for Prod Pods.
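As a rough illustration of why percentile aggregation retains more information than a plain average, the following Go sketch (not koordlet's actual implementation) computes nearest-rank percentiles over a series of utilization samples:

// A minimal sketch of aggregating utilization samples into percentile statistics
// (p50/p90/p95/p99) that the scheduler can filter and score on.
package main

import (
    "fmt"
    "sort"
)

// percentile returns the p-th percentile (0-100) of the samples using the nearest-rank method.
func percentile(samples []float64, p float64) float64 {
    if len(samples) == 0 {
        return 0
    }
    sorted := append([]float64(nil), samples...)
    sort.Float64s(sorted)
    rank := int(p/100*float64(len(sorted))+0.5) - 1
    if rank < 0 {
        rank = 0
    }
    if rank >= len(sorted) {
        rank = len(sorted) - 1
    }
    return sorted[rank]
}

func main() {
    // CPU utilization samples (%) over a statistical period; the average hides the spike.
    cpuUtil := []float64{35, 40, 42, 45, 50, 55, 60, 62, 70, 95}
    fmt.Println("p50 =", percentile(cpuUtil, 50)) // 50
    fmt.Println("p99 =", percentile(cpuUtil, 99)) // 95
}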
1. Change the koord-scheduler configuration to enable Prod utilization statistics so that they take effect in the filtering and scoring phases; the whole-machine percentile utilization statistics also take effect in the filtering and scoring phases.
apiVersion: v1
kind: ConfigMap
metadata:
name: koord-scheduler-config
...
data:
koord-scheduler-config: |
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: koord-scheduler
plugins:
# enable the LoadAwareScheduling plugin
filter:
enabled:
- name: LoadAwareScheduling
...
score:
enabled:
- name: LoadAwareScheduling
weight: 1
...
reserve:
enabled:
- name: LoadAwareScheduling
...
pluginConfig:
# configure the thresholds and weights for the plugin
- name: LoadAwareScheduling
args:
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: LoadAwareSchedulingArgs
# whether to filter nodes where koordlet fails to update NodeMetric
filterExpiredNodeMetrics: true
# the expiration threshold seconds when using NodeMetric
nodeMetricExpirationSeconds: 300
# weights of resources
resourceWeights:
cpu: 1
memory: 1
# thresholds (%) of resource utilization
usageThresholds:
cpu: 75
memory: 85
# thresholds (%) of resource utilization of Prod Pods
prodUsageThresholds:
cpu: 55
memory: 65
# enable score according Prod usage
scoreAccordingProdUsage: true
# the factor (%) for estimating resource usage
estimatedScalingFactors:
cpu: 80
memory: 70
# enable resource utilization filtering and scoring based on percentile statistics
aggregated:
usageThresholds:
cpu: 65
memory: 75
usageAggregationType: "p99"
scoreAggregationType: "p99"
2. Deploy a Pod for stress testing
apiVersion: apps/v1
kind: Deployment
metadata:
name: stress-demo
namespace: default
labels:
app: stress-demo
spec:
replicas: 1
selector:
matchLabels:
app: stress-demo
template:
metadata:
name: stress-demo
labels:
app: stress-demo
spec:
containers:
- args:
- '--vm'
- '2'
- '--vm-bytes'
- '1600M'
- '-c'
- '2'
- '--vm-hang'
- '2'
command:
- stress
image: polinux/stress
imagePullPolicy: Always
name: stress
resources:
limits:
cpu: '2'
memory: 4Gi
requests:
cpu: '2'
memory: 4Gi
restartPolicy: Always
schedulerName: koord-scheduler # use the koord-scheduler
$ kubectl create -f stress-demo.yaml
deployment.apps/stress-demo created
Wait for the stress testing Pod to be in the Running state
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
stress-demo-7fdd89cc6b-gcnzn 1/1 Running 0 82s 10.0.3.114 cn-beijing.10.0.3.112 <none> <none>
The Pod is scheduled to the node cn-beijing.10.0.3.112.
3. Check the load of each node
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
cn-beijing.10.0.3.110 92m 2% 1158Mi 9%
cn-beijing.10.0.3.111 77m 1% 1162Mi 9%
cn-beijing.10.0.3.112 2105m 53% 3594Mi 28%
The output shows that the node cn-beijing.10.0.3.111 has the lowest load and the node cn-beijing.10.0.3.112 has the highest load.
4. Deploy an online Pod
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-with-loadaware
labels:
app: nginx
spec:
replicas: 6
selector:
matchLabels:
app: nginx
template:
metadata:
name: nginx
labels:
app: nginx
spec:
# Use koord-prod to indicate that the Pod is Prod
priorityClassName: "koord-prod"
schedulerName: koord-scheduler # use the koord-scheduler
containers:
- name: nginx
image: nginx
resources:
limits:
cpu: 500m
requests:
cpu: 500m
$ kubectl create -f nginx-with-loadaware.yaml
deployment.apps/nginx-with-loadaware created
5. Check the scheduling result
$ kubectl get pods -o wide | grep nginx
nginx-with-loadaware-5646666d56-224jp 1/1 Running 0 18s 10.0.3.118 cn-beijing.10.0.3.110 <none> <none>
nginx-with-loadaware-5646666d56-7glt9 1/1 Running 0 18s 10.0.3.115 cn-beijing.10.0.3.110 <none> <none>
nginx-with-loadaware-5646666d56-kcdvr 1/1 Running 0 18s 10.0.3.119 cn-beijing.10.0.3.110 <none> <none>
nginx-with-loadaware-5646666d56-qzw4j 1/1 Running 0 18s 10.0.3.113 cn-beijing.10.0.3.111 <none> <none>
nginx-with-loadaware-5646666d56-sbgv9 1/1 Running 0 18s 10.0.3.120 cn-beijing.10.0.3.111 <none> <none>
nginx-with-loadaware-5646666d56-z79dn 1/1 Running 0 18s 10.0.3.116 cn-beijing.10.0.3.111 <none> <none>
The preceding output shows that, because load-aware scheduling is enabled in the cluster, the scheduler can sense node load; with this scheduling policy, the Pods are preferentially scheduled to nodes other than cn-beijing.10.0.3.112.
Koordinator has continued to evolve the descheduler over the past few releases: it open-sourced the complete framework and enhanced safety to prevent excessive Pod evictions from affecting the stability of online applications. This also slowed the progress of rescheduling features, and Koordinator did not invest much in rescheduling capabilities in the past. That situation is now changing.
Koordinator v1.1 adds load-aware rescheduling through a new plugin called LowNodeLoad. This plugin cooperates with the scheduler's load-aware scheduling to form a closed loop: load-aware scheduling selects the optimal node at scheduling time, but as time passes and the cluster environment and the traffic/requests faced by the workloads change, load-aware rescheduling can step in to help optimize nodes whose load level exceeds the safety threshold. The difference between LowNodeLoad and the K8s descheduler plugin LowNodeUtilization is that LowNodeLoad makes rescheduling decisions based on actual node utilization, while LowNodeUtilization decides based on the resource allocation rate.
The LowNodeLoad plugin has two important parameters:
- lowThresholds indicates the watermark below which a node is considered idle.
- highThresholds indicates the target safety threshold; nodes that exceed it are identified as hotspot nodes.
In the following figure, the value of lowThresholds is 45% and the value of highThresholds is 70%. We can classify nodes into three categories:
- Idle nodes: nodes whose utilization is below 45%.
- Normal nodes: nodes whose utilization is between 45% and 70%, which is the reasonable load range we expect.
- Hotspot nodes: nodes whose utilization exceeds 70%.
After identifying which nodes are hotspot nodes, descheduler performs migration and eviction to evict some Pods on the hotspot nodes to idle nodes.
If the total number of idle nodes in a cluster is not large, rescheduling is terminated. This can be helpful in large clusters, where some nodes may be underutilized frequently or for short periods. By default, numberOfNodes is set to zero; you can set the numberOfNodes parameter to enable this protection.
Before migration, the descheduler calculates the actual idle capacity to ensure that the sum of the actual utilization of the Pods to be migrated does not exceed the total idle capacity in the cluster. This idle capacity comes from idle nodes: the actual idle capacity of one idle node = (highThresholds - the current load of the node) × the total capacity of the node. Suppose the load level of node A is 20%, highThresholds is 70%, and the total CPU capacity of node A is 96C; then (70% - 20%) × 96 = 48C is the idle capacity that can be used.
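The following minimal Go sketch reproduces the idle-capacity arithmetic described above (units are CPU cores; the function name is illustrative, not a descheduler API):

// A minimal sketch of the idle-capacity calculation.
package main

import "fmt"

// idleCapacity returns how many cores an idle node can still absorb before reaching highThresholds.
func idleCapacity(highThresholdPercent, currentLoadPercent, totalCapacityCores float64) float64 {
    return (highThresholdPercent - currentLoadPercent) / 100 * totalCapacityCores
}

func main() {
    // Node A: 20% load, highThresholds 70%, 96 cores -> (70% - 20%) x 96 = 48 cores.
    fmt.Println(idleCapacity(70, 20, 96))
}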
In addition, before migrating Pods from hotspot nodes, the Pods on those nodes are filtered. The descheduler currently supports a variety of filtering parameters, which can avoid migrating and evicting very important Pods.
After Pods are filtered, they are sorted by QoSClass, Priority, actual usage, and creation time.
After Pods are filtered and sorted, migration begins. Before each migration, the system checks whether there is enough remaining idle capacity and whether the load level of the current node is still above the target safety threshold; if either condition is not met, rescheduling stops. Each time a Pod is migrated, its usage is deducted from the remaining idle capacity and from the load level of the current node, until the remaining capacity is insufficient or the node's load drops to the safety threshold.
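For clarity, here is an illustrative Go sketch of that eviction loop, using hypothetical types rather than the descheduler's actual code: it keeps picking Pods from the sorted candidate list while both conditions hold, deducting each Pod's usage from the remaining cluster idle capacity and from the node load.

// An illustrative sketch of the per-node eviction loop described above.
package main

import "fmt"

type podUsage struct {
    name     string
    cpuCores float64
}

func evictUntilSafe(sortedPods []podUsage, nodeLoadCores, nodeCapacityCores, highThresholdPercent, clusterIdleCores float64) []podUsage {
    var evicted []podUsage
    for _, p := range sortedPods {
        if clusterIdleCores < p.cpuCores {
            break // not enough idle capacity left in the cluster
        }
        if nodeLoadCores/nodeCapacityCores*100 <= highThresholdPercent {
            break // the node has already dropped below the target safety threshold
        }
        evicted = append(evicted, p)
        clusterIdleCores -= p.cpuCores
        nodeLoadCores -= p.cpuCores
    }
    return evicted
}

func main() {
    pods := []podUsage{{"batch-a", 4}, {"batch-b", 3}, {"batch-c", 2}}
    // Hotspot node at 80 of 96 cores (~83%), highThresholds 70%, 10 idle cores left in the cluster.
    fmt.Println(evictUntilSafe(pods, 80, 96, 70, 10))
}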
1. Change the koord-descheduler configuration and enable LowNodeLoad
apiVersion: v1
kind: ConfigMap
metadata:
name: koord-descheduler-config
...
data:
koord-descheduler-config: |
apiVersion: descheduler/v1alpha2
kind: DeschedulerConfiguration
...
deschedulingInterval: 60s # The execution cycle. The LowNodeLoad plugin is executed once in 60s
profiles:
- name: koord-descheduler
plugins:
deschedule:
disabled:
- name: "*"
balance:
enabled:
- name: LowNodeLoad # Enable the LowNodeLoad plugin
....
pluginConfig:
# Parameters of the LowNodeLoad plugin
- name: LowNodeLoad
args:
apiVersion: descheduler/v1alpha2
kind: LowNodeLoadArgs
evictableNamespaces:
# Include and exclude are mutually exclusive. You can configure only one of them.
# include: # include indicates that only the following configured namespace is processed
# - test-namespace
exclude:
- "kube-system" # The namespace to be excluded
- "koordinator-system"
lowThresholds: # lowThresholds indicates the access watermark threshold of an idle node
cpu: 20 # CPU utilization is 20%
memory: 30 # Memory utilization is 30%
highThresholds: # highThresholds indicates the target security threshold. Nodes that exceed this threshold are determined as hotspot nodes
cpu: 50 # CPU utilization is 50%
memory: 60 # Memory utilization is 60%
....
2. Deploy a Pod for stress testing
apiVersion: apps/v1
kind: Deployment
metadata:
name: stress-demo
namespace: default
labels:
app: stress-demo
spec:
replicas: 1
selector:
matchLabels:
app: stress-demo
template:
metadata:
name: stress-demo
labels:
app: stress-demo
spec:
containers:
- args:
- '--vm'
- '2'
- '--vm-bytes'
- '1600M'
- '-c'
- '2'
- '--vm-hang'
- '2'
command:
- stress
image: polinux/stress
imagePullPolicy: Always
name: stress
resources:
limits:
cpu: '2'
memory: 4Gi
requests:
cpu: '2'
memory: 4Gi
restartPolicy: Always
schedulerName: koord-scheduler # use the koord-scheduler
$ kubectl create -f stress-demo.yaml
deployment.apps/stress-demo created
Wait for the stress testing Pod to be in the Running state
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
stress-demo-7fdd89cc6b-gcnzn 1/1 Running 0 82s 10.0.3.114 cn-beijing.10.0.3.121 <none> <none>
Pod stress-demo-7fdd89cc6b-gcnzn is scheduled to the node cn-beijing.10.0.3.121.
3. Check the load of each node
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
cn-beijing.10.0.3.121 2106m 54% 4452Mi 35%
cn-beijing.10.0.3.124 73m 1% 1123Mi 8%
cn-beijing.10.0.3.125 69m 1% 1064Mi 8%
The output shows that the nodes cn-beijing.10.0.3.124 and cn-beijing.10.0.3.125 have the lowest load, and the node cn-beijing.10.0.3.121 has the highest load, exceeding the configured highThresholds.
4. Observe the Pod changes and wait for the rescheduler to execute the eviction and migration operation
$ kubectl get pod -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
stress-demo-7fdd89cc6b-l7psv 1/1 Running 0 4m45s 10.0.3.127 cn-beijing.10.0.3.121 <none> <none>
stress-demo-7fdd89cc6b-l7psv 1/1 Terminating 0 8m34s 10.0.3.127 cn-beijing.10.0.3.121 <none> <none>
stress-demo-7fdd89cc6b-b4c5g 0/1 Pending 0 0s <none> <none> <none> <none>
stress-demo-7fdd89cc6b-b4c5g 0/1 Pending 0 0s <none> <none> <none> <none>
stress-demo-7fdd89cc6b-b4c5g 0/1 Pending 0 0s <none> cn-beijing.10.0.3.124 <none> <none>
stress-demo-7fdd89cc6b-b4c5g 0/1 ContainerCreating 0 0s <none> cn-beijing.10.0.3.124 <none> <none>
stress-demo-7fdd89cc6b-b4c5g 0/1 ContainerCreating 0 3s <none> cn-beijing.10.0.3.124 <none> <none>
stress-demo-7fdd89cc6b-b4c5g 1/1 Running 0 20s 10.0.3.130 cn-beijing.10.0.3.124 <none> <none>
5. Observe the events. You can see the following migration records:
$ kubectl get event |grep stress-demo-7fdd89cc6b-l7psv
2m45s Normal Evicting podmigrationjob/20c8c445-7fa0-4cf7-8d96-7f03bb1097d9 Try to evict Pod "default/stress-demo-7fdd89cc6b-l7psv"
2m12s Normal EvictComplete podmigrationjob/20c8c445-7fa0-4cf7-8d96-7f03bb1097d9 Pod "default/stress-demo-7fdd89cc6b-l7psv" has been evicted
11m Normal Scheduled pod/stress-demo-7fdd89cc6b-l7psv Successfully assigned default/stress-demo-7fdd89cc6b-l7psv to cn-beijing.10.0.3.121
11m Normal AllocIPSucceed pod/stress-demo-7fdd89cc6b-l7psv Alloc IP 10.0.3.127/24
11m Normal Pulling pod/stress-demo-7fdd89cc6b-l7psv Pulling image "polinux/stress"
10m Normal Pulled pod/stress-demo-7fdd89cc6b-l7psv Successfully pulled image "polinux/stress" in 12.687629736s
10m Normal Created pod/stress-demo-7fdd89cc6b-l7psv Created container stress
10m Normal Started pod/stress-demo-7fdd89cc6b-l7psv Started container stress
2m14s Normal Killing pod/stress-demo-7fdd89cc6b-l7psv Stopping container stress
11m Normal SuccessfulCreate replicaset/stress-demo-7fdd89cc6b Created pod: stress-demo-7fdd89cc6b-l7psv
Many of Koordinator's single-node QoS capabilities and resource throttling/scaling policies are built on the Linux Control Group (cgroups) mechanism, such as CPU QoS (cpu), Memory QoS (memory), CPU Burst (cpu), and CPU Suppress (cpu, cpuset). The koordlet component uses cgroups v1 to limit the time slice, weight, priority, topology, and other attributes of the resources available to containers. Newer Linux kernels keep enhancing cgroups and introduced cgroups v2, which unifies the cgroups directory structure, improves cooperation between the different subsystems/cgroup controllers of v1, and enhances the resource management and monitoring capabilities of some subsystems. Kubernetes has treated cgroups v2 as a general availability (GA) feature since v1.25: the kubelet uses it to manage container resources and sets container resource isolation parameters on a unified cgroups layer, supporting the enhanced MemoryQoS feature.
In Koordinator v1.1, the standalone component koordlet adds support for cgroups v2.
Most koordlet features in Koordinator v1.1 are already compatible with cgroups v2.
Incompatible features (such as PSICollector) will be adapted in the upcoming v1.2; you can follow issue #407 for the latest progress. More cgroups v2 enhancements will be introduced in subsequent Koordinator releases.
In Koordinator v1.1, koordlet's adaptation to cgroups v2 is transparent to upper-layer feature configurations. Except for the feature-gates of deprecated features, you do not need to change the slo-controller-config ConfigMap or other feature-gate configurations. When koordlet runs on a node with cgroups v2 enabled, the corresponding single-node features automatically switch to operating on the cgroups v2 system interfaces.
In addition, cgroups v2 is a feature of newer Linux kernels (version 5.8 or later is recommended) and depends on the system kernel version and the Kubernetes version. We recommend using a Linux distribution with cgroups v2 enabled by default and Kubernetes v1.24 or later.
Please see the documentation for more information about how to enable cgroups v2.
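If you want to quickly check which cgroup version a node is running before enabling these features, a common heuristic is to test for the cgroup.controllers file at the cgroup mount root, which exists only on the unified v2 hierarchy. The following Go sketch is a minimal illustration and not part of koordlet:

// Detect whether a node mounts the unified cgroup v2 hierarchy.
package main

import (
    "fmt"
    "os"
)

func isCgroupV2() bool {
    // cgroup.controllers is present at the cgroup root only on the unified (v2) hierarchy.
    _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers")
    return err == nil
}

func main() {
    if isCgroupV2() {
        fmt.Println("cgroup v2 (unified hierarchy) detected")
    } else {
        fmt.Println("cgroup v1 detected")
    }
}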
If you expect to develop and customize new features that support cgroups v2 in the koordlet component, you are welcome to learn about the new system resource interface Resource and the system file operation module ResourceExecutor introduced in Koordinator v1.1, which are designed to improve the consistency and compatibility of system file operations (such as cgroups and resctrl).
You can read and modify the resource isolation parameters of a container through the common cgroups interfaces:
var (
    // NewCgroupReader() generates a cgroup reader for reading cgroups with the current cgroup version.
    // e.g. read `memory.limit_in_bytes` on v1, while read `memory.max` on v2.
    cgroupReader = resourceexecutor.NewCgroupReader()
    // NewResourceUpdateExecutor() generates a resource update executor for updating system resources (e.g. cgroups, resctrl) with caching and in order.
    executor = resourceexecutor.NewResourceUpdateExecutor()
)

// readPodCPUSet reads the cpuset CPU IDs of the given pod.
// e.g. read `/sys/fs/cgroup/cpuset/kubepods.slice/kubepods-podxxx.slice/cpuset.cpus` -> `6-15`
func readPodCPUSet(podMeta *statesinformer.PodMeta) (string, error) {
    podParentDir := koordletutil.GetPodCgroupDirWithKube(podMeta.CgroupDir)
    cpus, err := cgroupReader.ReadCPUSet(podParentDir)
    if err != nil {
        return "", err
    }
    return cpus.String(), nil
}

// updatePodCFSQuota updates the CFS quota of the given pod with the current cgroup version.
func updatePodCFSQuota(podMeta *statesinformer.PodMeta, cfsQuotaValue int64) error {
    podParentDir := koordletutil.GetPodCgroupDirWithKube(podMeta.CgroupDir)
    cfsQuotaStr := strconv.FormatInt(cfsQuotaValue, 10)
    // DefaultCgroupUpdaterFactory.New() generates a cgroup updater for cacheable updates of cgroups with the current cgroup version.
    // e.g. update `cpu.cfs_quota_us` on v1, while update `cpu.max` on v2.
    updater, err := resourceexecutor.DefaultCgroupUpdaterFactory.New(system.CPUCFSQuotaName, podParentDir, cfsQuotaStr)
    if err != nil {
        return err
    }
    // Use the executor to update the cgroup resource with caching, avoiding repeated but useless writes.
    if _, err = executor.Update(true, updater); err != nil {
        return err
    }
    return nil
}
You can also add and register cgroups resources and update functions in the following ways:
// package system
const (
    // Define the cgroup filename as the resource type of the cgroup resource.
    CgroupXName   = "xx.xxx"
    CgroupYName   = "yy.yyy"
    CgroupXV2Name = "xx.xxxx"
    CgroupYV2Name = "yy.yy"
)

var (
    // Create a cgroup v1 resource with the filename and the subsystem (e.g. cpu, cpuset, memory, blkio).
    // Optional: add a resource validator to validate the written values, and a check function to check whether the system supports this resource.
    CgroupX = DefaultFactory.New(CgroupXName, CgroupXSubfsName).WithValidator(cgroupXValidator).WithCheckSupported(cgroupXCheckSupportedFunc)
    CgroupY = DefaultFactory.New(CgroupYName, CgroupYSubfsName)
    // Create a cgroup v2 resource with the corresponding v1 filename and the v2 filename.
    // Optional: add a resource validator to validate the written values, and a check function to check whether the system supports this resource.
    CgroupXV2 = DefaultFactory.NewV2(CgroupXName, CgroupXV2Name).WithValidator(cgroupXValidator).WithCheckSupported(cgroupXV2CheckSupportedFunc)
    CgroupYV2 = DefaultFactory.NewV2(CgroupYName, CgroupYV2Name).WithCheckSupported(cgroupYV2CheckSupportedFunc)
)

func init() {
    // Register the cgroup resources with the corresponding cgroup versions.
    DefaultRegistry.Add(CgroupVersionV1, CgroupX, CgroupY)
    DefaultRegistry.Add(CgroupVersionV2, CgroupXV2, CgroupYV2)
}

// package resourceexecutor
func init() {
    // Register the cgroup updater with the resource types and the generator function.
    DefaultCgroupUpdaterFactory.Register(NewCommonCgroupUpdater,
        system.CgroupXName,
        system.CgroupYName,
    )
}
In a real production environment, the runtime state of a single node is a chaotic system, and application interference caused by resource contention cannot be completely avoided. Koordinator is building interference detection and optimization capabilities: it extracts metrics about the running state of applications for real-time analysis and detection, and adopts more targeted strategies for the affected applications and the interference sources once interference is detected.
Koordinator has implemented a series of Performance Collectors that collect low-level metrics highly correlated with application runtime status on each node and expose them through Prometheus, providing support for interference detection and cluster application scheduling.
Performance Collectors are controlled by multiple feature-gates; Koordinator currently provides collectors such as the CPI collector (CPICollector) and the PSI collector (PSICollector).
Performance Collectors are turned off by default. You can enable them by modifying koordlet's feature-gates argument, which does not affect other feature-gates:
kubectl edit ds koordlet -n koordinator-system
...
spec:
...
spec:
containers:
- args:
...
# modify here
# - -feature-gates=BECPUEvict=true,BEMemoryEvict=true,CgroupReconcile=true,Accelerators=true
- -feature-gates=BECPUEvict=true,BEMemoryEvict=true,CgroupReconcile=true,Accelerators=true,CPICollector=true,PSICollector=true
In Koordinator v1.1.0, the ServiceMonitor feature is introduced to koordlet to expose collected metrics through Prometheus. You can use this feature to collect metrics for application system analysis and management.
apiVersion: v1
kind: Service
metadata:
labels:
koord-app: koordlet
name: koordlet
namespace: koordinator-system
spec:
clusterIP: None
ports:
- name: koordlet-service
port: 9316
targetPort: 9316
selector:
koord-app: koordlet
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
koord-app: koordlet
name: koordlet
namespace: koordinator-system
spec:
endpoints:
- interval: 30s
port: koordlet-service
scheme: http
jobLabel: koord-app
selector:
matchLabels:
koord-app: koordlet
ServiceMonitor is a resource introduced by the Prometheus Operator, so its installation is disabled by default in the Koordinator Helm chart. You can run the following command to install ServiceMonitor:
helm install koordinator https://... --set koordlet.enableServiceMonitor=true
After deployment, you can find the koordlet Targets in the Prometheus UI and query the collected metrics, for example:
# HELP koordlet_container_cpi Container cpi collected by koordlet
# TYPE koordlet_container_cpi gauge
koordlet_container_cpi{container_id="containerd://498de02ddd3ad7c901b3c80f96c57db5b3ed9a817dbfab9d16b18be7e7d2d047",container_name="koordlet",cpi_field="cycles",node="your-node-name",pod_name="koordlet-x8g2j",pod_namespace="koordinator-system",pod_uid="3440fb9c-423b-48e9-8850-06a6c50f633d"} 2.228107503e+09
koordlet_container_cpi{container_id="containerd://498de02ddd3ad7c901b3c80f96c57db5b3ed9a817dbfab9d16b18be7e7d2d047",container_name="koordlet",cpi_field="instructions",node="your-node-name",pod_name="koordlet-x8g2j",pod_namespace="koordinator-system",pod_uid="3440fb9c-423b-48e9-8850-06a6c50f633d"} 4.1456092e+09
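As a rough illustration of how these gauges can be interpreted, the cycles and instructions samples above can be combined into a cycles-per-instruction (CPI) ratio. The snippet below is only an example calculation, not a koordlet API:

// Example calculation: CPI from the two koordlet_container_cpi samples shown above.
package main

import "fmt"

func main() {
    cycles := 2.228107503e+09
    instructions := 4.1456092e+09
    fmt.Printf("CPI = cycles / instructions = %.3f\n", cycles/instructions) // ~0.537
}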
Koordinator's interference detection is expected to need more detection metrics in more complex real-world scenarios. We will continue to invest in collecting and building metrics for other resource dimensions (such as memory and disk I/O).
You can see the new features on the v1.1 release page.
v1.1 release page: https://github.com/koordinator-sh/koordinator/releases/tag/v1.1.0
The Koordinator community will continue to enrich support for big data computing tasks, expand co-location support to more computing frameworks, and enrich co-location solutions. It will also continue to improve the interference detection and problem diagnosis systems and promote the integration of more workload types into the Koordinator ecosystem to achieve better resource operation efficiency.
Click here to learn more product features of Koordinator v1.1.