Schedule pods based on cache affinity - Container Service for Kubernetes

Fluid allows you to schedule pods based on cache affinity. Application pods are preferentially deployed to the nodes that store the cached data, or to nodes in the same zone or region as the cached data. Placing pods close to the cache improves data access efficiency.

Limits

This feature is supported only by ACK Pro clusters.
This feature is incompatible with Elastic Container Instance-based scheduling or priority-based resource scheduling.

Prerequisites

An ACK Pro cluster that runs Kubernetes 1.18 or later is created. For more information, see Create an ACK Pro cluster.
The cloud-native AI suite and ack-fluid 1.0.6 or later are deployed in the cluster. For more information, see Deploy the cloud-native AI suite.
Important
If you have already installed open source Fluid, uninstall Fluid and deploy the ack-fluid component.
A kubectl client is connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

Feature description

Fluid uses a mutating webhook to inject cache affinity rules into pod specifications. When you create a pod, Fluid can inject cache affinity rules at different levels into the pod specification. This way, kube-scheduler preferentially schedules the pod to the nodes that store the cached data, or to nodes in the same zone or region as the cached data.

Important

If the spec.affinity or spec.nodeSelector parameter is already specified in the pod specification, Fluid does not inject cache affinity rules into the pod specification.
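Because an existing affinity or node selector silently disables injection, a quick check of the manifest before you create the pod can save debugging time. The following is a rough sketch, not part of Fluid: `has_own_scheduling_rules` is a hypothetical helper, and it assumes a plain Pod manifest that uses two-space indentation (it does not handle Deployments or other workload templates).

```shell
# Hypothetical helper: returns 0 if a plain Pod manifest (two-space
# indentation assumed) sets spec.affinity or spec.nodeSelector.
has_own_scheduling_rules() {
  grep -Eq '^  (affinity|nodeSelector):' "$1"
}

# Usage (pod.yaml is a placeholder file name):
# has_own_scheduling_rules pod.yaml && echo "Fluid will not inject cache affinity rules"
```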

Configure the scheduling policy

Default configurations

Fluid supports the following levels of cache affinity scheduling: node, zone, and region. To check the scheduling policy of your cluster, run the following command:

kubectl get cm -n fluid-system webhook-plugins -oyaml

Expected output:

apiVersion: v1
data:
  pluginsProfile: |
    pluginConfig:
    - args: |
        preferred:
          # fluid existed node affinity, the name can not be modified.
          - name: fluid.io/node
            weight: 100
          # runtime worker's zone label name, can be changed according to k8s environment.
          - name: topology.kubernetes.io/zone
            weight: 50
          # runtime worker's region label name, can be changed according to k8s environment.
          - name: topology.kubernetes.io/region
            weight: 20
        # used when app pod with label fluid.io/dataset.{dataset name}.sched set true
        required:
          - fluid.io/node
      name: NodeAffinityWithCache
    plugins:
      serverful:
        withDataset:
        - RequireNodeWithFuse
        - NodeAffinityWithCache
        - MountPropagationInjector
        withoutDataset:
        - PreferNodesWithoutCache
      serverless:
        withDataset:
        - FuseSidecar
        withoutDataset: []

The following table describes the parameters in the pluginsProfile section of the preceding ConfigMap.

Parameter	Description
`fluid.io/node`	A parameter predefined by Fluid. After this parameter is enabled, Fluid automatically injects a node-specific cache affinity rule into the pod specification. The node-specific cache affinity rule specifies the node on which the cached data is stored. The rule weight is 100.
`topology.kubernetes.io/zone`	A Kubernetes cluster parameter that specifies a zone-specific cache affinity rule. After this parameter is enabled, Fluid automatically injects a zone-specific cache affinity rule into the pod specification. The zone-specific cache affinity rule specifies the zone in which the cached data is located. The rule weight is 50.
`topology.kubernetes.io/region`	A Kubernetes cluster parameter that specifies a region-specific cache affinity rule. After this parameter is enabled, Fluid automatically injects a region-specific cache affinity rule into the pod specification. The region-specific cache affinity rule specifies the region in which the cached data is located. The rule weight is 20.
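To see how these weights interact, recall that kube-scheduler adds up the weights of every preferred rule that a node satisfies and favors nodes with a higher total. The following sketch reproduces that sum for the default weights. The `affinity_score` function is a hypothetical helper for illustration only. It assumes that all three rules are injected, and it ignores score normalization and every other scheduler plugin.

```shell
# Hypothetical helper: sums the default rule weights that a node satisfies.
# Each argument is 1 (rule matched) or 0 (rule not matched):
#   $1 = node stores the cached data
#   $2 = node is in the zone of the cached data
#   $3 = node is in the region of the cached data
affinity_score() {
  echo $(( $1 * 100 + $2 * 50 + $3 * 20 ))
}

affinity_score 1 1 1   # node that stores the cache: prints 170
affinity_score 0 1 1   # same zone, no cache: prints 70
affinity_score 0 0 1   # same region only: prints 20
```

With the default weights, a node that stores the cache always outranks a node that only shares the zone, which in turn outranks a node that only shares the region.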

Custom configurations

ACK may use other node labels to identify the topological information of nodes in ACK clusters. To configure Fluid to inject custom affinity rules based on specific node labels into pod specifications, perform the following steps:

Run the following command to modify the webhook-plugins ConfigMap:
```
kubectl edit -n fluid-system cm webhook-plugins
```

Modify the webhook-plugins ConfigMap based on the following sample code.

You can delete existing labels that identify the topological information of the cluster based on your business requirements. For more information, see Example 1: Ignore node-specific cache affinity rules.
You can add a custom affinity rule based on a specific node label (such as <topology_key>) and set the rule weight (such as <topology_weight>). For more information, see Example 2: Add node pool-specific cache affinity rules.

apiVersion: v1
data:
  pluginsProfile: |
    pluginConfig:
    - args: |
        preferred:
          # fluid existed node affinity, the name can not be modified.
          - name: fluid.io/node
            weight: 100
          # runtime worker's zone label name, can be changed according to k8s environment.
          - name: topology.kubernetes.io/zone
            weight: 50
          # runtime worker's region label name, can be changed according to k8s environment.
          - name: topology.kubernetes.io/region
            weight: 20
          - name: <topology_key>
            weight: <topology_weight>
        # used when app pod with label fluid.io/dataset.{dataset name}.sched set true
        required:
          - fluid.io/node
      name: NodeAffinityWithCache
    plugins:
      serverful:
        withDataset:
        - RequireNodeWithFuse
        - NodeAffinityWithCache
        - MountPropagationInjector
        withoutDataset:
        - PreferNodesWithoutCache
      serverless:
        withDataset:
        - FuseSidecar
        withoutDataset: []

Example 1: Ignore node-specific cache affinity rules

apiVersion: v1
data:
  pluginsProfile: |
    pluginConfig:
    - args: |
        preferred:
          # fluid existed node affinity, the name can not be modified.
          #- name: fluid.io/node
          #  weight: 100
          # runtime worker's zone label name, can be changed according to k8s environment.
          - name: topology.kubernetes.io/zone
            weight: 50
          # runtime worker's region label name, can be changed according to k8s environment.
          - name: topology.kubernetes.io/region
            weight: 20
        # used when app pod with label fluid.io/dataset.{dataset name}.sched set true
        required:
          - fluid.io/node
      name: NodeAffinityWithCache
    plugins:
      serverful:
        withDataset:
        - RequireNodeWithFuse
        - NodeAffinityWithCache
        - MountPropagationInjector
        withoutDataset:
        - PreferNodesWithoutCache
      serverless:
        withDataset:
        - FuseSidecar
        withoutDataset: []

Example 2: Add node pool-specific cache affinity rules

apiVersion: v1
data:
  pluginsProfile: |
    pluginConfig:
    - args: |
        preferred:
          # fluid existed node affinity, the name can not be modified.
          - name: fluid.io/node
            weight: 100
          - name: alibabacloud.com/nodepool-id
            weight: 80
          # runtime worker's zone label name, can be changed according to k8s environment.
          - name: topology.kubernetes.io/zone
            weight: 50
          # runtime worker's region label name, can be changed according to k8s environment.
          - name: topology.kubernetes.io/region
            weight: 20
        # used when app pod with label fluid.io/dataset.{dataset name}.sched set true
        required:
          - fluid.io/node
      name: NodeAffinityWithCache
    plugins:
      serverful:
        withDataset:
        - RequireNodeWithFuse
        - NodeAffinityWithCache
        - MountPropagationInjector
        withoutDataset:
        - PreferNodesWithoutCache
      serverless:
        withDataset:
        - FuseSidecar
        withoutDataset: []

Run the following command to restart Fluid Webhook and apply the changes:
```
kubectl rollout restart deployment -n fluid-system fluid-webhook
```
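Before you create new pods, it can help to confirm that the restart has finished and that the ConfigMap contains the profile you edited. The following is a minimal sketch that assumes the Deployment and ConfigMap names used in this topic. `verify_webhook_restart` is a hypothetical helper name, and the 120-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: waits for the fluid-webhook rollout to finish, then
# prints the plugin profile stored in the webhook-plugins ConfigMap.
verify_webhook_restart() {
  kubectl rollout status deployment -n fluid-system fluid-webhook --timeout=120s &&
  kubectl get cm -n fluid-system webhook-plugins -o jsonpath='{.data.pluginsProfile}'
}

# Usage: verify_webhook_restart
```

A mutating webhook modifies pods only when they are created, so pods that already exist keep the affinity rules that were injected at creation time.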

Examples

Example 1: Schedule a pod based on a node-specific cache affinity rule

Create a Secret.

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
stringData:
  fs.oss.accessKeyId: <ACCESS_KEY_ID>
  fs.oss.accessKeySecret: <ACCESS_KEY_SECRET>

Create a Dataset and a Runtime object.

Important

In this example, a JindoRuntime is created. To use other cache runtimes, see Use EFC to accelerate access to NAS or CPFS. For more information about how to use JindoFS to accelerate access to Object Storage Service (OSS), see Use JindoFS to accelerate access to OSS.

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  mounts:
    - mountPoint: oss://<oss_bucket>/<bucket_dir>
      options:
        fs.oss.endpoint: <oss_endpoint>
      name: hadoop
      path: "/"
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: demo-dataset
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10G
        high: "0.99"
        low: "0.8"

Create an application pod.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    fuse.serverful.fluid.io/inject: "true"
spec:
  containers:
    - name: nginx
      image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
      volumeMounts:
        - mountPath: /data
          name: data-vol
  volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset

The following parameters are used to enable pod scheduling based on a node-specific cache affinity rule.

Parameter	Description
`fuse.serverful.fluid.io/inject: "true"`	Enables Fluid to inject cache affinity rules into the pod specification.
`claimName`	The persistent volume claim (PVC) that is mounted to the pod. The PVC is automatically created by Fluid and named after the Dataset that you created.

Check the affinity settings in the pod specification.

kubectl get pod nginx -oyaml

Expected output:

apiVersion: v1
kind: Pod
metadata:
  labels:
    fuse.serverful.fluid.io/inject: "true"
  name: nginx
  namespace: default
  ...
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: fluid.io/s-default-demo-dataset
            operator: In
            values:
            - "true"
        weight: 100

A node-specific cache affinity rule (fluid.io/s-default-demo-dataset) is injected into the pod specification. The rule weight (100 in this example) is taken from the fluid.io/node entry in the scheduling policy.
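To confirm that the scheduler honored the preference, you can compare the node that runs the pod with the nodes that carry the cache label. The following sketch assumes that the label key follows the pattern `fluid.io/s-<namespace>-<dataset_name>`, which is inferred from the expected output above rather than from a documented contract. `cache_label` and `pod_on_cache_node` are hypothetical helper names.

```shell
# Hypothetical helper: builds the cache label key seen in the expected output
# (assumed pattern: fluid.io/s-<namespace>-<dataset_name>).
cache_label() {
  printf 'fluid.io/s-%s-%s' "$1" "$2"
}

# Hypothetical helper: succeeds if the pod runs on a node that carries the
# cache label of the Dataset. Arguments: pod name, namespace, Dataset name.
pod_on_cache_node() {
  node=$(kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.nodeName}')
  kubectl get nodes -l "$(cache_label "$2" "$3")=true" -o name |
    grep -qx "node/$node"
}

# Usage:
# pod_on_cache_node nginx default demo-dataset && echo "pod is co-located with the cache"
```

Because the injected rule is a preference, the check can fail even when injection works, for example when the cache nodes do not have enough free resources for the pod.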

Example 2: Schedule a pod based on a zone-specific cache affinity rule

Create a Secret.

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
stringData:
  fs.oss.accessKeyId: <ACCESS_KEY_ID>
  fs.oss.accessKeySecret: <ACCESS_KEY_SECRET>

Create a Dataset and a Runtime object.

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
              - "<ZONE_ID>" # e.g. cn-beijing-i
  mounts:
    - mountPoint: oss://<oss_bucket>/<bucket_dir>
      options:
        fs.oss.endpoint: <oss_endpoint>
      name: hadoop
      path: "/"
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: demo-dataset
spec:
  replicas: 2
  master:
    nodeSelector:
      topology.kubernetes.io/zone: <ZONE_ID> # e.g. cn-beijing-i
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10G
        high: "0.99"
        low: "0.8"

To schedule a pod based on a zone-specific cache affinity rule, you need to explicitly specify the zone in which the cached data is located. In the preceding code block, the topology.kubernetes.io/zone=<ZONE_ID> label (for example, cn-beijing-i) is specified in the nodeAffinity.required.nodeSelectorTerms parameter of the Dataset.
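If you are unsure which value to use for `<ZONE_ID>`, you can list the topology labels that your nodes actually carry. The following is a minimal sketch that assumes the nodes use the standard `topology.kubernetes.io/zone` and `topology.kubernetes.io/region` labels. `list_node_topology` is a hypothetical helper name.

```shell
# Hypothetical helper: lists every node with its zone and region labels as
# extra columns, so that you can pick a value for <ZONE_ID>.
list_node_topology() {
  kubectl get nodes -L topology.kubernetes.io/zone,topology.kubernetes.io/region
}

# Usage: list_node_topology
```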

Create an application pod.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    fuse.serverful.fluid.io/inject: "true"
spec:
  containers:
    - name: nginx
      image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
      volumeMounts:
        - mountPath: /data
          name: data-vol
  volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset

The following parameters are used to enable pod scheduling based on a zone-specific cache affinity rule.

Parameter	Description
`fuse.serverful.fluid.io/inject: "true"`	Enables Fluid to inject cache affinity rules into the pod specification.
`claimName`	The PVC that is mounted to the pod. The PVC is automatically created by Fluid and named after the Dataset that you created.

Check the affinity settings in the pod specification.

kubectl get pod nginx -oyaml

Expected output:

apiVersion: v1
kind: Pod
metadata:
  labels:
    fuse.serverful.fluid.io/inject: "true"
  name: nginx
  namespace: default
  ...
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: fluid.io/s-default-demo-dataset
            operator: In
            values:
            - "true"
        weight: 100
      - preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - <ZONE_ID> # e.g. cn-beijing-i
        weight: 50
...

A node-specific cache affinity rule (fluid.io/s-default-demo-dataset) and a zone-specific cache affinity rule (topology.kubernetes.io/zone) are injected into the pod specification. The rule weights (100 and 50 in this example) are taken from the corresponding entries in the scheduling policy.
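When several rules are injected, reading the full pod YAML becomes tedious. The following sketch prints only the weight and label key of each preferred rule by using a kubectl JSONPath template. `list_preferred_rules` is a hypothetical helper name, and the template assumes that each rule has a single match expression, as in the expected output above.

```shell
# Hypothetical helper: prints "<weight><TAB><label key>" for each preferred
# node affinity rule in the pod specification. Argument: pod name.
list_preferred_rules() {
  kubectl get pod "$1" -o jsonpath='{range .spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*]}{.weight}{"\t"}{.preference.matchExpressions[0].key}{"\n"}{end}'
}

# Usage: list_preferred_rules nginx
```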

Example 3: Force pod scheduling based on a node-specific cache affinity rule

Create a Secret.

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
stringData:
  fs.oss.accessKeyId: <ACCESS_KEY_ID>
  fs.oss.accessKeySecret: <ACCESS_KEY_SECRET>

Create a Dataset and a Runtime object.

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  mounts:
    - mountPoint: oss://<oss_bucket>/<bucket_dir>
      options:
        fs.oss.endpoint: <oss_endpoint>
      name: hadoop
      path: "/"
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: demo-dataset
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10G
        high: "0.99"
        low: "0.8"

Create an application pod.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    fuse.serverful.fluid.io/inject: "true"
    fluid.io/dataset.demo-dataset.sched: required
spec:
  containers:
    - name: nginx
      image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
      volumeMounts:
        - mountPath: /data
          name: data-vol
  volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset

The following parameters are used to force pod scheduling based on a node-specific cache affinity rule.

Parameter	Description
`fuse.serverful.fluid.io/inject: "true"`	Enables Fluid to inject cache affinity rules into the pod specification.
`fluid.io/dataset.<dataset_name>.sched: required`	Forces node-specific cache affinity for the `<dataset_name>` Dataset. Fluid injects a required node affinity rule for this Dataset instead of a preferred rule.
`claimName`	The PVC that is mounted to the pod. The PVC is automatically created by Fluid and named after the Dataset that you created.

Check the affinity settings in the pod specification.

kubectl get pod nginx -oyaml

Expected output:

apiVersion: v1
kind: Pod
metadata:
  labels:
    fluid.io/dataset.demo-dataset.sched: required
    fuse.serverful.fluid.io/inject: "true"
  name: nginx
  namespace: default
  ...
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: fluid.io/s-default-demo-dataset
            operator: In
            values:
            - "true"

A forced node-specific cache affinity rule (fluid.io/s-default-demo-dataset) is injected into the pod specification.
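Unlike a preferred rule, a required rule leaves the pod in the Pending state when no schedulable node matches it. The following sketch checks the pod phase and, if the pod is pending, prints the pod description, which includes the scheduling events. `explain_if_pending` is a hypothetical helper name.

```shell
# Hypothetical helper: reports the pod phase, and shows the pod description
# (including scheduling events) if the pod is still Pending.
# Argument: pod name.
explain_if_pending() {
  phase=$(kubectl get pod "$1" -o jsonpath='{.status.phase}')
  if [ "$phase" = "Pending" ]; then
    kubectl describe pod "$1"
  else
    echo "pod $1 is $phase"
  fi
}

# Usage: explain_if_pending nginx
```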