
Container Service for Kubernetes:Recommended configurations for high availability of disk volumes

Last Updated: Feb 27, 2026

Deploying a StatefulSet with disk volumes across multiple availability zones requires coordination between two layers: pod scheduling (where pods land) and volume provisioning (where disks are created). If these layers are not aligned, pods fail to start with zone conflicts or unsupported disk category errors.

This topic covers the recommended node pool, StorageClass, and application configurations that prevent these failures.

Architecture overview

The following diagram shows the recommended architecture: single-zone node pools with auto scaling, a multi-category StorageClass with delayed binding, and topology spread constraints on the application.

Cross-zone architecture

Prerequisites

  • Kubernetes version 1.20 or later

  • Container Storage Interface (CSI) plug-in version 1.22 or later. For more information, see Manage the CSI plug-in.

  • A cluster deployed across at least three zones to provide sufficient node and storage resources

  • Disk volumes (not File Storage NAS) for data persistence. Disks are more stable and provide higher bandwidth than NAS file systems.

Configure node pools

Deploy each node pool in a single zone

Restrict each node pool to one zone. To add capacity in a new zone, create a separate node pool for that zone instead of adding the zone to an existing pool. Include the zone ID in the node pool name (for example, pool-cn-beijing-a) so operators can identify the zone at a glance. For more information, see Create and manage a node pool.

Enable auto scaling

Enable auto scaling on each node pool so the cluster can add nodes automatically when existing nodes become unavailable for pod scheduling. For more information, see Enable node auto scaling.

Auto scaling example

Use consistent ECS instance types across zones

Use the same Elastic Compute Service (ECS) instance type in every node pool, or at minimum use instance types that support the same disk categories. The disk categories that an ECS instance supports depend on the instance type. If a pod's disk category is not supported by the ECS instance hosting the pod, the mount fails -- even when the disk and instance are in the same zone.
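If you must mix instance types within the cluster, you can pin stateful pods to an instance type that supports the required disk categories by using the well-known node label. The following is a minimal sketch; the pod name and instance type are placeholders, not recommendations:

apiVersion: v1
kind: Pod
metadata:
  name: instance-type-pinned                         # hypothetical example pod
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: ecs.g7.xlarge  # placeholder: use a type that supports your disk categories
  containers:
  - name: app
    image: mysql:5.6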

Add taints to isolate node pools

Add taints to the nodes in each node pool to prevent unrelated workloads from consuming capacity that your stateful applications need.

Taint configuration
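For example, the nodes in the stateful node pool could carry a taint such as the following, which matches the toleration used by the StatefulSet later in this topic. The node name, taint key, and value are assumptions for illustration:

apiVersion: v1
kind: Node
metadata:
  name: node-cn-beijing-a-1      # hypothetical node name
spec:
  taints:
  - key: "app"                   # must match the application's toleration
    value: "mysql"
    effect: "NoSchedule"

In practice, set the taint in the node pool configuration so that newly scaled-out nodes inherit it automatically.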

Create the StorageClass

Create a StorageClass that specifies multiple disk categories and delays volume binding until pod scheduling.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-topology-alltype
parameters:
  type: cloud_essd,cloud_ssd,cloud_efficiency  # Fallback order: ESSD > standard SSD > ultra disk
provisioner: diskplugin.csi.alibabacloud.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer         # Provision disk after pod is scheduled
allowVolumeExpansion: true
allowedTopologies:                              # Restrict to specific zones
- matchLabelExpressions:
  - key: topology.diskplugin.csi.alibabacloud.com/zone
    values:
    - cn-beijing-a
    - cn-beijing-b

Key parameters

  • type: cloud_essd,cloud_ssd,cloud_efficiency -- The CSI driver tries each disk category in order. It first attempts to create an enhanced SSD (ESSD). If ESSDs are out of stock in the zone, it falls back to a standard SSD, then to an ultra disk. This fallback chain reduces the risk of pod startup failures caused by disk inventory shortages.

  • volumeBindingMode: WaitForFirstConsumer -- By default, Kubernetes uses Immediate mode, which creates the disk as soon as the persistent volume claim (PVC) is created -- before a pod is scheduled. For topology-constrained storage such as cloud disks, this can place the disk in a zone where no pod will run, resulting in volume node affinity conflict errors. WaitForFirstConsumer delays disk creation until the pod is scheduled. The disk is then created in the same zone as the node, which eliminates zone mismatch failures.

  • allowedTopologies -- Restricts volume provisioning to specific zones. When combined with WaitForFirstConsumer, the scheduler places pods only in the zones listed here, ensuring that both the pod and its disk end up in a supported zone.
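With WaitForFirstConsumer, a standalone PVC that references this StorageClass remains in the Pending state until a pod consumes it. A minimal sketch (the claim name is hypothetical):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-disk                      # hypothetical name
spec:
  accessModes: [ "ReadWriteOnce" ]     # a cloud disk attaches to a single node
  storageClassName: alicloud-disk-topology-alltype
  resources:
    requests:
      storage: 40Gi

The disk is created only after a pod that mounts data-disk is scheduled, in the zone of that pod's node.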

Deploy the application

The following StatefulSet template distributes pods evenly across zones and automatically provisions a disk for each replica.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: mysql
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      containers:
      - image: mysql:5.6
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "mysql"
        volumeMounts:
        - name: disk-csi
          mountPath: /var/lib/mysql
      tolerations:
      - key: "app"
        operator: "Exists"
        effect: "NoSchedule"
  volumeClaimTemplates:
  - metadata:
      name: disk-csi
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: alicloud-disk-topology-alltype
      resources:
        requests:
          storage: 40Gi

Key parameters

  • topologySpreadConstraints -- Controls how pods are distributed across zones. See Topology spread constraint fields below for details.

  • volumeClaimTemplates -- Automatically creates a separate disk for each replica pod. When you scale the StatefulSet, new pods get their own disks without manual PVC creation.

Topology spread constraint fields

  • maxSkew: 1 -- The maximum difference in pod count between any two zones. For example, if you run 5 replicas across 3 zones, the distribution could be 2-2-1, but not 3-1-1.

  • topologyKey: topology.kubernetes.io/zone -- Spreads pods across availability zones.

  • whenUnsatisfiable: ScheduleAnyway -- If a perfectly even spread is not possible, the scheduler still places the pod (preferring the least-loaded zone) rather than leaving it in the Pending state. Use DoNotSchedule if you require a strictly even distribution and prefer a pending pod over an imbalanced one.

For more information, see Topology Spread Constraints.

Important

When a persistent volume (PV) is dynamically provisioned, the PV records the zone of the node where it was created. The PV and its bound PVC can only be used by pods in that same zone. This zone binding ensures disks are mounted successfully, but it also means a rescheduled pod must land in the same zone as its existing PV.
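For illustration, a dynamically provisioned PV records the zone in its node affinity roughly as follows; the PV name, disk ID, and zone value are examples, not output from a real cluster:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: d-example                  # hypothetical PV name
spec:
  capacity:
    storage: 40Gi
  accessModes: [ "ReadWriteOnce" ]
  csi:
    driver: diskplugin.csi.alibabacloud.com
    volumeHandle: d-example        # hypothetical disk ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.diskplugin.csi.alibabacloud.com/zone
          operator: In
          values:
          - cn-beijing-a           # zone recorded at provisioning time

A pod bound to this PV can be scheduled only to nodes in cn-beijing-a.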

Common errors

Without the configurations described in this topic, you may encounter the following errors when deploying StatefulSets with disk volumes:

  • InvalidDataDiskCatagory.NotSupported -- The specified disk category is not available in the zone. For more information, see FAQ about disk volumes.

  • The instanceType of the specified instance does not support this disk category -- The ECS instance type hosting the pod does not support the disk category.

  • 0/x nodes are available, x node(s) had volume node affinity conflict -- The disk was provisioned in a different zone than the scheduled pod.
