Usage notes and instructions on high-risk operations - Container Compute Service

Container Compute Service (ACS) provides managed architectures and key components for containerized computing. Improper operations on unmanaged components or applications in ACS clusters may result in service interruptions. To better estimate and avoid the risks that may arise, make sure that you read and understand the recommendations and usage notes in this topic before you get started with ACS.

Usage notes

Cluster updates

Use the cluster update feature of ACS to update the Kubernetes versions of your ACS clusters. Other methods may cause stability or compatibility issues.

ACS provides the following features to support cluster updates:

Version updates for ACS clusters.
Prechecks for version updates. The prechecks help ensure that an ACS cluster meets the conditions for version updates.
Release notes for new Kubernetes versions. The release notes describe new Kubernetes versions and compare new versions with earlier versions.
Notifications for potential risks due to resource changes. This feature can inform you of the risks that may arise due to resource changes caused by version updates.

We recommend that you follow these suggestions when you use the cluster update feature:

Perform a precheck before you update the cluster and fix the issues that are reported in the precheck result.
Read and understand the release notes of new Kubernetes versions. Check the status of your cluster and workloads based on the update risks that are reported by ACS. Then, evaluate the impacts of updating the cluster.
You cannot roll back cluster updates. Before you update a cluster, prepare for the update and make a backup plan.
Update your cluster to the latest Kubernetes version before ACS deprecates the version that the cluster currently runs. For more information, see Support for Kubernetes versions.

Kubernetes configurations

Do not use labels or annotations that are reserved by Kubernetes in YAML templates. Otherwise, resources may become unavailable and applications may fail or behave abnormally. Keys prefixed with kubernetes.io/ or k8s.io/ are reserved for Kubernetes core components. Example: the annotation `pv.kubernetes.io/bind-completed: "yes"`.
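As an illustration of the rule above, the following sketch contrasts a reserved key with a key under a prefix that you own. The ConfigMap name, the `example.com/` prefix, and the values are hypothetical placeholders, not names that ACS requires.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config                    # hypothetical name
  annotations:
    # Avoid: the kubernetes.io/ prefix is reserved, so do not set keys like this by hand.
    # pv.kubernetes.io/bind-completed: "yes"
    # Prefer: a prefix under a domain that you control (hypothetical here).
    example.com/owner: "team-a"
data:
  mode: "demo"                         # placeholder data
```

The same convention applies to keys under `metadata.labels`.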

ACS clusters

In the following scenario, ACS clusters are not eligible for compensation:

To simplify cluster O&M, ACS can manage specific system components for your cluster. After you enable managed system components for your cluster, the components are deployed and maintained by ACS. ACS does not provide compensation for business loss caused by user errors such as accidental deletion of Kubernetes resources used by managed system components.

High-risk operations

The following operations are considered high-risk operations in ACS and may greatly decrease the stability of your business. Read and understand the impacts of the following high-risk operations.

High-risk operations on clusters

Category	High-risk operation	Impact	How to recover
API Server	Delete the Server Load Balancer (SLB) instance that is used to expose the API server.	You can no longer manage the cluster.	Unrecoverable. You must create a new cluster.
Others	Use Resource Access Management (RAM) to modify permissions.	Resources such as SLB instances may fail to be created.	Restore the permissions.

High-risk operations on networks and load balancing

High-risk operation	Impact	How to recover
Modify or delete the tags that ACS adds to SLB instances.	The SLB instances no longer work as expected.	Restore the tags.
Modify the configurations of the SLB instances that are managed by ACS, including the configurations of the instances, listeners, and vServer groups.	The SLB instances no longer work as expected.	Restore the SLB configurations.
Remove the `service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: ${YOUR_LB_ID}` annotation that is used to specify an existing SLB instance from the Service configuration.	The existing SLB instance no longer works as expected.	Add the annotation to the Service configuration. Note: If a Service is configured to use an existing SLB instance, you cannot modify the configuration to create a new SLB instance for the Service. To use a new SLB instance, you must create a new Service.
In the SLB console, delete SLB instances that were created by ACS.	Errors may occur in the cluster network.	Delete SLB instances by deleting the Services that are associated with the SLB instances. For more information about how to delete a Service, see Delete a Service.
Manually delete the `nginx-ingress-lb` Service in the kube-system namespace of a cluster that has the NGINX Ingress controller installed.	The NGINX Ingress controller no longer runs as expected and may stop running.	Use the following YAML template to create a Service that has the same name: `apiVersion: v1 kind: Service metadata: annotations: labels: app: nginx-ingress-lb name: nginx-ingress-lb namespace: kube-system spec: externalTrafficPolicy: Local ports: - name: http port: 80 protocol: TCP targetPort: 80 - name: https port: 443 protocol: TCP targetPort: 443 selector: app: ingress-nginx type: LoadBalancer`
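The recovery template in the last row of the table above appears on a single line. Laid out as block YAML, it would read roughly as follows. The indentation is reconstructed on the assumption that `annotations` is empty and that `labels`, `name`, and `namespace` are sibling keys under `metadata`. Verify it against your cluster before you apply it.

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:              # left empty, as in the one-line template
  labels:
    app: nginx-ingress-lb
  name: nginx-ingress-lb    # must match the name of the deleted Service
  namespace: kube-system
spec:
  externalTrafficPolicy: Local
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  selector:
    app: ingress-nginx
  type: LoadBalancer
```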

High-risk operations on storage

High-risk operation	Impact	How to recover
Unmount disks from pods in the Elastic Compute Service (ECS) console.	I/O errors occur when you write data to the pods.	Restart the pods.
Mount a disk to multiple pods.	Pod data is written to local disks or I/O errors occur when you write data to the pods.	Mount the disk only to one pod. Important: Alibaba Cloud disks cannot be shared. Each disk can be mounted only to one pod.
Manually delete the File Storage NAS (NAS) directories that are mounted to pods.	I/O errors occur when you write data to the pods.	Restart the pods.
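One common Kubernetes pattern for keeping each disk attached to a single pod is a StatefulSet with `volumeClaimTemplates`, which creates a separate PersistentVolumeClaim for every replica. The following is a minimal sketch, not an ACS-specific recommendation. The workload name, image, and the `example-disk-class` StorageClass are hypothetical placeholders, and the sketch assumes that your cluster offers a disk-backed StorageClass.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-app                         # hypothetical name
spec:
  serviceName: demo-app                  # hypothetical headless Service name
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: registry.example.com/demo-app:1.0   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:                  # one claim, and so one disk, per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: example-disk-class   # assumption: replace with a disk StorageClass in your cluster
      resources:
        requests:
          storage: 20Gi
```

In contrast, a Deployment whose replicas all reference the same disk-backed claim would attempt to mount one disk to several pods, which is the high-risk case described in the table.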

High-risk operations on logs

High-risk operation	Impact	How to recover
Delete the aliyunlogconfig CustomResourceDefinitions (CRDs).	Logs fail to be collected.	Recreate the deleted aliyunlogconfig CRDs and the related resources. Deleting the aliyunlogconfig CRDs also deletes the related log collection tasks, so you must recreate those tasks after you recreate the CRDs. Logs that are generated while the aliyunlogconfig CRDs do not exist cannot be collected.
Uninstall logging components.	Logs fail to be collected.	Reinstall the logging components and manually create the aliyunlogconfig CRDs. Uninstalling the logging components also deletes the aliyunlogconfig CRDs and Logtail. Logs that are generated while the logging components do not exist cannot be collected.