Container Service for Kubernetes: FAQ about cluster management

Last Updated: Aug 13, 2024

This topic provides answers to some frequently asked questions (FAQ) about creating, using, and managing clusters.

Are ACK clusters that run Alibaba Cloud Linux compatible with CentOS-based container images?

Yes, Container Service for Kubernetes (ACK) clusters that run Alibaba Cloud Linux are compatible with CentOS-based container images. For more information, see Use Alibaba Cloud Linux 3.

Can I change the container runtime of a cluster from containerd to Docker?

After a cluster is created, you cannot change the container runtime of the cluster as a whole. However, you can create node pools that use different container runtimes in the same cluster. For more information, see Create a node pool.

You can change the container runtime of a node from Docker to containerd. For more information, see Change the container runtime from Docker to containerd.

Note

Clusters that run Kubernetes 1.24 or later no longer use Docker as the built-in container runtime. You can use containerd as the container runtime for clusters that run Kubernetes 1.24 or later.

What are the differences between containerd, Docker, and Sandboxed-Container?

Container Service for Kubernetes supports the following container runtimes: containerd, Docker, and Sandboxed-Container. We recommend that you use containerd as the container runtime. You can use Docker as the container runtime in clusters that run Kubernetes 1.22 and earlier. You can use Sandboxed-Container as the container runtime in clusters that run Kubernetes 1.24 and earlier. For more information about how the container runtimes compare, see Comparison among Docker, containerd, and Sandboxed-Container. If your cluster uses Docker as the container runtime, you must change the container runtime to containerd before you can update the Kubernetes version of your cluster to 1.24 or later. For more information, see Change the container runtime from Docker to containerd.
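Before you plan an update to Kubernetes 1.24 or later, it can help to see which nodes still report Docker as their runtime. The following is a minimal sketch, assuming that kubectl is already configured for the cluster. The function name list_docker_nodes is hypothetical; .status.nodeInfo.containerRuntimeVersion is the standard field on Kubernetes Node objects.

```shell
#!/bin/sh
# Print the names of nodes whose runtime string starts with "docker://".
# Input: one "<node-name><TAB><runtime-version>" pair per line on stdin.
# list_docker_nodes is a hypothetical helper name, not an ACK tool.
list_docker_nodes() {
  awk -F '\t' '$2 ~ /^docker:\/\// { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' | list_docker_nodes
```

If the function prints nothing, no node reported a Docker runtime string in the input.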

Is ACK certified for Level 3 Cybersecurity?

You can enable security hardening and configure baseline check policies for your clusters, based on Alibaba Cloud Linux, to achieve Multi-Level Protection Scheme (MLPS) 2.0 Level-3 compliance. This includes configuring compliance baseline checks to ensure that your clusters meet the following compliance requirements:

  • Identity verification

  • Access control

  • Security auditing

  • Intrusion prevention

  • Malicious code protection

For more information, see ACK security hardening based on MLPS.

Can I update an ACK dedicated cluster after I accidentally delete a master node of the cluster?

No, you cannot update an ACK dedicated cluster after you accidentally delete a master node of the cluster. After a master node of an ACK dedicated cluster is deleted, you cannot add another master node or update the Kubernetes version of the cluster. In this case, you can create another ACK dedicated cluster.

How do I connect to master nodes?

How do I collect the diagnostic data of an ACK cluster?

ACK provides the cluster diagnostics feature that allows you to diagnose clusters with a few clicks. This feature helps you troubleshoot cluster issues and node anomalies. For more information, see Work with cluster diagnostics.

You can also collect diagnostic data from master nodes and worker nodes for further analysis. The following section describes how to collect diagnostic data from Linux nodes and Windows nodes.

Collect diagnostic data from Linux nodes

Worker nodes support Linux and Windows, whereas master nodes support only Linux. The following steps apply to master nodes and worker nodes that run Linux. In this example, the diagnostic data is collected from a master node:

  1. Log on to the master node and run the following command to download a diagnostic script:

    curl -o /usr/local/bin/diagnose_k8s.sh http://aliacs-k8s-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/public/diagnose/diagnose_k8s.sh
    Note

    You can download the diagnostic script for Linux nodes only from the China (Hangzhou) region.

  2. Run the following command to grant execution permissions to the diagnostic script:

    chmod u+x /usr/local/bin/diagnose_k8s.sh
  3. Run the following command to go to the specified directory:

    cd /usr/local/bin
  4. Run the following command to run the diagnostic script:

    diagnose_k8s.sh

    Output similar to the following is returned. Each time you run the diagnostic script, a log file with a different name is generated. In this example, the log file is named diagnose_1514939155.tar.gz. The name of your log file will differ.

    ......
    + echo 'please get diagnose_1514939155.tar.gz for diagnostics'
    please get diagnose_1514939155.tar.gz for diagnostics
    + echo 'Upload diagnose_1514939155.tar.gz'
    Upload diagnose_1514939155.tar.gz
  5. Run the following command to query the log file that stores the diagnostic data:

    ls -ltr | grep diagnose_1514939155.tar.gz
    Note

    Replace diagnose_1514939155.tar.gz with the actual name of the generated log file.
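The Linux steps above can also be combined into one script. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that the diagnostic script writes its archive to the directory in which it is run, as the ls command in step 5 suggests.

```shell
#!/bin/sh
# URL and directory copied from the steps above.
DIAGNOSE_URL="http://aliacs-k8s-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/public/diagnose/diagnose_k8s.sh"
SCRIPT_DIR="/usr/local/bin"

# Print the most recently modified diagnose_*.tar.gz in the given directory.
# Prints nothing if no archive exists.
latest_diagnose_archive() {
  ls -t "$1"/diagnose_*.tar.gz 2>/dev/null | head -n 1
}

# Download the script, make it executable, run it, and report the archive.
collect_linux_diagnostics() {
  curl -fsS -o "$SCRIPT_DIR/diagnose_k8s.sh" "$DIAGNOSE_URL" || return 1
  chmod u+x "$SCRIPT_DIR/diagnose_k8s.sh" || return 1
  # Assumption: the archive is written to the current working directory.
  (cd "$SCRIPT_DIR" && ./diagnose_k8s.sh) || return 1
  latest_diagnose_archive "$SCRIPT_DIR"
}

# Run on a Linux node with sufficient permissions:
#   collect_linux_diagnostics
```

Looking up the newest archive by modification time avoids having to copy the generated file name by hand.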

Collect diagnostic data from Windows nodes

To collect diagnostic data from a Windows worker node, perform the following steps to download and run a diagnostic script:

Note

Windows can run only on worker nodes.

  1. Log on to an abnormal node. Open the Run dialog box, enter cmd, and then click OK to open Command Prompt.

  2. Run the following command to switch to PowerShell:

    powershell
  3. Run the following command to download and run a diagnostic script:

    The diagnostic script for a Windows node can be downloaded only from the region in which the node resides. Replace [$Region_ID] in the command with the actual region ID of the node.

    Invoke-WebRequest -UseBasicParsing -Uri http://aliacs-k8s-[$Region_ID].oss-[$Region_ID].aliyuncs.com/public/pkg/windows/diagnose/diagnose.ps1 | Invoke-Expression

    If the following output is returned, the diagnostic data of the node is collected.

    INFO: Compressing diagnosis clues ...
    INFO: ...done
    INFO: Please get diagnoses_1514939155.zip for diagnostics
    Note

    The diagnoses_1514939155.zip file is stored in the directory in which the diagnostic script is run.
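To make the placeholder substitution concrete, the following sketch builds the download URL for a given region ID. It is written as a shell function that you can run on any workstation to prepare the URL before you log on to the Windows node. The function name is hypothetical, and the sketch assumes that the whole [$Region_ID] token, brackets included, is replaced, which is consistent with the Linux script URL.

```shell
#!/bin/sh
# Build the Windows diagnostic script URL for a region ID.
# Assumption: "[$Region_ID]" is replaced as a whole, brackets included.
# windows_diagnose_url is a hypothetical helper name.
windows_diagnose_url() {
  printf 'http://aliacs-k8s-%s.oss-%s.aliyuncs.com/public/pkg/windows/diagnose/diagnose.ps1\n' "$1" "$1"
}

# Example:
#   windows_diagnose_url cn-hangzhou
```

Pass the printed URL to the -Uri parameter of the Invoke-WebRequest command in step 3.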

How do I troubleshoot ACK cluster issues?

Step 1: Check cluster nodes

  1. Run the following command to check whether all cluster nodes are in the Ready state:

    kubectl get nodes

    The following figure shows the expected output.

    • If all cluster nodes exist and are in the Ready state, the nodes run as expected.

    • If a node is not in the Ready state, proceed to the next step to query the details of that node.

  2. Run the following command to query the details and events of a node:

    Replace [$NODE_NAME] with the actual node name.

    kubectl describe node [$NODE_NAME]
    Note

    For more information about the kubectl output, see Node status.
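In a cluster with many nodes, you can filter the node list instead of reading it by eye. The following is a minimal sketch, assuming the default column layout of kubectl get nodes, in which the second column is STATUS. The function name list_unready_nodes is hypothetical.

```shell
#!/bin/sh
# Print the names of nodes whose STATUS column does not begin with "Ready".
# Input: the output of `kubectl get nodes --no-headers` on stdin.
# A node in the "Ready,SchedulingDisabled" state still counts as Ready here.
list_unready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | list_unready_nodes |
#     while read -r node; do kubectl describe node "$node"; done
```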

Step 2: Check cluster components

If all cluster nodes run as expected, check the logs of cluster components.

  1. Run the following command to view all components in the kube-system namespace:

    kubectl get pods -n kube-system

    The following figure shows the expected output. Components whose names start with kube- are system components. Components whose names start with coredns- are DNS components (CoreDNS). The output in the figure shows that all cluster components run as expected. If a component does not run as expected, perform the following step.

  2. Run the following command to query the log of a component:

    Replace [$Component_Name] with the actual component name.

    kubectl logs -f [$Component_Name] -n kube-system
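To narrow down which components need a closer look, you can filter the pod list for pods that are not fully ready. The following is a minimal sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS). The function name list_unhealthy_pods is hypothetical.

```shell
#!/bin/sh
# Print the names of pods that are neither Completed nor fully ready.
# Input: the output of `kubectl get pods -n kube-system --no-headers` on stdin.
list_unhealthy_pods() {
  awk '{
    split($2, ready, "/")            # READY column, for example "1/1"
    if ($3 == "Completed") next      # finished Jobs are not a problem
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -n kube-system --no-headers | list_unhealthy_pods
#   kubectl logs --tail=100 <pod-name> -n kube-system
```

A pod in the Running state whose READY count is lower than its container count is reported as well, because at least one of its containers is not ready.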

Step 3: Check the kubelet

  1. Run the following command to view the status of the kubelet:

    systemctl status kubelet
  2. If the kubelet is not in the Active state, run the following command to view the kubelet log. Identify and troubleshoot issues based on the log.

    journalctl -u kubelet
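The two kubelet checks can be combined so that the log is printed only when the service is not active. The following is a minimal sketch to run on the node itself. The function name check_kubelet and the limit of 50 log lines are illustrative choices, not ACK defaults.

```shell
#!/bin/sh
# Report the kubelet service state. If the service is not active,
# print the most recent log entries and return a non-zero status.
# check_kubelet is a hypothetical helper name.
check_kubelet() {
  state=$(systemctl is-active kubelet 2>/dev/null || true)
  if [ "$state" = "active" ]; then
    echo "kubelet is active"
  else
    echo "kubelet is ${state:-unknown}; recent log entries:"
    journalctl -u kubelet --no-pager -n 50
    return 1
  fi
}

# Run on the node:
#   check_kubelet
```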

Common cluster issues

The following table describes common issues of ACK clusters and corresponding solutions.

Issue: The API server or a component on the master node stops. As a result, the following issues may occur:

  • You cannot create, stop, or update pods, Services, or Deployments.

  • All existing pods and Services run as expected unless the pods and Services need to call the ACK API to perform operations such as managing Kubernetes dashboards.

Solution: The components of ACK support high availability. We recommend that you check whether the components are abnormal. For example, the API server of an ACK cluster uses a Classic Load Balancer (CLB) instance. You can check why your CLB instance is abnormal.

Issue: The backend data of the API server is lost. As a result, the following issues may occur:

  • The API server cannot be started.

  • All existing pods and Services run as expected unless the pods and Services need to call the ACK API to perform operations such as managing Kubernetes dashboards.

  • The API server can be started only after the backend data of the API server is restored or recreated.

Solution: If you have created a snapshot before the issue occurs, you can restore data from the snapshot to resolve the issue. If no snapshot is created in advance, contact us for technical support. You can use the following methods to prevent this issue:

Issue: A node fails and all pods on the node stop running.

Solution: Create pods by using workloads such as Deployments, StatefulSets, and DaemonSets. Do not directly create pods. Otherwise, the system may not be able to schedule the pods to healthy nodes.

Issue: The kubelet fails. As a result, the following issues may occur:

  • You cannot create pods on a node on which the kubelet fails.

  • The kubelet may accidentally delete specific pods.

  • Specific nodes are marked as unhealthy.

  • Deployments or ReplicationControllers create pods on other nodes.

Solution:

  • If you have created a snapshot before the issue occurs, you can restore data from the snapshot to resolve the issue. If no snapshot is created, contact us for technical support. Create snapshots for the volumes managed by the kubelet on a regular basis. For more information, see Use volume snapshots created from disks.

  • Create pods by using workloads such as Deployments, StatefulSets, and DaemonSets. Do not directly create pods. Otherwise, the system may not be able to schedule the pods to healthy nodes.

Issue: Other issues such as invalid configurations.

Solution: If you have created a snapshot before the issue occurs, you can restore data from the snapshot to resolve the issue. If no snapshot is created, contact us for technical support. Create snapshots for the volumes managed by the kubelet on a regular basis. For more information, see Use volume snapshots created from disks.

How do I use a pay-as-you-go CLB instance for the API server after I create subscription nodes for an ACK cluster?

If you select Subscription for the Billing Method parameter when you create an ACK cluster, subscription ECS instances are created in the node pool and a subscription CLB instance is used for the API server. If you want to use subscription nodes but a pay-as-you-go CLB instance for the API server, perform the following steps:

  1. Create an ACK cluster and configure parameters based on the following requirements. For more information about the configuration items required to create an ACK cluster, see Create an ACK managed cluster.

    • Billing Method: Select Pay-As-You-Go. In this case, an internal-facing CLB instance is created for the API server. By default, you are charged for the CLB instance by using the pay-as-you-go billing method.

    • Expected Nodes: Set this parameter to 0, which indicates that no node is created for the cluster.

  2. Create subscription ECS instances. For more information, see Create a subscription ECS instance on the Quick Launch tab.

  3. Add the ECS instances to a node pool. For more information about the limits, usage notes, procedure, and FAQ for this operation, see Add existing ECS instances to an ACK cluster.

What CIDR blocks do I need to configure in the SLB ACLs to allow access to the API server of an ACK cluster?

You need to configure the access control lists (ACLs) for the Server Load Balancer (SLB) of the API server to accept access from the following CIDR blocks:

  • The control plane CIDR block of Container Service for Kubernetes: 100.104.0.0/16.

  • The primary CIDR block and the secondary CIDR blocks (if any) of the virtual private cloud (VPC) where the cluster resides, or the vSwitch CIDR block of the nodes in the cluster.

  • The public CIDR blocks used by clients that need to access the CLB instance of the API server.

  • The public CIDR blocks used by edge nodes if your cluster is an ACK Edge cluster.

  • The Virtual Private Datacenter (VPD) CIDR blocks if your cluster is an ACK Lingjun cluster.

You must configure the network ACL to accept access from the preceding CIDR blocks. Do not block access from the preceding CIDR blocks.

For more information, see Configure network ACLs for the API server of an ACK cluster.
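When you review an ACL, it can be useful to confirm that a specific source address falls inside one of the allowed CIDR blocks. The following is a minimal sketch of an IPv4 membership check in plain shell arithmetic. The function names are hypothetical, and the sketch does not read or change any SLB configuration. Only 100.104.0.0/16 is taken from the list above; substitute your own VPC and client CIDR blocks as needed.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 lies inside CIDR block $2, for example 100.104.0.0/16.
in_cidr() {
  ip=$1
  net=${2%/*}          # network part before the slash
  bits=${2#*/}         # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Example:
#   in_cidr 100.104.12.7 100.104.0.0/16 && echo "covered by the control plane CIDR block"
```

The check compares the network bits of the address and of the block after both are masked with the prefix length, which is how a CIDR match is defined.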