Container Service for Kubernetes: Introduction to the cluster lifecycle and abnormal cluster states

Last Updated: Dec 13, 2024

A Container Service for Kubernetes (ACK) cluster passes through different phases and states during its lifecycle. The lifecycle of an ACK cluster includes the following phases: creation, operations and maintenance (O&M), and deletion. The O&M phase includes states such as Scaling, Updating, Upgrading, Draining, and Removing. This topic describes the lifecycle of ACK clusters to help you understand the status of your clusters and manage them.

Cluster lifecycle

The following table describes different cluster states and the following figure shows the transitions between the states.

[Figure: transitions between cluster states]

Note
  • ACK periodically checks the status of an ACK cluster. If the cluster meets specific anomaly conditions, the status of the cluster changes to Inactive or Unavailable. ACK notifies you of the status changes by sending emails or internal messages.

  • For ACK Pro clusters, cluster management fees are incurred when the cluster is in the Running, Upgrading, Draining, Removing, or Updating state. For more information, see Billing rules.

| Stage | State | Description |
| --- | --- | --- |
| Creation and deployment | Initializing | The cluster is being created. |
| Creation and deployment | Failed | The cluster failed to be created. |
| Operation and maintenance | Running | The cluster is running. |
| Operation and maintenance | Upgrading | The cluster is being upgraded. |
| Operation and maintenance | Draining | Pods are being evicted from a node to other nodes in the cluster. After all pods are evicted from the node, the node becomes unschedulable. |
| Operation and maintenance | Removing | Nodes are being removed from the cluster. |
| Operation and maintenance | Updating | The metadata of the cluster is being updated. |
| Operation and maintenance | Inactive | The cluster is temporarily unavailable in specific cases. For more information, see Inactive. |
| Operation and maintenance | Unavailable | The cluster is unavailable because the cloud resources used by the cluster encounter errors. For more information, see Unavailable. |
| Deletion and release | Deleting | The cluster is being deleted. |
| Deletion and release | Deletion Failed | The cluster failed to be deleted. |
| Deletion and release | Deleted | The cluster is deleted. This state is invisible to users. |
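
The state table and the billing note above can be captured in a small lookup. The following Python sketch is illustrative only: `ClusterState`, `STAGE_OF`, `BILLABLE_PRO_STATES`, and `pro_cluster_incurs_management_fee` are hypothetical names that are not part of any ACK SDK, and the state strings are the display names used in this topic, which may differ from the values returned by the ACK API.

```python
from enum import Enum


class ClusterState(Enum):
    # Display names as used in this topic (assumed, not API identifiers).
    INITIALIZING = "Initializing"
    FAILED = "Failed"
    RUNNING = "Running"
    UPGRADING = "Upgrading"
    DRAINING = "Draining"
    REMOVING = "Removing"
    UPDATING = "Updating"
    INACTIVE = "Inactive"
    UNAVAILABLE = "Unavailable"
    DELETING = "Deleting"
    DELETION_FAILED = "Deletion Failed"
    DELETED = "Deleted"


CREATION = "Creation and deployment"
OPERATION = "Operation and maintenance"
DELETION = "Deletion and release"

# Lifecycle stage of each state, following the table in this topic.
STAGE_OF = {
    ClusterState.INITIALIZING: CREATION,
    ClusterState.FAILED: CREATION,
    ClusterState.RUNNING: OPERATION,
    ClusterState.UPGRADING: OPERATION,
    ClusterState.DRAINING: OPERATION,
    ClusterState.REMOVING: OPERATION,
    ClusterState.UPDATING: OPERATION,
    ClusterState.INACTIVE: OPERATION,
    ClusterState.UNAVAILABLE: OPERATION,
    ClusterState.DELETING: DELETION,
    ClusterState.DELETION_FAILED: DELETION,
    ClusterState.DELETED: DELETION,
}

# States in which an ACK Pro cluster incurs cluster management fees,
# as listed in the note in this topic.
BILLABLE_PRO_STATES = frozenset({
    ClusterState.RUNNING,
    ClusterState.UPGRADING,
    ClusterState.DRAINING,
    ClusterState.REMOVING,
    ClusterState.UPDATING,
})


def pro_cluster_incurs_management_fee(state: ClusterState) -> bool:
    """Return True if an ACK Pro cluster in this state is charged a management fee."""
    return state in BILLABLE_PRO_STATES
```

For example, `pro_cluster_incurs_management_fee(ClusterState.INACTIVE)` evaluates to `False`, which matches the billing impact described later in this topic.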

Description of abnormal cluster states

Inactive

If your cluster is in the Inactive state, you can identify the cause based on the cluster status code.

| Status code | Description | Solution |
| --- | --- | --- |
| KMSUnhealthy | Key Management Service (KMS)-based Secret encryption is enabled for the cluster. However, KMS is suspended for the Alibaba Cloud account due to reasons such as overdue payments. As a result, the control plane of the cluster cannot run as expected. | 1. Log on to the KMS console. 2. Identify the cause of the KMS suspension and resolve the issue. Then, resume KMS for the Alibaba Cloud account. 3. Submit a ticket to restore the cluster to the Running state. |
| NoNodeForLongTime | No pod exists in the ACK Basic cluster and no pod has been created in the previous 14 days. | Submit a ticket to restore the cluster to the Running state and then upgrade the cluster to an ACK Pro cluster. |
| AssumeRoleNotFound | The service roles required by ACK do not exist. As a result, the control plane of the cluster becomes abnormal. | 1. Refer to ACK roles to identify the roles required by ACK. 2. Submit a ticket to restore the cluster to the Running state. |
| AssumeUserNotFound | The Resource Access Management (RAM) user required by ACK does not exist. As a result, the control plane of the cluster becomes abnormal. | Submit a ticket to request technical support. |
| SecurityGroupNotFound | The security groups required by ACK do not exist. As a result, the control plane of the cluster becomes abnormal. | Submit a ticket to request technical support. |
| UnderMaintenance | The control plane of the cluster is being maintained in the background. | Submit a ticket to request technical support. |
| ServiceInDebt | If the available balance in your account, including the account balance and vouchers, is less than the outstanding bill, your account is considered overdue. As a result, your ACK Pro clusters enter the Inactive state. Access to the API servers of the clusters is restricted, and any operations that involve the API servers are suspended. If your account remains overdue for more than 15 days, ACK stops providing services to you and the control plane resources of the clusters are deleted. However, other Alibaba Cloud services related to ACK are not released, including but not limited to NAT gateways, Server Load Balancer (SLB) instances, Elastic Compute Service (ECS) instances, and Auto Scaling groups. In this case, you must address any unexpected behavior of these related services at the earliest opportunity. | Top up your account to pay the bill before your service is suspended. After you settle the overdue payment, your cluster is restored. |
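
The status codes above lend themselves to a simple lookup when you triage an Inactive cluster. The following Python sketch is a minimal illustration under stated assumptions: the `Remediation` class, the `INACTIVE_REMEDIATIONS` mapping, and both helper functions are hypothetical names, the summaries paraphrase the solutions in this topic, and nothing here calls an ACK API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remediation:
    summary: str
    needs_ticket: bool  # True if the documented solution involves submitting a ticket


# Status code -> documented solution, paraphrased from this topic.
INACTIVE_REMEDIATIONS = {
    "KMSUnhealthy": Remediation(
        "Resolve the KMS suspension and resume KMS, then submit a ticket to restore the cluster.",
        True,
    ),
    "NoNodeForLongTime": Remediation(
        "Submit a ticket to restore the cluster, then upgrade it to an ACK Pro cluster.",
        True,
    ),
    "AssumeRoleNotFound": Remediation(
        "Identify the roles required by ACK, then submit a ticket to restore the cluster.",
        True,
    ),
    "AssumeUserNotFound": Remediation("Submit a ticket to request technical support.", True),
    "SecurityGroupNotFound": Remediation("Submit a ticket to request technical support.", True),
    "UnderMaintenance": Remediation("Submit a ticket to request technical support.", True),
    "ServiceInDebt": Remediation("Top up the account and settle the overdue payment.", False),
}

# Per this topic, control plane resources are deleted once the account
# has been overdue for more than 15 days.
OVERDUE_LIMIT_DAYS = 15


def remediation_for(status_code: str) -> Remediation:
    """Look up the documented solution for an Inactive status code."""
    try:
        return INACTIVE_REMEDIATIONS[status_code]
    except KeyError:
        raise ValueError(f"Unknown Inactive status code: {status_code!r}") from None


def control_plane_deleted(days_overdue: int) -> bool:
    """Return True if the overdue period exceeds the documented 15-day limit."""
    return days_overdue > OVERDUE_LIMIT_DAYS
```

A call such as `remediation_for("ServiceInDebt").needs_ticket` evaluates to `False`, because that status is resolved by settling the overdue payment rather than by submitting a ticket.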

Unavailable

Cause

The Classic Load Balancer (CLB) instance that is used to expose the API server of the cluster has been released for one of the following reasons:

  • The CLB instance is manually released.

  • The subscription of the CLB instance expires.

    Important

    Starting from December 1, 2024, an instance fee is charged for newly created CLB instances. For more information, see CLB billing adjustments.

  • The CLB instance is released because the Alibaba Cloud account has overdue payments.

Solution

Clusters in the Unavailable state cannot be restored. You must recreate the cluster. For more information, see Delete ACK clusters and Create an ACK managed cluster.

Impacts

Impacts on billing

When your cluster is in the Inactive or Unavailable state, the control plane is scaled in. After the scale-in, cluster management fees are no longer charged, but fees for other associated cloud services still apply.

Impacts on cluster operations

When your cluster is in the Inactive or Unavailable state, you can perform only the following operations on the cluster:

  • Enable or disable deletion protection for the cluster.

  • Delete the cluster.
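
The restriction above can be expressed as a small guard in your own tooling. This is a sketch only: the operation labels and the `is_allowed_in_abnormal_state` helper are hypothetical names chosen for illustration, and the sketch models only the Inactive and Unavailable states described in this topic.

```python
# States that this topic describes as abnormal.
ABNORMAL_STATES = frozenset({"Inactive", "Unavailable"})

# Hypothetical labels for the only operations that this topic lists
# as permitted while a cluster is in an abnormal state.
ALLOWED_IN_ABNORMAL_STATE = frozenset({
    "enable_deletion_protection",
    "disable_deletion_protection",
    "delete_cluster",
})


def is_allowed_in_abnormal_state(operation: str) -> bool:
    """Return True if the operation is permitted on an Inactive or Unavailable cluster."""
    return operation in ALLOWED_IN_ABNORMAL_STATE


def check_operation(state: str, operation: str) -> None:
    """Raise PermissionError if the cluster is abnormal and the operation is not permitted.

    Restrictions that may apply in other states are not modeled here.
    """
    if state in ABNORMAL_STATES and not is_allowed_in_abnormal_state(operation):
        raise PermissionError(
            f"Operation {operation!r} is not available while the cluster is {state}."
        )
```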

Other impacts

When your cluster is in the Inactive or Unavailable state, ACK disables the scaling groups associated with the cluster to avoid unexpected costs from the creation of new ECS instances. If the associated scaling groups are still disabled after the cluster is restored, you can manually enable them in the Auto Scaling console.