Container Service for Kubernetes: Use Terraform to create a node pool that has auto scaling enabled

Last Updated: Dec 20, 2024

By default, nodes in node pools and managed node pools of Container Service for Kubernetes (ACK) cannot automatically scale in or out. This topic describes how to use Terraform to create a node pool that has auto scaling enabled.

Note

You can run the sample code in this topic with a few clicks. For more information, visit Terraform Explorer.

Prerequisites

  • The node auto scaling feature relies on the Auto Scaling service of Alibaba Cloud. Therefore, you must activate Auto Scaling and assign the default Auto Scaling role to your account before you enable auto scaling for nodes. For more information, see Activate Auto Scaling.

    Note

    If you previously used the alicloud_cs_kubernetes_autoscaler component, Auto Scaling is already activated.

  • Permissions to access CloudOps Orchestration Service (OOS) are granted. You can perform the following steps to create the AliyunOOSLifecycleHook4CSRole role, which provides the required OOS access permissions.

    1. Click AliyunOOSLifecycleHook4CSRole.

      Note
      • If the current account is an Alibaba Cloud account, click AliyunOOSLifecycleHook4CSRole.

      • If the current account is a RAM user, make sure that your Alibaba Cloud account is assigned the AliyunOOSLifecycleHook4CSRole role. Then, attach the AliyunRAMReadOnlyAccess policy to the RAM user. For more information, see Grant permissions to a RAM user.

    2. On the Cloud Resource Access Authorization page, click Agree to Authorization.

  • The runtime environment for Terraform is prepared by using one of the following methods:

    • Use Terraform in Terraform Explorer: Alibaba Cloud provides an online runtime environment for Terraform. You can log on to the environment to use Terraform without the need to install Terraform. This method is suitable for scenarios where you need to use and debug Terraform in a low-cost, efficient, and convenient manner.

    • Use Terraform in Cloud Shell: Cloud Shell is preinstalled with Terraform and configured with your identity credentials. You can directly run Terraform commands in Cloud Shell. This method is suitable for scenarios where you want to run Terraform commands without installing Terraform or configuring credentials.

    • Install and configure Terraform on your on-premises machine: This method is suitable for scenarios where network connections are unstable or a custom development environment is needed.
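
      The following commands show one way to verify an on-premises installation and pass Alibaba Cloud credentials to the alicloud provider through environment variables. The variable names are the ones supported by the alicloud provider; the values are placeholders that you must replace with your own credentials and region.

      # Check that Terraform is installed and print its version.
      terraform version
      # Provide credentials and a default region for the alicloud provider.
      export ALICLOUD_ACCESS_KEY="<your AccessKey ID>"
      export ALICLOUD_SECRET_KEY="<your AccessKey secret>"
      export ALICLOUD_REGION="cn-shenzhen"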

Background information

Terraform is an open source tool that can provision and manage new infrastructure through Terraform providers. You can use Terraform to preview, configure, and manage cloud infrastructure and resources. For more information, see What is Terraform?

In earlier versions of the Alibaba Cloud Provider, ACK provided a component named alicloud_cs_kubernetes_autoscaler. This component can be used to enable auto scaling for nodes. However, the following limits apply:

  • The configuration is complex and the cost is high.

  • Each node to be scaled is added to the default node pool and cannot be separately maintained.

  • Some parameters cannot be modified.

Alibaba Cloud Provider 1.111.0 and later allow you to create node pools that have auto scaling enabled by using the alicloud_cs_kubernetes_node_pool component. This component has the following benefits:

  • Provides simple scaling configurations. You need to set only the minimum and maximum numbers of nodes in the scaling group.

  • Uses default settings for optional parameters to keep node environments consistent. This prevents user errors, such as configuring different OS images for different nodes.

  • Allows you to view node changes in a node pool directly in the ACK console.

Resources

The sample code in this topic involves the following resources: a virtual private cloud (VPC) and vSwitches (alicloud_vpc and alicloud_vswitch), an ACK managed cluster (alicloud_cs_managed_kubernetes), and a node pool that has auto scaling enabled (alicloud_cs_kubernetes_node_pool).

Note

You are charged for specific resources. If you no longer require the resources, you must release or unsubscribe from the resources at the earliest opportunity.

Use Terraform to create a node pool that has auto scaling enabled

alicloud_cs_kubernetes_autoscaler was previously used

If you previously used the alicloud_cs_kubernetes_autoscaler component, authorize your cluster to access Auto Scaling and perform the following steps to switch to the alicloud_cs_kubernetes_node_pool component. Then, you can create node pools that have auto scaling enabled in your cluster.

  1. Modify the autoscaler-meta ConfigMap.

    1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

    2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Configurations > ConfigMaps.

    3. In the upper-left corner of the ConfigMap page, select kube-system from the Namespace drop-down list. Find the autoscaler-meta ConfigMap and click Edit in the Actions column.

    4. In the Edit panel, modify the value of the autoscaler-meta ConfigMap.

      Change the value of the taints field from the string type to the array type. To do this, change "taints":"" to "taints":[] in the Value text box.

    5. Click OK.

  2. Synchronize the node pool.

    1. In the left-side navigation pane of the details page, choose Nodes > Node Pools.

    2. In the upper-right corner of the Node Pools page, click Sync Node Pool.

alicloud_cs_kubernetes_autoscaler was not previously used

Perform the following steps to use Terraform to create a node pool that has auto scaling enabled.

  1. Create a node pool configuration file.

    Create a node pool that has auto scaling enabled in an existing ACK cluster.

    The following code provides an example on how to create a node pool that has auto scaling enabled in an existing ACK cluster:

    provider "alicloud" {
    }
    # Create a node pool that has auto scaling enabled in an existing ACK cluster. 
    resource "alicloud_cs_kubernetes_node_pool" "at1" {
      # The ID of the ACK cluster where you want to create the node pool. 
      cluster_id           = ""
      name                 = "np-test"
      # The vSwitches that are used by nodes in the node pool. You must specify at least one vSwitch. 
      vswitch_ids          = ["vsw-bp1mdigyhmilu2h4v****"]
      instance_types       = ["ecs.e3.medium"]
      password             = "Hello1234"
     
      scaling_config {
        # The minimum number of nodes in the node pool. 
        min_size     = 1
        # The maximum number of nodes in the node pool. 
        max_size     = 5
      }
    
    }
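
    Before you apply this example, replace the value of cluster_id with the ID of your existing ACK cluster, and replace the vSwitch ID in vswitch_ids with the ID of a vSwitch in the virtual private cloud (VPC) of the cluster. The password and instance type are only sample values.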

    Create a cluster that contains a node pool with auto scaling enabled

    The following code provides an example on how to create a cluster that contains a node pool with auto scaling enabled:

    provider "alicloud" {
      region = var.region_id
    }
    
    variable "region_id" {
      type    = string
      default = "cn-shenzhen"
    }
    
    variable "cluster_spec" {
      type        = string
      description = "The cluster specifications of kubernetes cluster,which can be empty. Valid values:ack.standard : Standard managed clusters; ack.pro.small : Professional managed clusters."
      default     = "ack.pro.small"
    }
    
    # Specify the zones of vSwitches. 
    variable "availability_zone" {
      description = "The availability zones of vswitches."
      default     = ["cn-shenzhen-c", "cn-shenzhen-e", "cn-shenzhen-f"]
    }
    
    # The CIDR blocks used to create vSwitches. 
    variable "node_vswitch_cidrs" {
      type        = list(string)
      default     = ["172.16.0.0/23", "172.16.2.0/23", "172.16.4.0/23"]
    }
    
    # This variable specifies the CIDR blocks in which Terway vSwitches are created. 
    variable "terway_vswitch_cidrs" {
      type        = list(string)
      default     = ["172.16.208.0/20", "172.16.224.0/20", "172.16.240.0/20"]
    }
    
    # Specify the ECS instance types of worker nodes. 
    variable "worker_instance_types" {
      description = "The ecs instance types used to launch worker nodes."
      default     = ["ecs.g6.2xlarge", "ecs.g6.xlarge"]
    }
    
    # Specify a password for the worker node.
    variable "password" {
      description = "The password of ECS instance."
      default     = "Test123456"
    }
    
    # Specify the prefix of the name of the ACK managed cluster. 
    variable "k8s_name_prefix" {
      description = "The name prefix used to create managed kubernetes cluster."
      default     = "tf-ack-shenzhen"
    }
    
    # Specify the components that you want to install in the ACK managed cluster. The components include Terway (network plug-in), csi-plugin (volume plug-in), csi-provisioner (volume plug-in), logtail-ds (logging plug-in), the NGINX Ingress controller, ack-arms-prometheus (monitoring plug-in), and ack-node-problem-detector (node diagnostics plug-in). 
    variable "cluster_addons" {
      type = list(object({
        name   = string
        config = string
      }))
    
      default = [
        {
          "name"   = "terway-eniip",
          "config" = "",
        },
        {
          "name"   = "logtail-ds",
          "config" = "{\"IngressDashboardEnabled\":\"true\"}",
        },
        {
          "name"   = "nginx-ingress-controller",
          "config" = "{\"IngressSlbNetworkType\":\"internet\"}",
        },
        {
          "name"   = "arms-prometheus",
          "config" = "",
        },
        {
          "name"   = "ack-node-problem-detector",
          "config" = "{\"sls_project_name\":\"\"}",
        },
        {
          "name"   = "csi-plugin",
          "config" = "",
        },
        {
          "name"   = "csi-provisioner",
          "config" = "",
        }
      ]
    }
    
    # The default resource names. 
    locals {
      k8s_name_terway = "k8s_name_terway_${random_integer.default.result}"
      vpc_name = "vpc_name_${random_integer.default.result}"
      autoscale_nodepool_name = "autoscale-node-pool-${random_integer.default.result}"
    }
    
    # The ECS instance specifications of the worker nodes. Terraform searches for ECS instance types that fulfill the CPU and memory requests. 
    data "alicloud_instance_types" "default" {
      cpu_core_count       = 8
      memory_size          = 32
      availability_zone    = var.availability_zone[0]
      kubernetes_node_role = "Worker"
    }
    
    resource "random_integer" "default" {
      min = 10000
      max = 99999
    }
    
    # The VPC. 
    resource "alicloud_vpc" "default" {
      vpc_name   = local.vpc_name
      cidr_block = "172.16.0.0/12"
    }
    
    # The node vSwitch. 
    resource "alicloud_vswitch" "vswitches" {
      count      = length(var.node_vswitch_cidrs)
      vpc_id     = alicloud_vpc.default.id
      cidr_block = element(var.node_vswitch_cidrs, count.index)
      zone_id    = element(var.availability_zone, count.index)
    }
    
    # The pod vSwitch. 
    resource "alicloud_vswitch" "terway_vswitches" {
      count      = length(var.terway_vswitch_cidrs)
      vpc_id     = alicloud_vpc.default.id
      cidr_block = element(var.terway_vswitch_cidrs, count.index)
      zone_id    = element(var.availability_zone, count.index)
    }
    
    # The ACK managed cluster. 
    resource "alicloud_cs_managed_kubernetes" "default" {
      name                         = local.k8s_name_terway # The ACK cluster name. 
      cluster_spec                 = var.cluster_spec      # Create an ACK Pro cluster. 
      worker_vswitch_ids           = split(",", join(",", alicloud_vswitch.vswitches.*.id))        # The vSwitch to which the node pool belongs. Specify one or more vSwitch IDs. The vSwitches must reside in the zone specified by availability_zone. 
      pod_vswitch_ids              = split(",", join(",", alicloud_vswitch.terway_vswitches.*.id)) # The vSwitch of the pod. 
      new_nat_gateway              = true                                                          # Specify whether to create a NAT gateway when the Kubernetes cluster is created. Default value: true. 
      service_cidr                 = "10.11.0.0/16"                                                # The Service CIDR block. It cannot overlap with the VPC CIDR block or with the CIDR blocks of other Kubernetes clusters in the VPC. You cannot change the Service CIDR block after the cluster is created. 
      slb_internet_enabled         = true                                                          # Specify whether to create an Internet-facing SLB instance for the API server of the cluster. Default value: false. 
      enable_rrsa                  = true
      control_plane_log_components = ["apiserver", "kcm", "scheduler", "ccm"] # The control plane logs. 
      dynamic "addons" {                                                      # Component management. 
        for_each = var.cluster_addons
        content {
          name   = addons.value.name
          config = addons.value.config
        }
      }
    }
    
    # Create a node pool for which auto scaling is enabled. The node pool can be scaled out to a maximum of 10 nodes and must contain at least 1 node. 
    resource "alicloud_cs_kubernetes_node_pool" "autoscale_node_pool" {
      cluster_id     = alicloud_cs_managed_kubernetes.default.id
      node_pool_name = local.autoscale_nodepool_name
      vswitch_ids    = split(",", join(",", alicloud_vswitch.vswitches.*.id))
    
      scaling_config {
        min_size = 1
        max_size = 10
      }
    
      instance_types        = var.worker_instance_types
      password              = var.password # The password that is used to log on to the cluster by using SSH. 
      install_cloud_monitor = true         # Specify whether to install the CloudMonitor agent on the nodes in the cluster. 
      system_disk_category  = "cloud_efficiency"
      system_disk_size      = 100
      image_type            = "AliyunLinux3"
    
      data_disks {              # The data disk configuration of the node. 
        category = "cloud_essd" # The disk category. 
        size     = 120          # The disk size. 
      }
    }
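
    Save the example that matches your scenario to a Terraform configuration file, for example main.tf, in an empty working directory, and run the following commands in that directory.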
  2. Run the following command to initialize the Terraform runtime environment:

    terraform init

    If the following information is returned, Terraform is initialized:

    Terraform has been successfully initialized!
    
    You may now begin working with Terraform. Try running "terraform plan" to see
    any changes that are required for your infrastructure. All Terraform commands
    should now work.
    
    If you ever set or change modules or backend configuration for Terraform,
    rerun this command to reinitialize your working directory. If you forget, other
    commands will detect it and remind you to do so if necessary.
  3. Run the terraform apply command to create the node pool.
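
    terraform apply

    When you are prompted to confirm the execution plan, enter yes.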

  4. Verify the result.

    After the node pool is created, you can find the node pool on the Node Pools page. Auto Scaling Enabled appears below the name of the node pool.

Clear resources

If you no longer require the preceding resources created or managed by Terraform, run the terraform destroy command to release the resources. For more information about the terraform destroy command, see Common commands.

terraform destroy
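
When you are prompted to confirm that the resources will be destroyed, enter yes.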

Sample code

Note

You can run the sample code in this topic with a few clicks. For more information, visit Terraform Explorer.

Full code

provider "alicloud" {
  region = var.region_id
}

variable "region_id" {
  type    = string
  default = "cn-shenzhen"
}

variable "cluster_spec" {
  type        = string
  description = "The cluster specifications of kubernetes cluster,which can be empty. Valid values:ack.standard : Standard managed clusters; ack.pro.small : Professional managed clusters."
  default     = "ack.pro.small"
}

# Specify the zones of vSwitches. 
variable "availability_zone" {
  description = "The availability zones of vswitches."
  default     = ["cn-shenzhen-c", "cn-shenzhen-e", "cn-shenzhen-f"]
}

# The CIDR blocks used to create vSwitches. 
variable "node_vswitch_cidrs" {
  type        = list(string)
  default     = ["172.16.0.0/23", "172.16.2.0/23", "172.16.4.0/23"]
}

# This variable specifies the CIDR blocks in which Terway vSwitches are created. 
variable "terway_vswitch_cidrs" {
  type        = list(string)
  default     = ["172.16.208.0/20", "172.16.224.0/20", "172.16.240.0/20"]
}

# Specify the ECS instance types of worker nodes. 
variable "worker_instance_types" {
  description = "The ecs instance types used to launch worker nodes."
  default     = ["ecs.g6.2xlarge", "ecs.g6.xlarge"]
}

# Specify a password for the worker node.
variable "password" {
  description = "The password of ECS instance."
  default     = "Test123456"
}

# Specify the prefix of the name of the ACK managed cluster. 
variable "k8s_name_prefix" {
  description = "The name prefix used to create managed kubernetes cluster."
  default     = "tf-ack-shenzhen"
}

# Specify the components that you want to install in the ACK managed cluster. The components include Terway (network plug-in), csi-plugin (volume plug-in), csi-provisioner (volume plug-in), logtail-ds (logging plug-in), the NGINX Ingress controller, ack-arms-prometheus (monitoring plug-in), and ack-node-problem-detector (node diagnostics plug-in). 
variable "cluster_addons" {
  type = list(object({
    name   = string
    config = string
  }))

  default = [
    {
      "name"   = "terway-eniip",
      "config" = "",
    },
    {
      "name"   = "logtail-ds",
      "config" = "{\"IngressDashboardEnabled\":\"true\"}",
    },
    {
      "name"   = "nginx-ingress-controller",
      "config" = "{\"IngressSlbNetworkType\":\"internet\"}",
    },
    {
      "name"   = "arms-prometheus",
      "config" = "",
    },
    {
      "name"   = "ack-node-problem-detector",
      "config" = "{\"sls_project_name\":\"\"}",
    },
    {
      "name"   = "csi-plugin",
      "config" = "",
    },
    {
      "name"   = "csi-provisioner",
      "config" = "",
    }
  ]
}

# The default resource names. 
locals {
  k8s_name_terway = "k8s_name_terway_${random_integer.default.result}"
  vpc_name = "vpc_name_${random_integer.default.result}"
  autoscale_nodepool_name = "autoscale-node-pool-${random_integer.default.result}"
}

# The ECS instance specifications of the worker nodes. Terraform searches for ECS instance types that fulfill the CPU and memory requests. 
data "alicloud_instance_types" "default" {
  cpu_core_count       = 8
  memory_size          = 32
  availability_zone    = var.availability_zone[0]
  kubernetes_node_role = "Worker"
}

resource "random_integer" "default" {
  min = 10000
  max = 99999
}

# The VPC. 
resource "alicloud_vpc" "default" {
  vpc_name   = local.vpc_name
  cidr_block = "172.16.0.0/12"
}

# The node vSwitch. 
resource "alicloud_vswitch" "vswitches" {
  count      = length(var.node_vswitch_cidrs)
  vpc_id     = alicloud_vpc.default.id
  cidr_block = element(var.node_vswitch_cidrs, count.index)
  zone_id    = element(var.availability_zone, count.index)
}

# The pod vSwitch. 
resource "alicloud_vswitch" "terway_vswitches" {
  count      = length(var.terway_vswitch_cidrs)
  vpc_id     = alicloud_vpc.default.id
  cidr_block = element(var.terway_vswitch_cidrs, count.index)
  zone_id    = element(var.availability_zone, count.index)
}

# The ACK managed cluster. 
resource "alicloud_cs_managed_kubernetes" "default" {
  name                         = local.k8s_name_terway # The ACK cluster name. 
  cluster_spec                 = var.cluster_spec      # Create an ACK Pro cluster. 
  worker_vswitch_ids           = split(",", join(",", alicloud_vswitch.vswitches.*.id))        # The vSwitch to which the node pool belongs. Specify one or more vSwitch IDs. The vSwitches must reside in the zone specified by availability_zone. 
  pod_vswitch_ids              = split(",", join(",", alicloud_vswitch.terway_vswitches.*.id)) # The vSwitch of the pod. 
  new_nat_gateway              = true                                                          # Specify whether to create a NAT gateway when the Kubernetes cluster is created. Default value: true. 
  service_cidr                 = "10.11.0.0/16"                                                # The Service CIDR block. It cannot overlap with the VPC CIDR block or with the CIDR blocks of other Kubernetes clusters in the VPC. You cannot change the Service CIDR block after the cluster is created. 
  slb_internet_enabled         = true                                                          # Specify whether to create an Internet-facing SLB instance for the API server of the cluster. Default value: false. 
  enable_rrsa                  = true
  control_plane_log_components = ["apiserver", "kcm", "scheduler", "ccm"] # The control plane logs. 
  dynamic "addons" {                                                      # Component management. 
    for_each = var.cluster_addons
    content {
      name   = addons.value.name
      config = addons.value.config
    }
  }
}

# Create a node pool for which auto scaling is enabled. The node pool can be scaled out to a maximum of 10 nodes and must contain at least 1 node. 
resource "alicloud_cs_kubernetes_node_pool" "autoscale_node_pool" {
  cluster_id     = alicloud_cs_managed_kubernetes.default.id
  node_pool_name = local.autoscale_nodepool_name
  vswitch_ids    = split(",", join(",", alicloud_vswitch.vswitches.*.id))

  scaling_config {
    min_size = 1
    max_size = 10
  }

  instance_types        = var.worker_instance_types
  password              = var.password # The password that is used to log on to the cluster by using SSH. 
  install_cloud_monitor = true         # Specify whether to install the CloudMonitor agent on the nodes in the cluster. 
  system_disk_category  = "cloud_efficiency"
  system_disk_size      = 100
  image_type            = "AliyunLinux3"

  data_disks {              # The data disk configuration of the node. 
    category = "cloud_essd" # The disk category. 
    size     = 120          # The disk size. 
  }
}

If you want to view more complete examples, visit the directory of the corresponding service on the Landing with Terraform page.