Container Service for Kubernetes: Use Terraform to associate a deployment set with a node pool

Last Updated: Feb 17, 2025

A deployment set contains Elastic Compute Service (ECS) instances that are distributed across different physical servers. You can use deployment sets to improve the availability of your applications and implement disaster recovery. A node pool that is associated with a deployment set contains ECS nodes that are distributed across multiple physical servers. You can then configure pod anti-affinity to schedule your application pods to different ECS nodes, which implements disaster recovery and improves the availability of your applications. This topic describes how to use Terraform to associate a deployment set with a node pool.

Note

You can run the sample code in this topic with a few clicks. Click here to run the sample code.

Prerequisites

  • The runtime environment for Terraform is prepared by using one of the following methods:

    • Use Terraform in Terraform Explorer: Alibaba Cloud provides an online runtime environment for Terraform. You can log on to the environment to use Terraform without the need to install Terraform. This method is suitable for scenarios where you need to use and debug Terraform in a low-cost, efficient, and convenient manner.

    • Use Terraform in Cloud Shell: Cloud Shell is preinstalled with Terraform and configured with your identity credentials. You can run Terraform commands in Cloud Shell. This method is suitable for scenarios where you need to use and access Terraform in a low-cost, efficient, and convenient manner.

    • Use Terraform in ROS: Resource Orchestration Service (ROS) supports the integration of Terraform templates. By using Terraform with ROS, you can define and manage resources in Alibaba Cloud, Amazon Web Services (AWS), or Microsoft Azure, specify resource parameters, and configure dependency relationships for the resources.

    • Install and configure Terraform on your on-premises machine: This method is suitable for scenarios where network connections are unstable or a custom development environment is needed.

    Note

    Terraform 0.12.28 or later is installed. You can run the terraform --version command to query the Terraform version. A version-pinning sketch is provided at the end of this section.

  • The deployment set has a sufficient ECS quota, and ECS instances of the specified instance types are available. By default, each deployment set can contain up to 20 ECS instances in each zone. For more information, see View and increase resource quotas.

  • By default, an Alibaba Cloud account has full permissions on all resources that belong to this account. Security risks may arise if the credentials of an Alibaba Cloud account are leaked. We recommend that you use Resource Access Management (RAM) users to manage resources. When you create a RAM user, you need to create an AccessKey pair for the RAM user. For more information, see Create a RAM user and Create an AccessKey pair.

  • The following policy is attached to the RAM user that you use to run commands in Terraform. The policy includes the minimum permissions required to run commands in Terraform. For more information, see Grant permissions to a RAM user.

    This policy allows RAM users to create, view, and delete virtual private clouds (VPCs), vSwitches, deployment sets, and ACK clusters.

    {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "vpc:CreateVpc",
                    "vpc:CreateVSwitch",
                    "cs:CreateCluster",
                    "vpc:DescribeVpcAttribute",
                    "vpc:DescribeVSwitchAttributes",
                    "vpc:DescribeRouteTableList",
                    "vpc:DescribeNatGateways",
                    "cs:DescribeTaskInfo",
                    "cs:DescribeClusterDetail",
                    "cs:GetClusterCerts",
                    "cs:CheckControlPlaneLogEnable",
                    "cs:CreateClusterNodePool",
                    "cs:DescribeClusterNodePoolDetail",
                    "cs:ModifyClusterNodePool",
                    "vpc:DeleteVpc",
                    "vpc:DeleteVSwitch",
                    "cs:DeleteCluster",
                    "cs:DeleteClusterNodepool",
                    "ecs:CreateDeploymentSet",
                    "ecs:DescribeDeploymentSets",
                    "ecs:ModifyDeploymentSetAttribute",
                    "ecs:DeleteDeploymentSet"
                ],
                "Resource": "*"
            }
        ]
    }
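
If you install and run Terraform on your on-premises machine, you can pin the Terraform version that this topic requires and declare the alicloud provider explicitly. The following is a minimal sketch; the version constraint mirrors the prerequisite above, and the provider reads the RAM user's AccessKey pair from environment variables:

terraform {
  required_version = ">= 0.12.28" # the minimum version noted in the prerequisites

  required_providers {
    alicloud = {
      source = "aliyun/alicloud" # the registry address of the alicloud provider
    }
  }
}

# The alicloud provider reads credentials from the environment, for example:
#   ALICLOUD_ACCESS_KEY=<the RAM user's AccessKey ID>
#   ALICLOUD_SECRET_KEY=<the RAM user's AccessKey secret>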

Background Information

To ensure the high availability of your application within a zone, you must deploy your application across multiple physical servers. If all the ECS instances that host your application reside on the same physical server, a failure of that server affects all application pods. To resolve this issue, you can use deployment sets that are provided by ECS. The ECS instances in a deployment set are distributed across multiple physical servers and isolated from each other. This helps prevent service disruptions that are caused by single points of failure. For more information about deployment sets, see Deployment set.
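
After the nodes in such a node pool are spread across physical servers, you still need to spread your application pods across those nodes. The following is a minimal sketch that uses the Terraform kubernetes provider to express node-level pod anti-affinity; the provider configuration, the Deployment name app-example, and the image are assumptions for illustration:

provider "kubernetes" {
  config_path = "~/.kube/config" # assumes a local kubeconfig for the ACK cluster
}

resource "kubernetes_deployment" "app_example" {
  metadata {
    name = "app-example" # hypothetical application name
  }

  spec {
    replicas = 2

    selector {
      match_labels = {
        app = "app-example"
      }
    }

    template {
      metadata {
        labels = {
          app = "app-example"
        }
      }

      spec {
        affinity {
          pod_anti_affinity {
            # Require that no two replicas share a node, so that a single
            # physical server failure affects at most one replica.
            required_during_scheduling_ignored_during_execution {
              topology_key = "kubernetes.io/hostname"

              label_selector {
                match_labels = {
                  app = "app-example"
                }
              }
            }
          }
        }

        container {
          name  = "app"
          image = "nginx:1.25" # placeholder image
        }
      }
    }
  }
}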

Limits

Cluster feature usage guidelines

  • Deployment sets are supported by ACK dedicated clusters and ACK managed clusters.

  • You can associate a deployment set with a node pool only when you create the node pool. You cannot enable a deployment set for an existing node pool. Each node pool can be associated with only one deployment set, and you cannot change the deployment set after it is associated.

  • You cannot manually add ECS instances to or remove ECS instances from deployment sets. If you want to change the number of ECS instances in a deployment set, you can scale the node pool with which the deployment set is associated. For more information, see Create and manage a node pool.

  • After you associate a deployment set with a node pool, the node pool does not support preemptible instances.

Deployment set quotas and specifications limits

  • By default, node pool deployment sets use the high availability strategy. In a deployment set that uses the high availability strategy, you can create up to 20 ECS instances per zone. You can use the following formula to calculate the maximum number of ECS instances that you can create in a deployment set within an Alibaba Cloud region: 20 × Number of zones within the region. A short worked sketch of this formula is provided after this list. For more information, see Deployment set.

    You cannot increase the per-zone quota of 20 ECS instances in a deployment set. However, if you want to increase the maximum number of deployment sets that your Alibaba Cloud account can have, request a quota increase in the Quota Center console. For more information about the limits and quotas of deployment sets, see Deployment set limits.

  • Instance family limits

    The deployment strategies that you can use vary based on the instance family. The following lists the instance families that support each deployment strategy.

    Note

    To query the instance families that support a specific deployment strategy, call the DescribeDeploymentSetSupportedInstanceTypeFamily operation.

    High availability strategy or high availability group strategy supports the following instance families:

    • g8a, g8i, g8y, g7se, g7a, g7, g7h, g7t, g7ne, g7nex, g6, g6e, g6a, g5, g5ne, sn2ne, sn2, and sn1

    • c8a, c8i, c8y, c7se, c7, c7t, c7nex, c7a, c6, c6a, c6e, c5, ic5, and sn1ne

    • r8a, r8i, r8y, r7, r7se, r7t, r7a, r6, r6e, r6a, re6, re6p, r5, re4, se1ne, and se1

    • hfc8i, hfg8i, hfr8i, hfc7, hfg7, hfr7, hfc6, hfg6, hfr6, hfc5, and hfg5

    • d3c, d2s, d2c, d1, d1ne, d1-c14d3, and d1-c8d3

    • i3g, i3, i2, i2g, i2ne, i2gne, and i1

    • ebmg5, ebmc7, ebmg7, ebmr7, sccgn6, scch5, scch5s, sccg5, and sccg5s

    • e, t6, xn4, mn4, n4, e4, n2, and n1

    • gn6i

    Low latency strategy supports the following instance families:

    • g8a, g8i, g8ae, and g8y

    • c8a, c8i, c8ae, and c8y

    • ebmc8i, ebmg8i, and ebmr8i

    • r8a, r8i, r8ae, and r8y

    • ebmc7, ebmg7, and ebmr7

  • Insufficient instance resources within the region may result in a failure to create ECS instances or start pay-as-you-go instances that were stopped in economical mode in a deployment set. Wait for a while and then try to create or start the instances again.
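
As a short worked example of the capacity formula above, the following sketch computes the per-region capacity of one high-availability deployment set; the variable and output names are illustrative:

# Sketch of the formula: 20 instances per zone × number of zones in the region.
variable "zone_count" {
  type    = number
  default = 3 # for example, the three cn-shenzhen zones used in this topic
}

output "max_instances_per_deployment_set" {
  value = 20 * var.zone_count # 20 × 3 = 60 ECS instances in a three-zone region
}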

Required resources

Note

Fees are generated for specific resources used in this example. Release or unsubscribe from the resources when you no longer need them.

Use Terraform to create a node pool and associate a deployment set with the node pool

  1. Use the following template to create a node pool and associate a deployment set with the node pool:

    provider "alicloud" {
      region = var.region_id
    }
    
    variable "region_id" {
      type    = string
      default = "cn-shenzhen"
    }
    
    variable "name" {
      default = "tf-example"
    }
    
    variable "strategy" {
      default     = "Availability"
      description = "The deployment strategy. Valid values: Availability, AvailabilityGroup, LowLatency."
    }
    
    variable "cluster_spec" {
      type        = string
      description = "The cluster specifications of kubernetes cluster,which can be empty. Valid values:ack.standard : Standard managed clusters; ack.pro.small : Professional managed clusters."
      default     = "ack.pro.small"
    }
    
    # Specify the zones of vSwitches. 
    variable "availability_zone" {
      description = "The availability zones of vswitches."
      default     = ["cn-shenzhen-c", "cn-shenzhen-e", "cn-shenzhen-f"]
    }
    
    # The CIDR blocks used to create vSwitches. 
    variable "node_vswitch_cidrs" {
      type    = list(string)
      default = ["172.16.0.0/23", "172.16.2.0/23", "172.16.4.0/23"]
    }
    
    # The CIDR blocks used to create Terway vSwitches. 
    variable "terway_vswitch_cidrs" {
      type    = list(string)
      default = ["172.16.208.0/20", "172.16.224.0/20", "172.16.240.0/20"]
    }
    
    # Specify the ECS instance types of worker nodes. 
    variable "worker_instance_types" {
      description = "The ecs instance types used to launch worker nodes."
      default     = ["ecs.g6.2xlarge", "ecs.g6.xlarge"]
    }
    
    # Specify a password for the worker node.
    variable "password" {
      description = "The password of ECS instance."
      default     = "Test123456"
    }
    
    # Specify the components that you want to install in the ACK managed cluster. The components include Terway (network plug-in), csi-plugin (volume plug-in), csi-provisioner (volume plug-in), logtail-ds (logging plug-in), the NGINX Ingress controller, ack-arms-prometheus (monitoring plug-in), and ack-node-problem-detector (node diagnostics plug-in). 
    variable "cluster_addons" {
      type = list(object({
        name   = string
        config = string
      }))
    
      default = [
        {
          "name"   = "terway-eniip",
          "config" = "",
        },
        {
          "name"   = "logtail-ds",
          "config" = "{\"IngressDashboardEnabled\":\"true\"}",
        },
        {
          "name"   = "nginx-ingress-controller",
          "config" = "{\"IngressSlbNetworkType\":\"internet\"}",
        },
        {
          "name"   = "arms-prometheus",
          "config" = "",
        },
        {
          "name"   = "ack-node-problem-detector",
          "config" = "{\"sls_project_name\":\"\"}",
        },
        {
          "name"   = "csi-plugin",
          "config" = "",
        },
        {
          "name"   = "csi-provisioner",
          "config" = "",
        }
      ]
    }
    
    # Specify the prefix of the name of the ACK managed cluster. 
    variable "k8s_name_prefix" {
      description = "The name prefix used to create managed kubernetes cluster."
      default     = "tf-ack"
    }
    
    variable "vpc_name" {
      default = "tf-vpc"
    }
    variable "nodepool_name" {
      default = "default-nodepool"
    }
    
    # The default resource names. 
    locals {
      k8s_name_terway = substr(join("-", [var.k8s_name_prefix, "terway"]), 0, 63)
    }
    
    # The VPC. 
    resource "alicloud_vpc" "default" {
      vpc_name   = var.vpc_name
      cidr_block = "172.16.0.0/12"
    }
    
    # The node vSwitches. 
    resource "alicloud_vswitch" "vswitches" {
      count      = length(var.node_vswitch_cidrs)
      vpc_id     = alicloud_vpc.default.id
      cidr_block = element(var.node_vswitch_cidrs, count.index)
      zone_id    = element(var.availability_zone, count.index)
    }
    
    # The pod vSwitches. 
    resource "alicloud_vswitch" "terway_vswitches" {
      count      = length(var.terway_vswitch_cidrs)
      vpc_id     = alicloud_vpc.default.id
      cidr_block = element(var.terway_vswitch_cidrs, count.index)
      zone_id    = element(var.availability_zone, count.index)
    }
    
    # Create a deployment set.
    resource "alicloud_ecs_deployment_set" "default" {
      strategy            = var.strategy
      domain              = "Default"
      granularity         = "Host"
      deployment_set_name = var.name
      description         = "example_value"
    }
    
    # The ACK managed cluster. 
    resource "alicloud_cs_managed_kubernetes" "default" {
      name         = local.k8s_name_terway # The ACK cluster name. 
      cluster_spec = var.cluster_spec      # Create an ACK Pro cluster. 
      worker_vswitch_ids           = split(",", join(",", alicloud_vswitch.vswitches.*.id))        # The vSwitches used by the node pool. Specify one or more vSwitch IDs. The vSwitches must reside in the zone specified by availability_zone. 
      pod_vswitch_ids              = split(",", join(",", alicloud_vswitch.terway_vswitches.*.id)) # The vSwitches used by pods. 
      new_nat_gateway              = true                                                          # Specify whether to create a NAT gateway when the ACK cluster is created. Default value: true. 
      service_cidr                 = "10.11.0.0/16"                                                # The Service CIDR block. It cannot overlap with the VPC CIDR block or with the CIDR blocks of other clusters in the VPC. You cannot change the Service CIDR block after the cluster is created. 
      slb_internet_enabled         = true                                                          # Specify whether to create an Internet-facing SLB instance for the API server of the cluster. Default value: false. 
      enable_rrsa                  = true
      control_plane_log_components = ["apiserver", "kcm", "scheduler", "ccm"] # The control plane logs. 
    
      dynamic "addons" { # Component management. 
        for_each = var.cluster_addons
        content {
          name   = addons.value.name
          config = addons.value.config
        }
      }
    }
    
    # The regular node pool. 
    resource "alicloud_cs_kubernetes_node_pool" "default" {
      cluster_id           = alicloud_cs_managed_kubernetes.default.id              # The ACK cluster ID. 
      node_pool_name       = var.nodepool_name                                      # The node pool name. 
      vswitch_ids          = split(",", join(",", alicloud_vswitch.vswitches.*.id)) # The vSwitches used by the node pool. Specify one or more vSwitch IDs. The vSwitches must reside in the zone specified by availability_zone. 
      instance_types       = var.worker_instance_types
      instance_charge_type = "PostPaid"
      runtime_name         = "containerd"
      desired_size          = 2            # The expected number of nodes in the node pool. 
      password              = var.password # The password that is used to log on to the cluster by using SSH. 
      install_cloud_monitor = true         # Specify whether to install the CloudMonitor agent on the nodes in the cluster. 
      system_disk_category  = "cloud_essd"
      system_disk_size      = 100
      image_type            = "AliyunLinux"
      deployment_set_id     = alicloud_ecs_deployment_set.default.id
    
      data_disks {              # The data disk configuration of the node. 
        category = "cloud_essd" # The disk category. 
        size     = 120          # The disk size. 
      }
    }
    
  2. Run the following command to initialize the Terraform runtime environment:

    terraform init

    If the following information is returned, Terraform is initialized:

    Terraform has been successfully initialized!
    
    You may now begin working with Terraform. Try running "terraform plan" to see
    any changes that are required for your infrastructure. All Terraform commands
    should now work.
    
    If you ever set or change modules or backend configuration for Terraform,
    rerun this command to reinitialize your working directory. If you forget, other
    commands will detect it and remind you to do so if necessary.
  3. Run the following command to create the node pool:

    terraform apply 

    If the following information is returned, the node pool is associated with a deployment set:

    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
      Enter a value: yes
    
    ...
    
    Apply complete!  Resources: 10 added, 0 changed, 0 destroyed.
  4. Verify the result.

    Run the terraform show command

    Run the following command to query the resources that are created by Terraform:

    terraform show

    Log on to the ACK console

    You can find the node pool that you created on the Node Pools page in the ACK console. You can click Edit in the Actions column to view the associated deployment set.
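
As an alternative way to verify the result, you can append output blocks to the template in step 1 so that terraform apply prints the IDs of the key resources. The following is a minimal sketch; the output names are illustrative:

# Optional sketch: append these blocks to the template in step 1.
output "deployment_set_id" {
  value = alicloud_ecs_deployment_set.default.id # the deployment set associated with the node pool
}

output "cluster_id" {
  value = alicloud_cs_managed_kubernetes.default.id
}

output "node_pool_id" {
  value = alicloud_cs_kubernetes_node_pool.default.id
}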

Clear resources

If you no longer require the preceding resources created or managed by Terraform, run the terraform destroy command to release the resources. For more information about the terraform destroy command, see Common commands.

terraform destroy

Example

Note

You can run the sample code in this topic with a few clicks. Click here to run the sample code.

Sample code

provider "alicloud" {
  region = var.region_id
}

variable "region_id" {
  type    = string
  default = "cn-shenzhen"
}

variable "name" {
  default = "tf-example"
}

variable "strategy" {
  default     = "Availability"
  description = "The deployment strategy. Valid values: Availability, AvailabilityGroup, LowLatency."
}

variable "cluster_spec" {
  type        = string
  description = "The cluster specifications of kubernetes cluster,which can be empty. Valid values:ack.standard : Standard managed clusters; ack.pro.small : Professional managed clusters."
  default     = "ack.pro.small"
}

# Specify the zones of vSwitches. 
variable "availability_zone" {
  description = "The availability zones of vswitches."
  default     = ["cn-shenzhen-c", "cn-shenzhen-e", "cn-shenzhen-f"]
}

# The CIDR blocks used to create vSwitches. 
variable "node_vswitch_cidrs" {
  type    = list(string)
  default = ["172.16.0.0/23", "172.16.2.0/23", "172.16.4.0/23"]
}

# The CIDR blocks used to create Terway vSwitches. 
variable "terway_vswitch_cidrs" {
  type    = list(string)
  default = ["172.16.208.0/20", "172.16.224.0/20", "172.16.240.0/20"]
}

# Specify the ECS instance types of worker nodes. 
variable "worker_instance_types" {
  description = "The ecs instance types used to launch worker nodes."
  default     = ["ecs.g6.2xlarge", "ecs.g6.xlarge"]
}

# Specify a password for the worker node.
variable "password" {
  description = "The password of ECS instance."
  default     = "Test123456"
}

# Specify the components that you want to install in the ACK managed cluster. The components include Terway (network plug-in), csi-plugin (volume plug-in), csi-provisioner (volume plug-in), logtail-ds (logging plug-in), the NGINX Ingress controller, ack-arms-prometheus (monitoring plug-in), and ack-node-problem-detector (node diagnostics plug-in). 
variable "cluster_addons" {
  type = list(object({
    name   = string
    config = string
  }))

  default = [
    {
      "name"   = "terway-eniip",
      "config" = "",
    },
    {
      "name"   = "logtail-ds",
      "config" = "{\"IngressDashboardEnabled\":\"true\"}",
    },
    {
      "name"   = "nginx-ingress-controller",
      "config" = "{\"IngressSlbNetworkType\":\"internet\"}",
    },
    {
      "name"   = "arms-prometheus",
      "config" = "",
    },
    {
      "name"   = "ack-node-problem-detector",
      "config" = "{\"sls_project_name\":\"\"}",
    },
    {
      "name"   = "csi-plugin",
      "config" = "",
    },
    {
      "name"   = "csi-provisioner",
      "config" = "",
    }
  ]
}

# Specify the prefix of the name of the ACK managed cluster. 
variable "k8s_name_prefix" {
  description = "The name prefix used to create managed kubernetes cluster."
  default     = "tf-ack"
}

variable "vpc_name" {
  default = "tf-vpc"
}
variable "nodepool_name" {
  default = "default-nodepool"
}

# The default resource names. 
locals {
  k8s_name_terway = substr(join("-", [var.k8s_name_prefix, "terway"]), 0, 63)
}

# The VPC. 
resource "alicloud_vpc" "default" {
  vpc_name   = var.vpc_name
  cidr_block = "172.16.0.0/12"
}

# The node vSwitches. 
resource "alicloud_vswitch" "vswitches" {
  count      = length(var.node_vswitch_cidrs)
  vpc_id     = alicloud_vpc.default.id
  cidr_block = element(var.node_vswitch_cidrs, count.index)
  zone_id    = element(var.availability_zone, count.index)
}

# The pod vSwitches. 
resource "alicloud_vswitch" "terway_vswitches" {
  count      = length(var.terway_vswitch_cidrs)
  vpc_id     = alicloud_vpc.default.id
  cidr_block = element(var.terway_vswitch_cidrs, count.index)
  zone_id    = element(var.availability_zone, count.index)
}

# Create a deployment set.
resource "alicloud_ecs_deployment_set" "default" {
  strategy            = var.strategy
  domain              = "Default"
  granularity         = "Host"
  deployment_set_name = var.name
  description         = "example_value"
}

# The ACK managed cluster. 
resource "alicloud_cs_managed_kubernetes" "default" {
  name         = local.k8s_name_terway # The ACK cluster name. 
  cluster_spec = var.cluster_spec      # Create an ACK Pro cluster. 
  worker_vswitch_ids           = split(",", join(",", alicloud_vswitch.vswitches.*.id))        # The vSwitches used by the node pool. Specify one or more vSwitch IDs. The vSwitches must reside in the zone specified by availability_zone. 
  pod_vswitch_ids              = split(",", join(",", alicloud_vswitch.terway_vswitches.*.id)) # The vSwitches used by pods. 
  new_nat_gateway              = true                                                          # Specify whether to create a NAT gateway when the ACK cluster is created. Default value: true. 
  service_cidr                 = "10.11.0.0/16"                                                # The Service CIDR block. It cannot overlap with the VPC CIDR block or with the CIDR blocks of other clusters in the VPC. You cannot change the Service CIDR block after the cluster is created. 
  slb_internet_enabled         = true                                                          # Specify whether to create an Internet-facing SLB instance for the API server of the cluster. Default value: false. 
  enable_rrsa                  = true
  control_plane_log_components = ["apiserver", "kcm", "scheduler", "ccm"] # The control plane logs. 

  dynamic "addons" { # Component management. 
    for_each = var.cluster_addons
    content {
      name   = addons.value.name
      config = addons.value.config
    }
  }
}

# The regular node pool. 
resource "alicloud_cs_kubernetes_node_pool" "default" {
  cluster_id           = alicloud_cs_managed_kubernetes.default.id              # The ACK cluster ID. 
  node_pool_name       = var.nodepool_name                                      # The node pool name. 
  vswitch_ids          = split(",", join(",", alicloud_vswitch.vswitches.*.id)) # The vSwitches used by the node pool. Specify one or more vSwitch IDs. The vSwitches must reside in the zone specified by availability_zone. 
  instance_types       = var.worker_instance_types
  instance_charge_type = "PostPaid"
  runtime_name         = "containerd"
  desired_size          = 2            # The expected number of nodes in the node pool. 
  password              = var.password # The password that is used to log on to the cluster by using SSH. 
  install_cloud_monitor = true         # Specify whether to install the CloudMonitor agent on the nodes in the cluster. 
  system_disk_category  = "cloud_essd"
  system_disk_size      = 100
  image_type            = "AliyunLinux"
  deployment_set_id     = alicloud_ecs_deployment_set.default.id

  data_disks {              # The data disk configuration of the node. 
    category = "cloud_essd" # The disk category. 
    size     = 120          # The disk size. 
  }
}

If you want to view more complete examples, visit More examples and select the directory of the corresponding cloud service.
