Container Service for Kubernetes:Recommended workload configurations

Last Updated:Jun 26, 2024

When you configure workloads (Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs) in a Container Service for Kubernetes (ACK) cluster, you must consider multiple factors to ensure that applications can run stably and reliably.

Claim resources (request and limit) for each pod

In an ACK cluster, too many pods may be scheduled to a single node. This overloads the node until it can no longer provide services.

To avoid this issue, specify resource requests and resource limits for a pod when you deploy it in your cluster. This ensures that the scheduler places the pod on a node with sufficient idle resources. In the following example, the NGINX pod requests 1 vCPU and 1,024 MiB of memory. When the pod is running, its resource usage is capped at 2 vCPUs and 4,096 MiB of memory.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources: # Resource claim.
      requests:
        memory: "1024Mi"
        cpu: "1000m"
      limits:
        memory: "4096Mi"
        cpu: "2000m"

ACK uses a static resource scheduling method and calculates the remaining resources on each node by using the following formula: Remaining resources = Total resources on the node - Allocated resources. The allocated resources are not equivalent to the resources that are actually used. If you manually run a resource-consuming program on a node, ACK is unaware of the resources that the program uses.

You must claim resources for all pods. If a pod has no resource claims, the resources that it uses after it is scheduled to a node are not deducted from the total resources of the node. As a result, too many pods may be scheduled to that node.

You can use the resource profiling feature provided by ACK to get resource configuration suggestions for containers based on the historical data of resource usage. This greatly simplifies the configuration of resource requests and limits for containers. For more information, see Resource profiling.

Wait until dependencies are ready instead of terminating an application during startup

Some applications have external dependencies. For example, an application may need to read data from a database or call the API of another service. When the application starts, the database or the API may not be ready yet. During manual O&M, the application is terminated when its external dependencies are unready. This strategy is known as fail-fast, and it is not suitable for ACK clusters. Most O&M activities in ACK are automated: you do not need to manually deploy an application, start it on a selected node, or restart it when it fails. Applications in ACK clusters are automatically restarted upon failures, and you can scale the number of pods by using the Horizontal Pod Autoscaler (HPA) to handle increased loads.

For example, Application A depends on Application B, and both applications run on the same node. When the node restarts, Application A may start before Application B. The dependency of Application A is then unready. The traditional approach terminates Application A in this case, and Application A must be manually restarted after Application B starts.

In an ACK cluster, the best practice is to periodically check whether the dependencies are ready and wait until they are, instead of terminating the application. You can implement this check by using an init container.
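The following manifest is a minimal sketch of this pattern. It assumes the dependency is exposed through a Service named app-b; the pod and container names are placeholders. The init container blocks the main container from starting until the Service's DNS name resolves.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-a
spec:
  initContainers:
  - name: wait-for-app-b   # Hypothetical name; runs to completion before the main container starts.
    image: busybox
    command: ['sh', '-c', 'until nslookup app-b; do echo waiting for app-b; sleep 2; done']
  containers:
  - name: app-a
    image: nginx           # Placeholder for the actual application image.
```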

Configure the restart policy

It is common for application processes running in a pod to exit. Processes may exit because of bugs in the code, excessive memory usage, or other reasons, and the pod fails when they do. You can set the restartPolicy field for the pod to ensure that it is automatically restarted upon failures.

apiVersion: v1
kind: Pod
metadata:
  name: tomcat
spec:
  restartPolicy: OnFailure # restartPolicy is a pod-level field, not a container field.
  containers:
  - name: tomcat
    image: tomcat

Valid values of the restartPolicy field:

  • Always: always restarts the containers in the pod when they terminate.

  • OnFailure: restarts the containers only when a process exits with a non-zero exit code.

  • Never: never restarts the containers.

Pods that are managed by Deployments support only the Always policy.

Configure liveness probes and readiness probes

A pod may be unable to provide services even if it is in the Running state. For example, the processes in a running pod may deadlock, in which case the pod cannot provide services. Kubernetes does not restart such a pod because it is still running. Therefore, configure liveness probes for all pods in a cluster. The probes check whether the pods are alive and able to provide services. When a liveness probe detects an exception in a pod, the pod is automatically restarted.

A readiness probe determines whether a pod is ready to provide services. An application takes some time to initialize during startup, and the pod cannot serve traffic during initialization. A readiness probe informs Ingresses and Services whether the pod is ready to receive network traffic. When a readiness probe detects errors in a pod, Kubernetes stops forwarding network traffic to the pod.

apiVersion: v1
kind: Pod
metadata:
  name: tomcat
spec:
  containers:
  - name: tomcat
    image: tomcat
    livenessProbe:
      httpGet:
        path: /index.jsp
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      httpGet:
        path: /index.jsp
        port: 8080

Run only one process in each container

Users who are new to containers often treat them as virtual machines and run multiple processes in one container, such as monitoring agents, logging daemons, sshd, and even systemd. This causes the following two issues:

  • It becomes complex to determine the resource usage of a pod. Implementing resource requests and limits also becomes difficult.

  • The container engine can detect a process failure and restart the container only when the container runs a single process. If a container contains multiple processes and one of them is terminated, the container itself may keep running. The container engine is unaware of the terminated process and does not take any action, even though the container may no longer function as expected.

ACK still allows processes to work together. For example, if you want NGINX and php-fpm to communicate over a UNIX domain socket, you can run them in two containers within one pod and store the socket on a volume shared by both containers.
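The following manifest is a minimal sketch of this layout. The volume name, mount path, and socket location are assumptions; adjust them to match your NGINX and php-fpm configuration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-php
spec:
  volumes:
  - name: socket-dir            # emptyDir volume shared by both containers; holds the UNIX domain socket.
    emptyDir: {}
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: socket-dir
      mountPath: /var/run/php   # Assumed socket directory; must match the php-fpm listen path.
  - name: php-fpm
    image: php:fpm
    volumeMounts:
    - name: socket-dir
      mountPath: /var/run/php
```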

Eliminate single points of failure (SPOF)

If an application runs on only one Elastic Compute Service (ECS) instance, the application becomes unavailable while the ECS instance restarts after a failure. The application also becomes unavailable when it is upgraded or when a new version is released. Therefore, we recommend that you do not run applications directly in standalone pods. Instead, deploy applications by using Deployments or StatefulSets and run at least two pods for each application.
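The following Deployment is a minimal sketch of this recommendation. The name, labels, and image are placeholders; the key setting is replicas, which keeps two pods running so that one pod can continue serving traffic while the other is restarted or upgraded.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2                 # Run at least two pods to avoid a single point of failure.
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
```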

References

ACK allows you to perform canary releases and blue-green releases for applications. For more information, see Application deployment.

For more information about the best practices for application management, see Best practices for application management.

For more information about how to troubleshoot the errors in running application pods, see Pod troubleshooting and FAQ about applications.