This topic describes how to diagnose LoadBalancer Services and troubleshoot related errors.
Background information
When you create a Service of the LoadBalancer type (Type=LoadBalancer), the cloud controller manager (CCM) of Alibaba Cloud Container Compute Service (ACS) automatically creates or configures Server Load Balancer (SLB) resources for the Service, including an SLB instance, listeners, and backend server groups. For more information about the policies that are used to automatically update SLB resources, see Considerations for configuring a LoadBalancer Service.
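For reference, the following is a minimal sketch of a Service of the LoadBalancer type. The name nginx, the namespace default, and the selector run: nginx are hypothetical. After such a Service is created, the CCM provisions the SLB resources described above, and the IP address of the SLB instance appears in the EXTERNAL-IP column of kubectl -n default get svc nginx.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx            # hypothetical Service name
  namespace: default
spec:
  type: LoadBalancer     # causes the CCM to create or configure an SLB instance
  selector:
    run: nginx           # hypothetical label; must match the labels of the backend pods
  ports:
  - name: http
    port: 80             # listener port on the SLB instance
    protocol: TCP
    targetPort: 80       # container port of the backend pods
```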
Procedure
Make sure that the CCM version is 1.9.3.276-g372aa98-aliyun or later before troubleshooting. For more information about how to update the CCM, see Update the CCM. For more information about the release notes of the CCM, see Cloud Controller Manager.
Run the following command to query the Service that is associated with the SLB instance. Replace XXX.XXX.XXX.XXX with the IP address of the SLB instance.
kubectl get svc -A | grep -i LoadBalancer | grep -F XXX.XXX.XXX.XXX
Run the following command to check whether error events are generated for the Service:
kubectl -n {your-namespace} describe svc {your-svc-name}
Important: If no error events are generated for the Service, check whether the CCM version is 1.9.3.276-g372aa98-aliyun or later. For more information about how to update the CCM, see Update the CCM.
If error events are generated for the Service, see Service errors and solutions.
If no error events are generated for the Service, see Troubleshooting.
If the errors persist, join the ACS DingTalk group for technical support.
Service errors and solutions
The following table describes common Service errors and how to fix them.
Error message | Description and solution |
| Shared-resource SLB instances do not support elastic network interfaces (ENIs). Solution: If you want to specify an ENI as a backend server, create a high-performance SLB instance. Add the … annotation. Important: Make sure that the annotations you add meet the requirements of the CCM version. For more information about the correlation between annotations and CCM versions, see Add annotations to the YAML file of a Service to configure CLB instances. |
| No backend server is associated with the SLB instance. Check whether pods are associated with the Service and whether the pods are running normally. Solutions:
|
| The system fails to associate a Service with the SLB instance. Solution: Log on to the SLB console and search for the SLB instance in the region of the Service based on
|
| Your account has overdue payments. |
| The account balance is insufficient. |
| API throttling is triggered for SLB. Solutions:
|
| The listener that is associated with the vServer group cannot be deleted. Solutions:
|
| The reused internal-facing SLB instance and the cluster are not deployed in the same virtual private cloud (VPC). Solution: Make sure that your SLB instance and the cluster are deployed in the same VPC. |
| The idle IP addresses in the vSwitch are insufficient. Solution: Use |
| The Solution: Set the |
| By default, earlier versions of the CCM automatically create shared-resource SLB instances, which are no longer available for purchase. Solution: Update the CCM. |
| You cannot change the resource group of an SLB instance after the SLB instance is created. Solution: Delete the |
| The specified IP address of the ENI cannot be found in the VPC. Solution: Check whether the |
| You cannot change the billing method of the SLB instance used by a Service from pay-as-you-go to pay-by-specification. Solutions:
|
| The SLB instance created by the CCM is reused. Solutions:
|
| You cannot change the type of an SLB instance after it is created. Solution: Recreate the related Service. |
| You cannot associate an SLB instance with a Service that is already associated with another SLB instance. Solution: You cannot reuse an existing SLB instance by modifying the value of the |
Troubleshooting
Use the following table to troubleshoot issues other than Service errors.
Category | Issue | Solution |
Issues that occur when you access an SLB instance | The SLB instance does not evenly distribute traffic. | The SLB instance does not evenly distribute traffic |
Issues that occur when you access an SLB instance | The 503 error occurs when I access the SLB instance during application updates. | The 503 error occurs when I access the SLB instance during application updates |
Issues that occur when you access an SLB instance | The SLB instance cannot be accessed from within the cluster. | |
Issues that occur when you access an SLB instance | The SLB instance cannot be accessed from outside the cluster. | The SLB instance cannot be accessed from outside the cluster |
Issues that occur when you access an SLB instance | The … | |
Issues related to SLB configurations | The annotations of the Service do not take effect. | What do I do if the annotations of a Service do not take effect? |
Issues related to SLB configurations | The configuration of the SLB instance is modified. | |
Issues related to SLB configurations | The system fails to reuse an existing SLB instance. | Why does the system fail to use an existing SLB instance for more than one Service? |
Issues related to SLB configurations | No listener is created when an existing SLB instance is reused. | Why is no listener created when I reuse an existing SLB instance? |
Issues related to SLB configurations | The endpoint of the Service is different from that specified for the backend server of the SLB instance. | What do I do if the vServer groups of an SLB instance are not updated? |
Issues related to SLB deletion | The SLB instance is deleted. | |
Issues related to SLB deletion | The SLB instance is not deleted together with the Service. | |
The SLB instance does not evenly distribute traffic
Cause
The scheduling algorithm specified for the SLB instance does not suit the traffic pattern of the Service.
Issue
Traffic is not evenly distributed to the backend servers of an SLB instance.
Solution
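If long-lived connections are established to your Service, set the scheduling algorithm of the SLB instance to Weighted Least Connections (WLC) by adding the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wlc" annotation. With WLC, new connections are forwarded to the backend server that has the fewest active connections relative to its weight, so persistent connections do not accumulate on a few backend servers.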
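A minimal sketch of a Service that carries the WLC annotation follows. The Service name, namespace, selector, and ports are hypothetical; only the annotation key and value come from this topic. The value is quoted because Kubernetes annotation values must be strings.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx            # hypothetical Service name
  namespace: default
  annotations:
    # Use Weighted Least Connections for long-lived connections.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wlc"
spec:
  type: LoadBalancer
  selector:
    run: nginx           # hypothetical pod label
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
```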
The 503 error occurs when I access the SLB instance during application updates
Cause
Connection draining is not configured for the SLB listener or graceful shutdown is not configured for the pod.
Issue
The 503 error occurs when you access the SLB instance during application updates.
Solution
Add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain annotation to configure connection draining for the SLB listener. For more information about the annotation, see Common operations to manage listeners.
Set the preStop and readinessProbe parameters for the pod based on the network mode of the pod.
readinessProbe checks whether the container is ready to accept network traffic. The pod is added to the endpoints of the Service only after it passes the readiness probe, and it is attached to the SLB instance only after ACS detects that the endpoints are updated. Some applications take a long time to start, so set a proper probing interval (periodSeconds), initial delay (initialDelaySeconds), and failure threshold (failureThreshold) so that probes do not fail before the application is ready. If a livenessProbe is configured with values that are too short, the application pods restart repeatedly.
We recommend that the preStop hook wait long enough for the application pods to finish processing the remaining requests, and that terminationGracePeriodSeconds be set to 30 seconds longer than the preStop duration.
Pod configuration example:
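apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
  labels:
    run: nginx                   # Label that a Service selector can match
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80          # NGINX listens on port 80 by default
    # Liveness probing: restarts the container if port 80 stops accepting TCP connections.
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 30
      periodSeconds: 30
      successThreshold: 1
      tcpSocket:
        port: 80
      timeoutSeconds: 1
    # Readiness probing: the pod is added to the Service endpoints only after this probe succeeds.
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 30
      periodSeconds: 30
      successThreshold: 1
      tcpSocket:
        port: 80
      timeoutSeconds: 1
    # Graceful shutdown: wait 30 seconds so that in-flight requests can complete.
    lifecycle:
      preStop:
        exec:
          command:
          - sleep
          - "30"                 # command arguments must be strings
  # 30 seconds longer than the preStop sleep.
  terminationGracePeriodSeconds: 60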
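The connection draining step can be paired with the pod settings above in a Service such as the following sketch. The service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain annotation comes from this topic; the value "on" and the companion service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout annotation are assumptions, so verify them in Common operations to manage listeners for your CCM version. The Service name and the selector run: nginx are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx            # hypothetical Service name
  namespace: default
  annotations:
    # Enable connection draining on the SLB listener (value assumed; check the annotation reference).
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    # Assumed annotation: seconds to keep existing connections open while a backend is removed.
    # 30 matches the 30-second preStop sleep in the pod example.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
spec:
  type: LoadBalancer
  selector:
    run: nginx           # hypothetical pod label
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
```

Keeping the drain timeout no longer than the preStop sleep is one reasonable choice, because the pod should still be serving while the listener drains its connections.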
The SLB instance cannot be accessed from outside the cluster
Cause
You configured access control list (ACL) rules for the SLB instance or the SLB instance does not run as expected.
Issue
You cannot access the SLB instance from outside the cluster.
Solution
Run the following command to query Service events and troubleshoot errors. For more information, see Service errors and solutions.
kubectl -n {your-namespace} describe svc {your-svc-name}
Check whether ACL rules are configured for the SLB instance.
If ACL rules are configured for the SLB instance, check whether the client IP address is allowed to access the SLB instance. For more information about how to configure ACL rules for an SLB instance, see Access control.
Check whether the SLB instance is associated with a vServer group.
If no vServer group is associated, check whether the application pods are associated with the Service and whether the application pods are running normally. If they are not, identify the causes and troubleshoot the errors. For more information, see Pod troubleshooting.
Check whether unhealthy backend servers are detected by the SLB listeners.
If unhealthy backend servers are detected, check whether the application pods are running normally. For more information about health checks for SLB, see Execute a health check script.
If the issues persist, join the ACS DingTalk group for technical support.
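For the vServer group check above, a common cause of an empty vServer group is a Service selector that does not match the pod labels. The following sketch, in which all names are hypothetical, shows a Deployment whose pod template labels match the Service selector. Only pods that match the selector and pass their readiness probes are added to the endpoints, and only those endpoints are attached to the SLB instance.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx                # hypothetical Deployment name
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx             # must match the pod template labels below
  template:
    metadata:
      labels:
        run: nginx           # the Service selector matches this label
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  type: LoadBalancer
  selector:
    run: nginx               # must equal the pod template labels of the Deployment
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
```

After you apply the manifest, kubectl -n default get endpoints nginx should list the IP addresses of the ready pods. If the list is empty, compare the Service selector with the output of kubectl -n default get pods --show-labels.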
Backend HTTPS services cannot be accessed
Cause
After you configure a certificate for the SLB instance, the SLB instance decrypts HTTPS requests and forwards them to the backend pods as HTTP requests. Backend pods that accept only HTTPS cannot process these requests.
Issue
You cannot access backend HTTPS services.
Solution
Set targetPort in the Service to the HTTP port of the backend pods. For example, in the following NGINX Service, the SLB instance listens for HTTPS on port 443, so the targetPort of port 443 must be set to 80, the HTTP port of NGINX.
Example:
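apiVersion: v1
kind: Service
metadata:
  annotations:
    # Create an HTTPS listener on port 443 of the SLB instance.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-protocol-port: "https:443"
    # Replace ${YOUR_CERT_ID} with the ID of your certificate.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-cert-id: "${YOUR_CERT_ID}"
  name: nginx
  namespace: default
spec:
  ports:
  - name: http                 # a name is required when a Service exposes multiple ports
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    # The SLB instance decrypts HTTPS, so traffic is forwarded to the HTTP port of the pods.
    targetPort: 80
  selector:
    run: nginx
  type: LoadBalancer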