In a Container Service for Kubernetes (ACK) cluster that has the Terway network plug-in installed, you can use the NetworkPolicy feature to control communication among pods. When such a cluster contains more than 100 nodes, the NetworkPolicy proxies place a heavy load on the control plane of the cluster. To resolve this issue, optimize the NetworkPolicy feature for the cluster. This topic describes how to optimize the performance of the NetworkPolicy feature for a large ACK cluster in Terway mode.
Background information
Terway implements the NetworkPolicy feature by using the Felix agent of Calico. The Felix agent on each node retrieves proxy rules directly from the API server, so in an ACK cluster that contains more than 100 nodes the combined requests place a heavy load on the API server. To reduce this load, you can disable the NetworkPolicy feature or deploy the Typha component as a repeater between the Felix agents and the API server.
You can improve the performance of the NetworkPolicy feature for a large ACK cluster in the following ways:
Deploy Typha as a repeater.
Disable the NetworkPolicy feature.
Note: After you disable the NetworkPolicy feature, you cannot use network policies to control communication among pods.
Prerequisites
An ACK cluster that has Terway installed and contains more than 100 nodes is created. For more information, see Create an ACK managed cluster.
The kubeconfig file of the cluster is obtained and a kubectl client is connected to the cluster.
Deploy Typha as a repeater
Log on to the ACK console.
Update Terway to the latest version. For more information, see Manage components.
Components used in different Terway modes are different. For more information, see Compare Terway modes.
Create a file named calico-typha.yaml and copy the following content to the file to deploy Typha as a repeater.
apiVersion: v1
kind: Service
metadata:
  name: calico-typha
  namespace: kube-system
  labels:
    k8s-app: calico-typha
spec:
  ports:
  - port: 5473
    protocol: TCP
    targetPort: calico-typha
    name: calico-typha
  selector:
    k8s-app: calico-typha
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: calico-typha
  namespace: kube-system
  labels:
    k8s-app: calico-typha
spec:
  replicas: 3 # Modify the value of the replicas parameter based on the cluster size. Create 1 replica for every 200 nodes. You must create at least three replicas.
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      k8s-app: calico-typha
  template:
    metadata:
      labels:
        k8s-app: calico-typha
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      hostNetwork: true
      tolerations:
      - operator: Exists
      serviceAccountName: terway
      priorityClassName: system-cluster-critical
      containers:
      - image: registry-vpc.{REGION-ID}.aliyuncs.com/acs/typha:v3.20.2 # Replace {REGION-ID} with the region ID of the cluster.
        name: calico-typha
        ports:
        - containerPort: 5473
          name: calico-typha
          protocol: TCP
        env:
        - name: TYPHA_LOGSEVERITYSCREEN
          value: "info"
        - name: TYPHA_LOGFILEPATH
          value: "none"
        - name: TYPHA_LOGSEVERITYSYS
          value: "none"
        - name: TYPHA_CONNECTIONREBALANCINGMODE
          value: "kubernetes"
        - name: TYPHA_DATASTORETYPE
          value: "kubernetes"
        - name: TYPHA_HEALTHENABLED
          value: "true"
        livenessProbe:
          httpGet:
            path: /liveness
            port: 9098
            host: localhost
          periodSeconds: 30
          initialDelaySeconds: 30
        readinessProbe:
          httpGet:
            path: /readiness
            port: 9098
            host: localhost
          periodSeconds: 10
---
apiVersion: policy/v1 # If the Kubernetes version of the cluster is earlier than 1.21, set the value of the apiVersion parameter to policy/v1beta1.
kind: PodDisruptionBudget
metadata:
  name: calico-typha
  namespace: kube-system
  labels:
    k8s-app: calico-typha
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: calico-typha
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: bgppeers.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          apiVersion:
            type: string
  names:
    kind: BGPPeer
    plural: bgppeers
    singular: bgppeer
Note:
Replace {REGION-ID} with the region ID of your cluster.
Modify the value of the replicas parameter based on the cluster size. Create one replica for every 200 nodes. You must create at least three replicas.
Modify the value of the apiVersion parameter of PodDisruptionBudget based on the Kubernetes version of the cluster. If the Kubernetes version is 1.21 or later, set apiVersion to policy/v1. If the Kubernetes version is earlier than 1.21, set apiVersion to policy/v1beta1.
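The sizing and apiVersion rules in the note above can be sketched as two small shell helpers. The function names typha_replicas and pdb_api_version are hypothetical, and rounding a partial group of 200 nodes up to a full replica is an assumption: the rule itself only states one replica for every 200 nodes with a minimum of three.

```shell
# Hypothetical helper: suggest a Typha replica count for a given node count.
# Assumption: a partial group of 200 nodes is rounded up to one more replica.
typha_replicas() {
  nodes=$1
  replicas=$(( (nodes + 199) / 200 ))
  # Enforce the minimum of three replicas.
  if [ "$replicas" -lt 3 ]; then
    replicas=3
  fi
  echo "$replicas"
}

# Hypothetical helper: pick the PodDisruptionBudget apiVersion
# from a Kubernetes version string such as "v1.22.3-aliyun.1".
pdb_api_version() {
  version=${1#v}            # drop an optional leading "v"
  major=${version%%.*}      # text before the first dot
  rest=${version#*.}        # text after the first dot
  minor=${rest%%.*}         # text before the next dot
  minor=${minor%%[!0-9]*}   # drop any non-numeric suffix such as "+"
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 21 ]; }; then
    echo "policy/v1"
  else
    echo "policy/v1beta1"
  fi
}

typha_replicas 1000        # prints 5
pdb_api_version v1.20.11   # prints policy/v1beta1
```

Use the printed values to set the replicas parameter and the apiVersion of the PodDisruptionBudget in calico-typha.yaml before you apply the file.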
Run the following command to deploy Typha as a repeater:
kubectl apply -f calico-typha.yaml
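Before you change the Terway configuration, it can help to confirm that the Typha pods are running. The following is a minimal sketch: the function name check_typha and the 120-second timeout are illustrative choices, while the Deployment name and the k8s-app=calico-typha label come from the manifest above.

```shell
# Hypothetical helper: confirm that the Typha Deployment is rolled out.
check_typha() {
  # Wait until all Typha replicas are available (the timeout is an arbitrary choice).
  kubectl rollout status deployment/calico-typha -n kube-system --timeout=120s || return 1
  # List the Typha pods and the nodes on which they run.
  kubectl get pods -n kube-system -l k8s-app=calico-typha -o wide
}
```

Run check_typha after the kubectl apply command completes. A non-zero exit status suggests that the rollout did not finish within the timeout.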
Run the following command to modify the eni-config configuration file of the Terway plug-in:
kubectl edit cm eni-config -n kube-system
Add the felix_relay_service: calico-typha repeater setting to the file and set the value of the disable_network_policy parameter to "false". If the disable_network_policy parameter does not exist, you do not need to add it. Both parameters must be at the same indentation level as the eni_conf parameter.
felix_relay_service: calico-typha
disable_network_policy: "false" # If this parameter does not exist, you do not need to add it.
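If you prefer a non-interactive change to kubectl edit, a merge patch may be able to add the repeater setting. This is a sketch under the assumption that felix_relay_service is a top-level key under data in the eni-config ConfigMap, at the same level as eni_conf. The function name enable_typha_relay is hypothetical. The patch leaves disable_network_policy untouched, so check that key separately if it exists.

```shell
# Hypothetical helper: add the Typha repeater setting without opening an editor.
# Assumption: felix_relay_service sits directly under .data in the ConfigMap.
enable_typha_relay() {
  # JSON merge patch that adds or overwrites only the felix_relay_service key.
  patch='{"data":{"felix_relay_service":"calico-typha"}}'
  kubectl patch configmap eni-config -n kube-system --type merge -p "$patch"
}
```

Call enable_typha_relay once, then inspect the ConfigMap to confirm that the key is at the same level as eni_conf.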
Run the following command to restart Terway:
kubectl get pod -n kube-system | grep terway | awk '{print $1}' | xargs kubectl delete -n kube-system pod
Expected output:
pod "terway-eniip-8hmz7" deleted
pod "terway-eniip-dclfn" deleted
pod "terway-eniip-rmctm" deleted
...
Disable the NetworkPolicy feature
If you no longer need network policies, you can disable the NetworkPolicy feature to remove the load that the NetworkPolicy proxies place on the API server.
Run the following command to modify the eni-config configuration file of the Terway plug-in, and add the disable_network_policy: "true" setting to disable the NetworkPolicy feature.
kubectl edit cm -n kube-system eni-config
# Add the following setting, or modify it if the key already exists:
disable_network_policy: "true"
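After you save the change, you can read the value back before you restart Terway. A minimal sketch, assuming disable_network_policy is stored as a top-level key under data in the ConfigMap. The function name show_network_policy_flag is hypothetical.

```shell
# Hypothetical helper: print the current value of disable_network_policy.
# Prints an empty string if the key is absent from the ConfigMap.
show_network_policy_flag() {
  kubectl get configmap eni-config -n kube-system -o jsonpath='{.data.disable_network_policy}'
}
```

If the function prints true, the setting was saved.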
Run the following command to restart Terway:
kubectl get pod -n kube-system | grep terway | awk '{print $1}' | xargs kubectl delete -n kube-system pod
Expected output:
pod "terway-eniip-8hmz7" deleted
pod "terway-eniip-dclfn" deleted
pod "terway-eniip-rmctm" deleted
...
Result
After you deploy Typha and restart Terway, the NetworkPolicy proxies start to use the Typha component, which reduces the load on the API server. If you disabled the NetworkPolicy feature instead, the proxies no longer place that load on the API server. In either case, you can monitor the traffic that is distributed to the Server Load Balancer (SLB) instances to check whether the load on the API server is reduced.