
Analysis of Alibaba Cloud Container Network Data Link (2): Terway EN

Part 2 of this series focuses on ACK data link forwarding paths in different SOP scenarios in Terway ENI mode.

By Yu Kai

Co-Author: Xieshi (Alibaba Cloud Container Service)

This article is the second part of the series and mainly introduces the data plane forwarding links in Kubernetes Terway ENI mode. First, understanding the data plane forwarding links in different scenarios explains the performance customers observe when accessing services in those scenarios and helps customers further optimize their business architecture. Second, a deeper understanding of the forwarding links tells customer O&M staff and Alibaba Cloud developers which points on the link to probe and observe when container network jitter occurs, so the direction and cause of a problem can be narrowed down more quickly.

Previous article: Analysis of Alibaba Cloud Container Network Data Link (1): Flannel

Terway ENI Mode Architecture Design

In Terway ENI mode, the ENI network is the same as the VPC network. In this mode, an ENI [1] is created in Alibaba Cloud's VPC network and bound to the ECS node, and pods then use this ENI to communicate with other networks. Note that the number of ENIs is limited, and the quota [2] depends on the instance type.

[1] Elastic Network Interface
https://www.alibabacloud.com/help/en/elastic-compute-service/latest/elastic-network-interfaces-overview

[2] Instance types have different quotas
https://www.alibabacloud.com/help/en/elastic-compute-service/latest/instance-family

1

The CIDR block used by the Pod is the same as the CIDR block of the node.

2

It can be seen that there are two network interface controllers inside the Pod: eth0 and veth1. The IP address of eth0 is the Pod's IP. The MAC address of this network interface controller matches the MAC address of the ENI on the console, which indicates that this network interface controller is the secondary ENI and is mounted into the network namespace of the Pod.
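To confirm that the pod's eth0 is the secondary ENI rather than a veth device, you can compare its MAC address with the ENI shown in the ECS console. A minimal sketch, assuming you already know the container's PID on the host (the <container-pid> placeholder is illustrative):

# Enter the pod's network namespace by PID and show eth0
nsenter -t <container-pid> -n ip link show eth0
# The link/ether value printed here should match the MAC address of the
# secondary ENI displayed on the ENI page of the console.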

3
4

There is a default route in the Pod pointing to eth0, plus a route whose destination CIDR block is 192.168.0.0/16 and whose next hop is the veth1 network interface controller, where 192.168.0.0/16 is the service CIDR block of the cluster. This means that when a Pod in the cluster accesses the ClusterIP CIDR block of an SVC, the data link goes through the veth1 network interface controller to the OS of the host ECS for the next routing decision. In all other cases, traffic goes to the VPC directly through eth0.
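These routes can be checked directly from inside the pod's network namespace. A minimal sketch, again using an illustrative <container-pid> placeholder:

# Show the routing table inside the pod's network namespace
nsenter -t <container-pid> -n ip route
# Expect a default route whose device is eth0 (the secondary ENI) and a route
# for the service CIDR block (192.168.0.0/16 here) whose device is veth1.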

5

As shown in the figure, running ip addr in the network namespace of the container shows veth1@if19, where '19' is the interface index that lets us find the peer of the veth pair in the ECS OS. In the ECS OS, running ip addr | grep 19: finds the virtual network interface controller cali38ef34581a9, which is the peer of the veth pair on the ECS OS side.
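The number after the '@if' suffix is the interface index of the peer, so the lookup can be scripted. A minimal sketch using the index 19 from this example and an illustrative <container-pid> placeholder:

# Inside the pod netns: veth1@if19 means the peer's ifindex on the host is 19
nsenter -t <container-pid> -n ip addr show veth1
# On the ECS OS: find the interface whose index is 19
ip addr | grep '^19:'
# The output should show the calixxx device, cali38ef34581a9 in this example.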

6

So far, for a container accessing the ClusterIP of an SVC, a data link has been established between the container and the ECS OS. How does the ECS OS decide which container the traffic should go to? From the Linux routing table of the OS, we can see that all traffic destined for a pod's CIDR block is forwarded to the calico virtual network interface controller corresponding to that pod. Up to this point, the ECS OS and the network namespace of the pod have a complete ingress and egress link configuration.
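This per-pod route can be verified on the ECS OS. A minimal sketch, with <pod-ip> as an illustrative placeholder:

# Show the host route for a specific pod IP
ip route | grep <pod-ip>
# Expect a line of the form: <pod-ip> dev calixxxxxxxxxxx scope link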

7

Analysis of Container Network Data Link in Terway ENI Mode

Based on the characteristics of container networks, the network links in Terway ENI mode can be roughly divided into two major SOP scenarios: pod IP and SVC, which can be subdivided into eight different small SOP scenarios.

8

Under the Terway ENI architecture, the data link access scenarios can be summarized into the following eight categories:

  • Access Pod IP: the node accesses a pod on the same node.
  • Access Pod IP: pods on the same node access each other.
  • Access Pod IP: pods on different nodes access each other.
  • Access the SVC IP (Cluster IP) within the cluster; the source and the SVC backend pod are on the same node.
  • Access the SVC IP (Cluster IP) within the cluster; the source and the SVC backend pod are on different nodes.
  • Access the SVC IP (External IP) within the cluster; the source and the SVC backend pod are on the same node.
  • Access the SVC IP (External IP) within the cluster; the source and the SVC backend pod are on different nodes.
  • Access the SVC External IP from outside the cluster.

Scenario 1: Access Pod IP, the Node Accesses a Pod on the Same Node

Environment

9

The pod nginx1-5969d8fc89-9t99h (IP address 10.0.0.203) exists on the ap-southeast-1.10.0.0.196 node.

Kernel Routing

The IP address of nginx1-5969d8fc89-9t99h is 10.0.0.203, and the PID of the container on the host is 1094736. The container's network namespace has a default route pointing to the container's eth0, plus two routes whose next hop is the veth1 network interface controller and whose destinations are the service CIDR blocks.
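The container PID used above can be read from the container runtime on the node. A minimal sketch assuming a Docker runtime (for containerd, crictl provides equivalent commands); <container-id> is a placeholder:

# Find the container of the pod and read its PID on the host
docker ps | grep nginx1-5969d8fc89-9t99h
docker inspect -f '{{ .State.Pid }}' <container-id>
# The printed PID (1094736 in this environment) can then be passed to nsenter.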

10
11
12

The ECS OS peer of the container's veth1 is cali5068e632525.

13
14

In the ECS OS, there is a route that points to each Pod IP with the next hop being the corresponding calixxx interface. As described in the preceding section, each calixxx network interface controller forms a veth pair with the veth1 inside a pod, so traffic from a pod to the SVC CIDR goes to veth1 instead of the default eth0 route. The main functions of the calixxx network interface controller here are:

  1. Allowing the node to access pods.
  2. When the node or a pod accesses the SVC CIDR, the traffic goes through the ECS OS kernel protocol stack, is translated there, and then reaches the pod through calixxx and veth1.

15

Summary: Destination Can Be Accessed

veth1 in the nginx1-5969d8fc89-9t99h netns can capture packets.

16

cali5068e632525, the host-side peer of nginx1-5969d8fc89-9t99h, can capture packets.
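A quick way to reproduce this is to capture on both ends of the veth pair while accessing the pod IP from the node. A minimal sketch (interface names and PID are from this scenario; port 80 is assumed for the nginx pod):

# In one terminal on the ECS OS, capture on the host-side calixxx interface
tcpdump -nn -i cali5068e632525 port 80
# In another terminal, capture on veth1 inside the pod netns
nsenter -t 1094736 -n tcpdump -nn -i veth1 port 80
# Then generate traffic from the node to the pod IP
curl http://10.0.0.203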

17

18
Data Link Forwarding Diagram

  • The data link is ECS → Linux routing → calixxx → Pod netns veth1. The data link completes the switch from the host ns to the pod ns.
  • Traffic is switched to the pod's veth pair by the route on the host.
  • The network interface controller is exclusive to the assigned pod and cannot be shared with other pods.
  • Data link goes through the protocol stack twice.

Scenario 2: Access Pod IP, Mutual Access Between Pods on the Same Node

Environment

19

Two pods exist on the ap-southeast-1.10.0.0.196 node: centos-59cdc5c9c4-89f8x (IP address 10.0.0.202) and nginx1-5969d8fc89-9t99h (IP address 10.0.0.203).

Kernel Routing

The IP address of centos-59cdc5c9c4-89f8x is 10.0.0.202, and the PID of the container on the host is 2314075. The container's network namespace has a default route pointing to the container's eth0, plus two routes whose next hop is the veth1 network interface controller and whose destinations are the service CIDR blocks.

20
21
22

Using a similar method, we can find that the IP address of nginx1-5969d8fc89-9t99h is 10.0.0.203 and the PID of the container on the host is 1094736.

23

Summary: Destination Can Be Accessed

eth0 in the centos-59cdc5c9c4-89f8x netns can capture packets.

24

eth0 in the nginx1-5969d8fc89-9t99h netns can capture packets.

25

26
Data Link Forwarding Diagram

  • The data link is pod1 netns eth0 → VPC → pod2 netns eth0. The data link does not pass through the host namespace: it first leaves the ECS, enters the VPC, and then returns to the same ECS.
  • In the pod's net namespace, traffic hits the default route, exits the pod from the eth0 network interface controller, and enters the VPC directly. The eth0 here is the secondary ENI, which is mounted in the pod's net ns as a PCI device.
  • The network interface controller is exclusive to the assigned pod and cannot be shared with other pods.
  • Data link goes through the protocol stack twice.

Scenario 3: Access Pod IP, Mutual Access Between Pods on Different Nodes

Environment

27

The pod centos-59cdc5c9c4-89f8x (IP address 10.0.0.202) exists on the ap-southeast-1.10.0.0.196 node.

The pod nginx-6f545cb57c-jmbrq (IP address 10.0.2.86) exists on the ap-southeast-1.10.0.2.80 node.

Kernel Routing

The IP address of centos-59cdc5c9c4-89f8x is 10.0.0.202, and the PID of the container on the host is 2314075. The container's network namespace has a default route pointing to the container's eth0, plus two routes whose next hop is the veth1 network interface controller and whose destinations are the service CIDR blocks.

28
29
30

Using a similar method, we can find that the IP address of nginx-6f545cb57c-jmbrq is 10.0.2.86 and the PID of the container on the host is 1083623.

31
32

Summary: Destination Can be Accessed

eth0 in the centos-59cdc5c9c4-89f8x netns can capture packets.

33

eth0 in the nginx-6f545cb57c-jmbrq netns can capture packets.

34

35
Data Link Forwarding Diagram

  • The data link is ECS1 pod1 netns eth0 → VPC → ECS2 pod2 netns eth0. The data link does not pass through the host namespace: it leaves ECS1, goes through AVS, and then arrives at ECS2.
  • In the pod's net namespace, traffic hits the default route, exits the pod from the eth0 network interface controller, and enters the VPC directly. The eth0 here is the secondary ENI, which is mounted directly in the pod's net ns as a PCI device.
  • The network interface controller is exclusive to the assigned pod and cannot be shared with other pods.
  • Data link goes through the protocol stack twice.

Scenario 4: Access SVC IP (Cluster IP) in a Cluster. The Source and SVC Backend Pods Are on the Same Node.

Environment

36
37

Two pods exist on the ap-southeast-1.10.0.0.196 node: centos-59cdc5c9c4-89f8x (IP address 10.0.0.202) and nginx1-5969d8fc89-9t99h (IP address 10.0.0.203).

The nginx1 service's ClusterIP is 192.168.41.244, and its external IP is 8.219.175.179.
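The mapping from this ClusterIP to the backend pod can be checked on the node. A minimal sketch, assuming kube-proxy runs in IPVS mode (with iptables mode, inspect iptables-save output instead) and that the service listens on port 80:

# Show the service and its ClusterIP / external IP
kubectl get svc nginx1 -o wide
# List the IPVS virtual server for the ClusterIP and its real servers
ipvsadm -Ln | grep -A 3 192.168.41.244
# The real server listed should be the backend pod IP 10.0.0.203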

Kernel Routing

The IP address of centos-59cdc5c9c4-89f8x is 10.0.0.202, and the PID of the container on the host is 2314075. The container's network namespace has a default route pointing to the container's eth0, plus two routes whose next hop is the veth1 network interface controller and whose destinations are the service CIDR blocks.

38
39

The ECS OS peer of the container's veth1 is cali38ef34581a9.

40
41

Using a similar method, we can find that the IP address of nginx1-5969d8fc89-9t99h is 10.0.0.203, the PID of the container on the host is 1094736, and the ECS OS peer of the container's veth1 is cali5068e632525.

42

In the ECS OS, there is a route that points to each Pod IP with the next hop being the corresponding calixxx interface. As described in the preceding section, each calixxx network interface controller forms a veth pair with the veth1 inside a pod, so traffic from a pod to the SVC CIDR goes to veth1 instead of the default eth0 route. The main functions of the calixxx network interface controller here are:

  1. Allowing the node to access pods.
  2. When the node or a pod accesses the SVC CIDR, the traffic goes through the ECS OS kernel protocol stack, is translated there, and then reaches the pod through calixxx and veth1.

43

Summary: Destination Can Be Accessed

veth1 in the centos-59cdc5c9c4-89f8x netns can capture packets.

44

cali38ef34581a9, the host-side peer of centos-59cdc5c9c4-89f8x, can capture packets.

45

veth1 in the nginx1-5969d8fc89-9t99h netns can capture packets.

46

cali5068e632525, the host-side peer of nginx1-5969d8fc89-9t99h, can capture packets.

47

48
Data Link Forwarding Diagram

  • Data Link: ECS1 pod1 netns veth1 → calixxx1 → calixxx2 → ECS1 pod2 netns veth1
  • In the pod's net namespace, traffic that hits the SVC route exits the pod from the veth1 network interface controller into the ECS namespace and is then transferred to the calixxx network interface controller of the other pod by Linux routing. Here, veth1 and calixxx form a veth pair.
  • The calixxx network interface controller paired with the source pod's veth captures packets whose source is the pod IP and whose destination is the SVC IP. The SVC IP then hits the ipvs/iptables rules on the source ECS host and is NATed to the backend pod IP.
  • The calixxx network interface controller paired with the destination pod's veth captures packets carrying the calixxx interface's default IP and the destination pod IP.
  • The network interface controller is exclusive to the assigned pod and cannot be shared with other pods.
  • Data link goes through three protocol stacks: Pod1, ECS OS, and Pod2.

Scenario 5: Access the SVC IP (Cluster IP) in a Cluster, and the Source and SVC Backend Pods Are Different Nodes

Environment

49
50

The pod centos-59cdc5c9c4-89f8x (IP address 10.0.0.202) exists on the ap-southeast-1.10.0.0.196 node.

The pod nginx-6f545cb57c-jmbrq (IP address 10.0.2.86) exists on the ap-southeast-1.10.0.2.80 node.

The clusterIP address of the service is 192.168.204.233, and the external IP is 8.219.199.33.

Kernel Routing

The IP address of centos-59cdc5c9c4-89f8x is 10.0.0.202, and the PID of the container on the host is 2314075. The container's network namespace has a default route pointing to the container's eth0, plus two routes whose next hop is the veth1 network interface controller and whose destinations are the service CIDR blocks.

51
52

The ECS OS peer of the container's veth1 is cali38ef34581a9.

53
54

Using the preceding method, we can find that the IP address of nginx-6f545cb57c-jmbrq is 10.0.2.86, the PID of the container on the host is 1083623, and the pod's ENI is mounted directly in the pod's network namespace.

55
56

Summary: Destination Can Be Accessed

57
Data Link Forwarding Diagram

  • The data link is ECS1 pod1 netns veth1 → cali38ef34581a9 → ECS1 eth0 → VPC → ECS2 pod2 netns eth0.
  • In the net namespace of the client pod, traffic that hits the SVC route exits the pod from veth1 into the ECS namespace, is then transferred by Linux routing to the eth0 network interface controller of the client ECS, and is finally forwarded to the ENI (eth0) of the destination pod.
  • The calixxx network interface controller paired with the source pod's veth captures packets whose source is the pod IP and whose destination is the SVC IP.
  • The SVC IP hits the ipvs/iptables rules on the source ECS host and is converted with FNAT. The eth0 of the source ECS can only capture the converted addresses: the source ECS IP and the destination pod IP (see the capture sketch after this list).
  • The addresses captured by eth0 in the destination pod are the source ECS IP and the destination pod IP. (The source pod IP and the SVC IP do not appear.)
  • The network interface controller is exclusive to the assigned pod and cannot be shared with other pods.
  • Data link goes through three protocol stacks: Pod1, ECS1 OS, and Pod2.
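To observe the address translations described above, packets can be captured at the three points on the path. A minimal sketch with the addresses from this scenario (port filters are omitted; 1083623 is the destination pod's PID):

# On ECS1, the source pod's host-side peer: source = pod IP, destination = SVC IP
tcpdump -nn -i cali38ef34581a9 host 192.168.204.233
# On ECS1 eth0: after FNAT, source = source ECS IP, destination = backend pod IP
tcpdump -nn -i eth0 host 10.0.2.86
# On ECS2, inside the destination pod: source = source ECS IP, destination = pod IP
nsenter -t 1083623 -n tcpdump -nn -i eth0 host 10.0.2.86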

Scenario 6: Access an SVC IP (External IP) in a Cluster. The Source and SVC Backend Pods Are on the Same Node.

Environment

58
59

Two pods exist on the ap-southeast-1.10.0.0.196 node: centos-59cdc5c9c4-89f8x (IP address 10.0.0.202) and nginx1-5969d8fc89-9t99h (IP address 10.0.0.203).

The nginx1 service's ClusterIP is 192.168.221.163, and its external IP is 10.0.2.89.

Kernel Routing

The IP address of centos-59cdc5c9c4-89f8x is 10.0.0.202, and the PID of the container on the host is 2314075. The container's network namespace has a default route pointing to the container's eth0, plus two routes whose next hop is the veth1 network interface controller and whose destinations are the service ClusterIP CIDR blocks.

60
61
62

SLB-Related Configurations

In the SLB console, you can see that the backend virtual server group contains only the ENI of nginx1-5969d8fc89-9t99h: eni-t4n6qvabpwi24w0dcy55.

63

To sum up, when the external IP of the SVC is accessed, the traffic hits the default route and leaves through eth0, goes from the ECS directly to AVS and on to the SLB instance, and is then forwarded by SLB to the backend ENI.
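A quick way to confirm that external-IP traffic takes the pod's default route (eth0, the secondary ENI) while ClusterIP traffic takes veth1 is to query the routing decision inside the pod's network namespace. A minimal sketch with the addresses from this scenario (2314075 is the source pod's PID):

# Which interface would the source pod use for each destination?
nsenter -t 2314075 -n ip route get 10.0.2.89        # external IP: expect "dev eth0"
nsenter -t 2314075 -n ip route get 192.168.221.163  # ClusterIP: expect "dev veth1"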

Summary: Destination Can Be Accessed

64
Data Link Forwarding Diagram

  • The data link is ECS1 pod1 netns eth0 → VPC → SLB → VPC → ECS1 pod2 netns eth0. The data link does not pass through the host namespace: it first leaves ECS1, goes through AVS to the SLB instance, and then returns to ECS1.
  • In the pod's net namespace, traffic hits the default route, exits the pod from the eth0 network interface controller, and enters the VPC directly. The eth0 here is the secondary ENI, which is mounted in the pod's net ns as a PCI device.
  • The network interface controller is exclusive to the assigned pod and cannot be shared with other pods.
  • This differs from Scenario 4. Although the source and backend pods are deployed on the same ECS and both cases access the IP of the SVC, accessing the ClusterIP of the SVC makes the data link enter the ECS OS level and pass through the protocol stack three times, whereas accessing the External IP bypasses the ECS OS: the traffic is forwarded to the destination pod through SLB and only passes through the protocol stack twice (Pod1 and Pod2).

Scenario 7: Access an SVC IP (External IP) in a Cluster. The Source and SVC Backend Pods Are on Different Nodes.

Environment

65
66

The pod centos-59cdc5c9c4-89f8x (IP address 10.0.0.202) exists on the ap-southeast-1.10.0.0.196 node.

The pod nginx-6f545cb57c-jmbrq (IP address 10.0.2.86) exists on the ap-southeast-1.10.0.2.80 node.

The clusterIP of the Service is 192.168.254.141, and the external IP is 10.0.2.90.

Kernel Routing

The IP address of centos-59cdc5c9c4-89f8x is 10.0.0.202, and the PID of the container on the host is 2314075. The container's network namespace has a default route pointing to the container's eth0, plus two routes whose next hop is the veth1 network interface controller and whose destinations are the service ClusterIP CIDR blocks.

67
68
69

SLB-Related Configurations

In the SLB console, you can see that the backend virtual server group of lb-t4nih6p8w8b1dc7p587j9 contains only the ENI of nginx-6f545cb57c-jmbrq: eni-t4n5kzo553dfak2sp68j.

70

To sum up, when the external IP of the SVC is accessed, the traffic hits the default route and leaves through eth0, goes from the ECS directly to AVS and on to the SLB instance, and is then forwarded by SLB to the backend ENI.

Summary: Destination Can Be Accessed

71
Data Link Forwarding Diagram

  • The data link is ECS1 pod1 netns eth0 → VPC → SLB → VPC → ECS2 pod2 netns eth0. The data link does not pass through the host namespace: it first leaves ECS1, goes through SLB, and then arrives at ECS2.
  • In the pod's net namespace, traffic hits the default route, exits the pod from the eth0 network interface controller, and enters the VPC directly. The eth0 here is the secondary ENI, which is mounted in the pod's net ns as a PCI device.
  • The network interface controller is exclusive to the assigned pod and cannot be shared with other pods.
  • This differs from Scenario 5. Although the source and backend pods are deployed on different ECS instances and both cases access the SVC IP, accessing the ClusterIP of the SVC makes the data link enter the ECS OS level, leave the ECS through its eth0, and enter AVS, passing through three protocol stacks. Accessing the External IP bypasses the ECS OS: the traffic leaves the ECS through the pod's own secondary ENI and is then forwarded to the destination pod through the SLB instance, passing through only two protocol stacks (Pod1 and Pod2).

Scenario 8: Access SVC External IP outside the Cluster

Environment

72
73

The pod nginx-6f545cb57c-jmbrq (IP address 10.0.2.86) exists on the ap-southeast-1.10.0.2.80 node.

The pod nginx-6f545cb57c-25k9z (IP address 10.0.1.239) exists on the ap-southeast-1.10.0.1.233 node.

The clusterIP address of the Service is 192.168.254.141 and the external IP address is 10.0.2.90.

SLB-Related Configurations

In the SLB console, you can see that the backend virtual server group of lb-t4nih6p8w8b1dc7p587j9 contains the ENIs of the two backend nginx pods: eni-t4n5kzo553dfak2sp68j and eni-t4naaozjxiehvmg2lwfo.

74

From outside the cluster, the SLB backend virtual server group consists of the ENIs to which the SVC's backend pods belong. The internal IP addresses of these ENIs are the pod addresses, so the traffic enters the pod's protocol stack directly and does not pass through the OS level of the ECS hosting the backend pod.
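This can be verified by capturing inside a backend pod while requesting the SVC external IP from outside the cluster. A minimal sketch (1083623 is the PID of nginx-6f545cb57c-jmbrq in this environment; port 80 is assumed):

# Inside the backend pod's netns, capture incoming requests on eth0 (the pod ENI)
nsenter -t 1083623 -n tcpdump -nn -i eth0 port 80
# The destination address captured is the pod IP (10.0.2.86) rather than the
# host ECS address, confirming that the traffic does not traverse the host OS.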

Summary: Destination Can Be Accessed

75
Data Link Forwarding Diagram

  • Data Link: client → SLB → Pod ENI + Pod Port → ECS1 Pod1 eth0
  • Data link goes through the Pod1 protocol stack.

Summary

This article focuses on ACK data link forwarding paths in different SOP scenarios in Terway ENI mode. Driven by customers' demand for extreme performance, Terway ENI can be divided into eight SOP scenarios. The forwarding links, technical implementation principles, and cloud product configurations of these eight scenarios are sorted out and summarized here, providing an initial reference for dealing with link jitter, optimizing configuration, and understanding link principles under the Terway ENI architecture.

In Terway ENI mode, the ENI is mounted into the pod's namespace as a PCI device, which means each ENI belongs exclusively to the pod it is assigned to. The number of pods that an ECS instance can host therefore depends on the number of ENIs the instance can attach, and this limit is tied to the ECS instance type. For example, ecs.ebmg7.32xlarge (128 vCPUs, 512 GB memory) supports a maximum of 32 ENIs, which often leads to wasted resources and lower deployment density. To solve this resource efficiency problem, ACK provides the Terway ENIIP mode, in which one ENI is shared by multiple pods. This raises the pod quota on a single ECS instance and improves deployment density, and it is currently the most widely used architecture for online clusters. The next part of this series will analyze the Terway ENIIP mode: Analysis of Alibaba Cloud Container Network Data Link (3): Terway ENIIP.
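As a quick check of this constraint, the pod capacity that the kubelet reports for a node in exclusive ENI mode reflects the ENI limit of its instance type. A minimal sketch, assuming kubectl access to the cluster and using the node name from this article's environment:

# Allocatable pod count on a Terway ENI-mode node (bounded by the instance's ENI quota)
kubectl get node ap-southeast-1.10.0.0.196 -o jsonpath='{.status.allocatable.pods}'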
