Networking is a perennial topic in the container world. The container community has developed two models for container networking: the Container Network Interface (CNI) and the Container Network Model (CNM). In my opinion, CNI is mainly used in the container orchestration field, working with the scheduling layer (such as Kubernetes) to set up the container network, while CNM is primarily promoted by Docker and mainly used to build container networks on standalone hosts.
PouchContainer supports both models. CNI works with the Container Runtime Interface (CRI) to provide compute and network services to Kubernetes, while CNM provides network services when PouchContainer is used on its own. Currently, PouchContainer implements its CNM network with Libnetwork. This article introduces the CNM model, walks through how PouchContainer uses Libnetwork (this part contains a fair amount of source code), and ends with a summary of how PouchContainer builds the container network.
CNM is mainly promoted by Docker. The CNM model consists of three types of resources: the network, a group of endpoints that can communicate with each other directly; the endpoint, the attachment point that connects a container to a network; and the sandbox, which holds a container's network configuration (interfaces, routes, DNS) and, on Linux, corresponds to a network namespace.
Libnetwork fully implements CNM and ships with a variety of network drivers, such as bridge, host, ipvlan, macvlan, null, and overlay.
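To make these concepts concrete, here is a minimal, hedged sketch of the canonical CNM workflow on top of Libnetwork, loosely following the sample in the Libnetwork README; the network, endpoint, and container names are illustrative only.
package main

import (
    "log"

    "github.com/docker/libnetwork"
)

func main() {
    // Controller: the entry point of Libnetwork (CNM's NetworkController).
    controller, err := libnetwork.New()
    if err != nil {
        log.Fatal(err)
    }

    // Network: a group of endpoints that can communicate with each other.
    network, err := controller.NewNetwork("bridge", "network1", "")
    if err != nil {
        log.Fatal(err)
    }

    // Endpoint: the attachment point of a container to a network.
    ep, err := network.CreateEndpoint("endpoint1")
    if err != nil {
        log.Fatal(err)
    }

    // Sandbox: the container-side network configuration (on Linux, a network namespace).
    sbx, err := controller.NewSandbox("container1")
    if err != nil {
        log.Fatal(err)
    }

    // Joining the endpoint to the sandbox wires the container into the network.
    if err := ep.Join(sbx); err != nil {
        log.Fatal(err)
    }
}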
PouchContainer invokes Libnetwork in the following steps.
The first step is the initialization of the Libnetwork Controller, which happens during the initialization of NetworkManager. NetworkManager is the network module of PouchContainer and the entry point of all network operations.
func NewNetworkManager(cfg *config.Config, store *meta.Store, ctrMgr ContainerMgr) (*NetworkManager, error) {
    ...
    // Translate the daemon's network configuration into libnetwork controller options.
    ctlOptions, err := controllerOptions(cfg.NetworkConfig)
    if err != nil {
        return nil, errors.Wrap(err, "failed to build network options")
    }

    // Create the libnetwork controller that backs all later network operations.
    controller, err := libnetwork.New(ctlOptions...)
    if err != nil {
        return nil, errors.Wrap(err, "failed to create network controller")
    }
    ...
}
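The controllerOptions helper is not shown here. As a hedged illustration of the kind of options such a helper might assemble, the following sketch uses Libnetwork's config package; buildControllerOptions is hypothetical and the directories are assumptions, not PouchContainer's actual paths.
package netutil

import (
    "github.com/docker/libnetwork"
    "github.com/docker/libnetwork/config"
)

// buildControllerOptions illustrates the kind of options a daemon might pass
// to libnetwork.New; the directories below are assumptions for this sketch.
func buildControllerOptions() []config.Option {
    return []config.Option{
        config.OptionDataDir("/var/lib/pouch/network"), // where libnetwork persists its state
        config.OptionExecRoot("/var/run/pouch"),        // runtime directory (e.g. netns files)
        config.OptionDefaultDriver("bridge"),           // driver used when none is specified
        config.OptionDefaultNetwork("bridge"),          // network used when none is specified
    }
}

func newController() (libnetwork.NetworkController, error) {
    return libnetwork.New(buildControllerOptions()...)
}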
The second step is network initialization, that is, the creation of the built-in networks such as none, host, and bridge. This step also takes place while Pouchd is starting up.
func NetworkModeInit(ctx context.Context, config network.Config, manager mgr.NetworkMgr) error {
    // If there are old containers, do not initialize the network again.
    if len(config.ActiveSandboxes) > 0 {
        logrus.Warnf("There are old containers, do not initialize network")
        return nil
    }

    // init none network
    if n, _ := manager.Get(ctx, "none"); n == nil {
        ...
    }

    // init host network
    if n, _ := manager.Get(ctx, "host"); n == nil {
        ...
    }

    // init bridge network
    return bridge.New(ctx, config.BridgeConfig, manager)
}
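As a hedged illustration of what this initialization boils down to at the Libnetwork level, the sketch below creates the none and host networks directly through the controller using the null and host drivers; ensureBuiltinNetworks is an illustrative helper, not PouchContainer's code.
package netutil

import "github.com/docker/libnetwork"

// ensureBuiltinNetworks creates the "none" and "host" networks if they do not
// exist yet, using libnetwork's null and host drivers.
func ensureBuiltinNetworks(controller libnetwork.NetworkController) error {
    // "none": the container gets only a loopback interface.
    if _, err := controller.NetworkByName("none"); err != nil {
        if _, err := controller.NewNetwork("null", "none", ""); err != nil {
            return err
        }
    }
    // "host": the container shares the host's network namespace.
    if _, err := controller.NetworkByName("host"); err != nil {
        if _, err := controller.NewNetwork("host", "host", ""); err != nil {
            return err
        }
    }
    return nil
}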
The last step is the creation of the sandbox and endpoints, which happens when the container starts. In the start function of ContainerManager, prepareContainerNetwork is called to build the network.
func (mgr *ContainerManager) start(ctx context.Context, c *Container, detachKeys string) error {
    ...
    // Set up the container network (sandbox and endpoint) first.
    if err = mgr.prepareContainerNetwork(ctx, c); err != nil {
        return err
    }

    // Then create the real container on containerd.
    if err = mgr.createContainerdContainer(ctx, c); err != nil {
        return errors.Wrapf(err, "failed to create container(%s) on containerd", c.ID)
    }

    return nil
}
The prepareContainerNetwork function calls the EndpointCreate function of NetworkManager to create the endpoint and the sandbox and join them together.
func (nm *NetworkManager) EndpointCreate(ctx context.Context, endpoint *types.Endpoint) (string, error) {
    containerID := endpoint.Owner
    network := endpoint.Name
    ...
    n, err := nm.controller.NetworkByName(network)
    ...
    // create endpoint
    ep, err := n.CreateEndpoint(endpointName, epOptions...)
    if err != nil {
        return "", err
    }

    // create sandbox
    sb := nm.getNetworkSandbox(containerID)
    if sb == nil {
        ...
        sb, err = nm.controller.NewSandbox(containerID, sandboxOptions...)
    }
    ...

    // endpoint joins into sandbox
    if err := ep.Join(sb, joinOptions...); err != nil {
        return "", fmt.Errorf("failed to join sandbox(%v)", err)
    }
    ...
}
The preceding procedure looks seamless. However, if you look back at the start function of ContainerManager, you will notice that the sandbox and endpoint are created first, and only then is createContainerdContainer called to create the real container. The carrier of the sandbox, namely the network namespace, is only created when the container boots. Isn't that a contradiction?
At the bottom, containers rely on the cgroup and namespace technologies: cgroups restrict and account for the container's resource usage (such as CPU, memory, and I/O), while namespaces isolate those resources.
By default, PouchContainer runs containers with runC, so the cgroups and namespaces are set up by runC. The namespaces are created by a piece of C code in runC called nsexec.
nsexec creates the namespaces with two forks and one unshare, and then executes the entrypoint or command given by the user.
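To illustrate the namespace concept itself (this is not runC's actual C code), the following Go sketch starts a process in a brand-new network namespace via clone flags; it must be run as root. Running it shows only a loopback device, which is down by default, confirming the child got its own empty network stack.
package main

import (
    "os"
    "os/exec"
    "syscall"
)

// Run "ip addr" inside a brand-new network namespace.
func main() {
    cmd := exec.Command("ip", "addr")
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    // CLONE_NEWNET gives the child its own network namespace, which is
    // conceptually what runC's nsexec sets up for the container.
    cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: syscall.CLONE_NEWNET}
    if err := cmd.Run(); err != nil {
        panic(err)
    }
}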
To understand the relationship between the sandbox and the network namespace, first look at the full structure of sandbox.
type sandbox struct {
    id                 string
    containerID        string
    config             containerConfig
    extDNS             []string
    osSbox             osl.Sandbox
    controller         *controller
    resolver           Resolver
    resolverOnce       sync.Once
    refCnt             int
    endpoints          epHeap
    epPriority         map[string]int
    populatedEndpoints map[string]struct{}
    joinLeaveDone      chan struct{}
    dbIndex            uint64
    dbExists           bool
    isStub             bool
    inDelete           bool
    ingress            bool
    sync.Mutex
}
The sandbox struct contains an osSbox member, whose implementation varies with the operating system. On Linux, osSbox is implemented with a network namespace.
The following is the creation procedure of sandbox.
func (c *controller) NewSandbox(containerID string, options ...SandboxOption) (Sandbox, error) {
    ...
    if sb.config.useDefaultSandBox {
        c.sboxOnce.Do(func() {
            c.defOsSbox, err = osl.NewSandbox(sb.Key(), false, false)
        })
        sb.osSbox = c.defOsSbox
    }

    if sb.osSbox == nil && !sb.config.useExternalKey {
        if sb.osSbox, err = osl.NewSandbox(sb.Key(), !sb.config.useDefaultSandBox, false); err != nil {
            return nil, fmt.Errorf("failed to create new osl sandbox: %v", err)
        }
    }
    ...
    err = sb.storeUpdate()
    return sb, nil
}
First, the sandbox metadata is constructed (elided above). Then the value of useDefaultSandBox is checked: if it is true, osSbox is set to defOsSbox.
When is useDefaultSandBox true? When the container's network mode is host, that is, when the container uses the host network. In that case osSbox is set to defOsSbox, which corresponds to the host network namespace and is keyed by the path /var/run/pouch/netns/default.
When osSbox is nil and useExternalKey is false, a new osSbox is created; if useExternalKey is true, it is not. In PouchContainer, when the network mode is none or bridge, useExternalKey is true, so osSbox is not initialized here.
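As a hedged sketch of how these two flags are driven by sandbox options, the snippet below chooses between Libnetwork's OptionUseDefaultSandbox and OptionUseExternalKey depending on the network mode; newSandboxFor and the mode strings are illustrative, not PouchContainer's actual code.
package netutil

import "github.com/docker/libnetwork"

// newSandboxFor picks sandbox options per network mode.
func newSandboxFor(controller libnetwork.NetworkController, containerID, mode string) (libnetwork.Sandbox, error) {
    var opts []libnetwork.SandboxOption
    switch mode {
    case "host":
        // Share the host namespace: osSbox becomes defOsSbox.
        opts = append(opts, libnetwork.OptionUseDefaultSandbox())
    default:
        // The network namespace will be handed over later via libnetwork-setkey,
        // so libnetwork must not create one of its own.
        opts = append(opts, libnetwork.OptionUseExternalKey())
    }
    return controller.NewSandbox(containerID, opts...)
}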
The endpoint join operation attaches an endpoint to a sandbox, which connects the sandbox (and thus the container) to a network. The following shows what the join operation does at the code level; the final entry point is the sbJoin function.
func (ep *endpoint) sbJoin(sb *sandbox, options ...EndpointOption) error {
    ...
    d, err := n.driver(true)
    ...
    // Delegate the join to the network driver (e.g. the bridge driver).
    err = d.Join(nid, epid, sb.Key(), ep, sb.Labels())
    if err != nil {
        return err
    }
    ...
    if err = sb.updateHostsFile(address); err != nil {
        return err
    }
    if err = sb.updateDNS(n.enableIPv6); err != nil {
        return err
    }
    ...
    if err = sb.populateNetworkResources(ep); err != nil {
        return err
    }
    ...
}
First, the driver's Join method is called; what it does depends on the driver type. For the bridge driver, a veth device pair is created: one end is attached to the bridge, while the other end is left unattached for now. Then the hosts file and the DNS configuration are updated, and finally populateNetworkResources is called to initialize the network resources inside the container.
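For the curious, here is a rough, hedged sketch of that underlying operation using the vishvananda/netlink library (which Libnetwork itself builds on): create a veth pair and enslave the host-side end to a bridge. attachVethToBridge and the device names are illustrative, not the bridge driver's actual code.
package netutil

import (
    "fmt"

    "github.com/vishvananda/netlink"
)

// attachVethToBridge creates a veth pair and enslaves the host-side end to the
// bridge; the container-side end (PeerName) stays unattached for now.
func attachVethToBridge(bridgeName, hostVeth, containerVeth string) error {
    link, err := netlink.LinkByName(bridgeName)
    if err != nil {
        return err
    }
    br, ok := link.(*netlink.Bridge)
    if !ok {
        return fmt.Errorf("%s is not a bridge", bridgeName)
    }
    veth := &netlink.Veth{
        LinkAttrs: netlink.LinkAttrs{Name: hostVeth},
        PeerName:  containerVeth,
    }
    if err := netlink.LinkAdd(veth); err != nil {
        return err
    }
    // Connect the host-side end to the bridge and bring it up.
    if err := netlink.LinkSetMaster(veth, br); err != nil {
        return err
    }
    return netlink.LinkSetUp(veth)
}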
func (sb *sandbox) populateNetworkResources(ep *endpoint) error {
    sb.Lock()
    if sb.osSbox == nil {
        sb.Unlock()
        return nil
    }
    ...
    if i != nil && i.srcName != "" {
        ...
        ifaceOptions = append(ifaceOptions, sb.osSbox.InterfaceOptions().Address(i.addr), sb.osSbox.InterfaceOptions().Routes(i.routes))
        if i.mac != nil {
            ifaceOptions = append(ifaceOptions, sb.osSbox.InterfaceOptions().MacAddress(i.mac))
        }

        // Move the interface into the sandbox's namespace and configure it.
        if err := sb.osSbox.AddInterface(i.srcName, i.dstPrefix, ifaceOptions...); err != nil {
            return fmt.Errorf("failed to add interface %s to sandbox: %v", i.srcName, err)
        }
    }

    if joinInfo != nil {
        // Program any static routes the driver asked for.
        for _, r := range joinInfo.StaticRoutes {
            if err := sb.osSbox.AddStaticRoute(r); err != nil {
                return fmt.Errorf("failed to add static route %s: %v", r.Destination.String(), err)
            }
        }
    }
    ...
}
This function first checks whether osSbox is nil. If it is, the container network is not initialized at all. If it is not, the function moves the container-side network interface into the container's network namespace and configures its IP address, MAC address, and static routes.
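The following is a hedged sketch, using the vishvananda/netlink and netns libraries, of what moving an interface into a container's network namespace and configuring it amounts to; moveIntoNamespace, the interface name, namespace path, and address are placeholders for illustration.
package netutil

import (
    "github.com/vishvananda/netlink"
    "github.com/vishvananda/netns"
)

// moveIntoNamespace moves the container-side veth end into the namespace found
// at nsPath (e.g. /proc/<pid>/ns/net) and assigns it an address. Requires root.
func moveIntoNamespace(ifName, nsPath, cidr string) error {
    link, err := netlink.LinkByName(ifName)
    if err != nil {
        return err
    }
    ns, err := netns.GetFromPath(nsPath)
    if err != nil {
        return err
    }
    defer ns.Close()
    // Move the device into the target network namespace.
    if err := netlink.LinkSetNsFd(link, int(ns)); err != nil {
        return err
    }
    // The device keeps its name but must be looked up again through a handle
    // bound to the target namespace before it can be configured there.
    handle, err := netlink.NewHandleAt(ns)
    if err != nil {
        return err
    }
    defer handle.Delete()
    nsLink, err := handle.LinkByName(ifName)
    if err != nil {
        return err
    }
    addr, err := netlink.ParseAddr(cidr)
    if err != nil {
        return err
    }
    if err := handle.AddrAdd(nsLink, addr); err != nil {
        return err
    }
    return handle.LinkSetUp(nsLink)
}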
At the time of the join operation the container's network namespace has not been created yet, so osSbox is nil and the container network is not initialized. How, then, is the network initialization completed?
As mentioned in the section on the relationship between the sandbox and the network namespace, when useExternalKey is true the sandbox's osSbox is nil and the container network is not initialized. For the initialization to happen, osSbox must not be nil; but osSbox depends on the container's network namespace, which only exists after the container has been created. Therefore the network initialization has to be completed after the container is created.
The hooks of runC allow users to define actions at specific points in the container's lifecycle. runC (following the OCI runtime spec) supports three types of hooks: prestart hooks, which run after the container process and its namespaces have been created but before the user command starts; poststart hooks, which run after the user command has started; and poststop hooks, which run after the container has stopped.
Obviously, the prestart hook is the right place for network initialization: the network namespace already exists, but the user command has not run yet.
PouchContainer uses the following prestart hook for network initialization:
"hooks": {
"prestart": [
{
"path": "/usr/bin/pouchd",
"args": [
"libnetwork-setkey",
"76cf8065568ac429d5aec9908dc149decfadafa6091f991d01bb44f39d51312a",
"edbfbf851eaee68102f15c50ae93739d8bd92d70c66a9be7b37c4d17ce124023"
]
}
]
}
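For illustration, such a hook entry could be built programmatically with the OCI runtime-spec Go types roughly as follows; addSetKeyHook is a hypothetical helper and its arguments are placeholders supplied by the caller.
package netutil

import specs "github.com/opencontainers/runtime-spec/specs-go"

// addSetKeyHook appends a prestart hook equivalent to the JSON above.
func addSetKeyHook(spec *specs.Spec, binaryPath, containerID, controllerID string) {
    if spec.Hooks == nil {
        spec.Hooks = &specs.Hooks{}
    }
    spec.Hooks.Prestart = append(spec.Hooks.Prestart, specs.Hook{
        Path: binaryPath, // e.g. the pouchd binary re-executing itself
        Args: []string{"libnetwork-setkey", containerID, controllerID},
    })
}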
The role of libnetwork-setkey
libnetwork-setkey is provided by Libnetwork to carry out the network initialization. The command takes two arguments:
libnetwork-setkey <container-id> <controller-id>
container-id is the ID of the container, through which the corresponding sandbox can be found; controller-id is the ID of the Libnetwork Controller. The libnetwork-setkey command is handled by the processSetKeyReexec function.
func processSetKeyReexec() {
    // Args: [0] command name, [1] container-id, [2] controller-id.
    containerID := os.Args[1]

    // runC passes the hook state (including the container's Pid) on stdin.
    stateBuf, err := ioutil.ReadAll(os.Stdin)
    if err != nil {
        return
    }
    var state configs.HookState
    if err = json.Unmarshal(stateBuf, &state); err != nil {
        return
    }

    controllerID := os.Args[2]
    err = SetExternalKey(controllerID, containerID, fmt.Sprintf("/proc/%d/ns/net", state.Pid))
}

func SetExternalKey(controllerID string, containerID string, key string) error {
    keyData := setKeyData{
        ContainerID: containerID,
        Key:         key}

    // Connect to the unix socket on which the Libnetwork Controller listens.
    c, err := net.Dial("unix", udsBase+controllerID+".sock")
    if err != nil {
        return err
    }
    defer c.Close()

    if err = sendKey(c, keyData); err != nil {
        return fmt.Errorf("sendKey failed with : %v", err)
    }
    return processReturn(c)
}
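sendKey and processReturn are not shown above. A plausible sketch of what they might look like is given below; the wire format matches the JSON read by processExternalKey further down, but the helper bodies and the exact status string are assumptions, not the upstream code.
package netutil

import (
    "encoding/json"
    "fmt"
    "net"
)

type setKeyData struct {
    ContainerID string
    Key         string
}

// sendKey marshals the container-id/namespace-key pair and writes it onto the
// unix socket as a single JSON object per connection.
func sendKey(c net.Conn, data setKeyData) error {
    buf, err := json.Marshal(data)
    if err != nil {
        return err
    }
    _, err = c.Write(buf)
    return err
}

// processReturn waits for a short status reply from the controller; the
// "success" string here is an assumption for this sketch.
func processReturn(c net.Conn) error {
    buf := make([]byte, 1024)
    n, err := c.Read(buf)
    if err != nil {
        return fmt.Errorf("failed to read result from controller: %v", err)
    }
    if string(buf[:n]) != "success" {
        return fmt.Errorf("controller returned failure: %s", buf[:n])
    }
    return nil
}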
The processSetKeyReexec logic is simple. runC passes the state of the container process, including its Pid, on stdin. From that Pid the container's network namespace path can be derived, namely /proc/{pid}/ns/net. The container ID and the namespace path are then written to a unix socket that the Libnetwork Controller listens on.
ExternalKey processing by Libnetwork Controller
As mentioned earlier, the Libnetwork Controller is created during Pouchd initialization, and during the controller's initialization a unix socket is created to listen for ExternalKey requests.
func (c *controller) startExternalKeyListener() error {
    if err := os.MkdirAll(udsBase, 0600); err != nil {
        return err
    }
    // One socket per controller, named after the controller ID.
    uds := udsBase + c.id + ".sock"
    l, err := net.Listen("unix", uds)
    ...
    go c.acceptClientConnections(uds, l)
    return nil
}

func (c *controller) processExternalKey(conn net.Conn) error {
    buf := make([]byte, 1280)
    nr, err := conn.Read(buf)
    if err != nil {
        return err
    }
    // Decode the container-id / namespace-path pair sent by libnetwork-setkey.
    var s setKeyData
    if err = json.Unmarshal(buf[0:nr], &s); err != nil {
        return err
    }
    ...
    return sandbox.SetKey(s.Key)
}
Requests are accepted by acceptClientConnections and handled by processExternalKey. processExternalKey reads the container-id and the namespace path, looks up the sandbox that belongs to that container-id among all sandboxes, and then calls its SetKey function.
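In terms of the public Libnetwork API, this lookup-and-set step amounts to roughly the following hedged sketch; applyExternalKey is an illustrative helper, not PouchContainer's or Libnetwork's code.
package netutil

import "github.com/docker/libnetwork"

// applyExternalKey finds the sandbox that belongs to containerID and hands it
// the container's network namespace path (e.g. /proc/<pid>/ns/net).
func applyExternalKey(controller libnetwork.NetworkController, containerID, nsPath string) error {
    sb, err := controller.GetSandbox(containerID)
    if err != nil {
        return err
    }
    // SetKey attaches the namespace to the sandbox and populates the
    // network resources of every endpoint already joined to it.
    return sb.SetKey(nsPath)
}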
func (sb *sandbox) SetKey(basePath string) error {
    // Attach the externally created network namespace to this sandbox.
    osSbox, err := osl.GetSandboxForExternalKey(basePath, sb.Key())
    ...
    // Now that osSbox exists, populate the network resources of every endpoint.
    for _, ep := range sb.getConnectedEndpoints() {
        if err = sb.populateNetworkResources(ep); err != nil {
            return err
        }
    }
    return nil
}
SetKey creates osSbox from the received network namespace path, traverses all endpoints connected to the sandbox, and calls populateNetworkResources for each of them, which initializes the container's network resources. At this point, the container's network initialization is complete.
This article has described how PouchContainer initializes the container network with Libnetwork. The entire process can be summarized as follows: when Pouchd starts, it creates the Libnetwork Controller and the built-in networks; when a container starts, Pouchd creates the endpoint and the sandbox and joins them, but the sandbox has no network namespace yet; the container is then created by runC, which sets up the namespaces and runs the libnetwork-setkey prestart hook; the hook sends the container ID and the namespace path to the controller over a unix socket; finally, the controller finds the corresponding sandbox, calls SetKey, and populates the network resources, completing the container's network initialization.