This topic answers frequently asked questions about deploying a VNode in a self-managed Kubernetes cluster to use elastic container instances.
FAQ about networks
FAQ about image pulling
FAQ about pod scheduling
FAQ about storage
FAQ about logging and monitoring
How do cloud services access the IP addresses of on-premises pods?
If you use Express Connect circuits to connect your cloud and on-premises networks, the cloud and on-premises networks can learn routes from each other by using Border Gateway Protocol (BGP). The on-premises equipment can then advertise the pod IP addresses to the cloud over BGP, so that cloud services can access the on-premises pods by IP address. For more information, see Configure BGP.
How do on-premises services access the IP addresses of cloud pods?
If you use Express Connect circuits to connect your cloud and on-premises networks, the cloud and on-premises services can learn routing rules from each other by using BGP. You can deploy a cloud controller manager (CCM) to automatically synchronize the IP addresses of cloud pods to the virtual private cloud (VPC) route table. For more information about a CCM, see Cloud Controller Manager.
After you deploy a CCM in a self-managed or on-premises cluster, the routes to the Kubernetes pod IP addresses are synchronized to the VPC route table. When you deploy the CCM, take note of the following items:
Change the providerID value of the Kubernetes cluster nodes to the <region-id>.<ecs-id> format. Example: cn-shanghai.i-ankb8zjh2nzchf*******.
Make sure that the pod IP addresses of each cluster node are all within the pod CIDR block of that node. For example, set the IPAM type in the Calico configuration file to host-local. This configuration specifies that the pod CIDR of each Kubernetes cluster node is obtained from the Kubernetes API, which ensures that all pod IP addresses on a node are within the pod CIDR block of the node. You can check the pod CIDR blocks in the spec data of the nodes.
spec:
  podCIDR: 172.23.XX.0/26
  podCIDRs:
  - 172.23.XX.0/26
  providerID: cn-shanghai.i-ankb8zjh2nzchfxxxxxxx
What do I do if an internal network domain name cannot be resolved?
Problem description
Cloud and on-premises services cannot call each other because the internal network domain names of the services cannot be resolved. The resolution failures include:
Cloud services cannot resolve the internal network domain names of on-premises networks.
On-premises services cannot resolve cloud PrivateZone domain names.
Solutions
On-premises networks and Alibaba Cloud VPCs are separate network environments. If cloud and on-premises services can communicate with each other only after their internal network domain names are resolved by using Alibaba Cloud DNS, you can configure Alibaba Cloud DNS PrivateZone to resolve the internal network domain names. For more information, see Use Alibaba Cloud DNS PrivateZone and VPN Gateway to allow ECS instances in a VPC to access an on-premises DNS.
Why can't on-premises services access cloud services?
Problem description
On-premises services cannot use leased lines to access Alibaba Cloud services such as ApsaraDB RDS, Object Storage Service (OSS), and Log Service.
Solutions
You can use one of the following solutions. We recommend that you use Solution 1.
Solution 1
Configure the domain name of the cloud service on the cloud. Then, the virtual border router (VBR) publishes the route to the on-premises network over BGP. For more information, see Access cloud services.
Solution 2
Add a static route to the on-premises network to route 100.64.0.0/10 to the leased line.
Why am I unable to pull images from a self-managed container image repository?
Problem description
When I try to pull images from a self-managed container image repository, the pull fails with a certificate error.
Solutions
This problem occurs because the image repository uses a self-signed certificate. A self-signed certificate is not trusted, so the certificate-based authentication fails when you pull images. When you create a pod, you can add the following annotation to skip the certificate-based authentication:
"k8s.aliyun.com/insecure-registry": "<host-name>"
For example, if the address of an NGINX image in the private image repository is test.example.com/test/nginx:alpine, you can add the "k8s.aliyun.com/insecure-registry": "test.example.com" annotation to skip certificate-based authentication.
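As a minimal sketch of where the annotation goes, the following pod manifest places it under metadata.annotations. The pod name and container name are hypothetical; the registry host and image address are the ones from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-insecure-registry        # hypothetical pod name
  annotations:
    # Skip certificate-based authentication for this registry host only.
    k8s.aliyun.com/insecure-registry: "test.example.com"
spec:
  containers:
  - name: nginx                        # hypothetical container name
    image: test.example.com/test/nginx:alpine
```

The annotation value is the registry host name only, without the repository path or the image tag.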
How do I schedule pods to a VNode?
You can use one of the following methods to schedule pods to a VNode based on your business requirements. Then, the pods can be run on the elastic container instances that are deployed in the VNode. The scheduling methods include:
Manual scheduling
You can configure the nodeSelector and tolerations parameters or specify the nodeName parameter to schedule pods to the VNode. For more information, see Schedule pods to a VNode.
Automatic scheduling
After you deploy the eci-profile component, you can specify the Selector parameter. This way, the system automatically schedules pods that meet the conditions specified by Selector to the VNode. For more information, see Use eci-profile to schedule pods to a VNode.
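For the manual scheduling method, a pod spec might look like the following sketch. The nodeSelector reuses the type: virtual-kubelet node label that the DaemonSet example in this topic matches. The toleration key, value, and effect are assumptions, not values confirmed by this topic: check the actual labels and taints of your VNode (for example, in the output of kubectl describe node) and adjust both fields. The pod name and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: manual-schedule-demo           # hypothetical pod name
spec:
  # Select the VNode by label. Verify the label on your VNode first.
  nodeSelector:
    type: virtual-kubelet
  # Tolerate the taint on the VNode. ASSUMPTION: the key, value, and effect
  # below must be replaced with the taint that your VNode actually carries.
  tolerations:
  - key: k8s.aliyun.com/vnode
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: app                          # hypothetical container name
    image: test.example.com/test/nginx:alpine   # hypothetical image
```

Alternatively, setting spec.nodeName to the name of the VNode binds the pod to that node directly, without going through the scheduler.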
Why do DaemonSet pods remain in the Pending state after they are scheduled to a VNode?
VNodes are not real nodes and do not support DaemonSets. When you create a DaemonSet, you must configure an anti-affinity scheduling policy to prevent Kubernetes from scheduling DaemonSet pods to VNodes. Sample configurations:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: type
            operator: NotIn
            values:
            - virtual-kubelet
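The affinity block belongs to the pod template of the DaemonSet, that is, under spec.template.spec. The following sketch shows the placement in a complete manifest. The DaemonSet name, labels, and image are hypothetical and only illustrate the structure.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                     # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent                  # hypothetical label
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      # Keep DaemonSet pods off nodes labeled type=virtual-kubelet (VNodes).
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
      containers:
      - name: agent                    # hypothetical container name
        image: test.example.com/test/agent:latest   # hypothetical image
```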
Why does the scheduling fail when I attempt to schedule pods to a VNode by configuring pod labels?
This problem occurs because the version of your Kubernetes cluster is earlier than v1.16.
What do I do if the mount of a NAS volume times out?
Cause
After you mount a File Storage NAS (NAS) file system, Kubernetes recursively runs the chmod and chown commands on the files in the NAS directory based on the permissions and ownership specified in the pod. If the NAS directory contains a large number of files and you configure file permissions and ownership in the security context when you create the pod, this recursive operation can take so long that mounting the NAS volume times out.
Solutions
When you configure the security context, set fsGroupChangePolicy to OnRootMismatch. This way, the system does not run the chmod and chown commands when the permissions and ownership of the root directory in the NAS file system match the permissions and ownership specified in the pods. For more information, see Configure a Security Context for a Pod or Container.
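A minimal sketch of such a security context is shown below. fsGroupChangePolicy is a standard field of the pod-level securityContext in Kubernetes. The pod name, user and group IDs, image, mount path, and PVC name are hypothetical, and the sketch assumes that the NAS file system is already exposed through a PersistentVolumeClaim.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nas-mount-demo                 # hypothetical pod name
spec:
  securityContext:
    runAsUser: 1000                    # hypothetical IDs
    runAsGroup: 3000
    fsGroup: 2000
    # Change ownership and permissions only if the root directory of the
    # volume does not already match the expected values.
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
  - name: app                          # hypothetical container name
    image: test.example.com/test/nginx:alpine   # hypothetical image
    volumeMounts:
    - name: nas-volume
      mountPath: /data                 # hypothetical mount path
  volumes:
  - name: nas-volume
    persistentVolumeClaim:
      claimName: nas-pvc               # hypothetical PVC backed by NAS
```

With the default policy (Always), ownership and permissions are changed on every mount, which is what makes directories with many files slow to mount.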
How do I collect logs to Log Service?
To collect logs to Log Service, install alibaba-log-controller on your self-managed Kubernetes cluster. This installs the Logtail agent. During the installation, the system automatically performs the following operations:
Creates a ConfigMap named alibaba-log-configuration. The ConfigMap contains the configuration information of Log Service, such as projects.
Optional. Creates a Custom Resource Definition (CRD) named AliyunLogConfig.
Optional. Creates a Deployment controller named alibaba-log-controller. The Deployment controller is used to monitor the changes in the AliyunLogConfig CRD and the creation of Logtail configurations.
Creates a DaemonSet named logtail-ds to collect logs from nodes.
For more information, see Install Logtail in a Kubernetes cluster.
If your cluster runs an early Kubernetes version, such as v1.13, download a CRD of an early version from alibaba-cloud-log-0.1.1 and deploy the CRD. If you have other questions, submit a ticket.
What do I do if the metrics-server reports a 404 error?
Metrics-server v0.5.x and earlier call the kubelet API endpoint /stats/summary, which can be used to collect metrics from VNodes. If a 404 error occurs, downgrade metrics-server to v0.5.x or earlier.
Metrics-server v0.6.0 and later use /metrics/resource instead of /stats/summary to fetch node metrics. The /metrics/resource endpoint cannot be used to collect metrics from VNodes, so these versions report the 404 error.
The following startup arguments can be used for metrics-server:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --kubelet-insecure-tls
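The following sketch shows where these arguments sit in a metrics-server Deployment pinned to a v0.5.x release. The image address and tag, labels, and service account name are assumptions modeled on the upstream metrics-server manifests; the RBAC resources, Service, and APIService that metrics-server also requires are omitted.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server    # assumed to exist already
      containers:
      - name: metrics-server
        # ASSUMPTION: any v0.5.x image works; this tag is an example.
        image: registry.k8s.io/metrics-server/metrics-server:v0.5.2
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls
        ports:
        - name: https
          containerPort: 4443               # matches --secure-port
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp                   # writable dir for --cert-dir
      volumes:
      - name: tmp-dir
        emptyDir: {}
```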