Realtime Compute for Apache Flink: FAQ about network connectivity

Last Updated: Feb 20, 2025

By default, Realtime Compute for Apache Flink cannot access the Internet. This topic provides answers to some frequently asked questions about service access over the Internet, access across virtual private clouds (VPCs), domain name resolution, and network connectivity testing.

How do I troubleshoot network issues?

A Realtime Compute for Apache Flink workspace is deployed in a VPC, and you cannot change that VPC after you purchase the workspace. If the source or the sink is not in the same VPC as the workspace, the workspace cannot reach it by default, and data cannot be read from the source or written to the sink. If you encounter such read or write failures, perform the following steps to check whether a network issue exists:

  1. Check the network connectivity between the upstream and downstream storage services and the Realtime Compute for Apache Flink workspace. You can test the network connectivity in the development console of Realtime Compute for Apache Flink. For more information, see the How do I use the network detection feature? section in this topic.

    By default, Realtime Compute for Apache Flink can access only services that are deployed in the same VPC and the same region as the workspace. If you want to access resources across VPCs or access the Internet from Realtime Compute for Apache Flink, see the How does Realtime Compute for Apache Flink access a service across VPCs? and How does Realtime Compute for Apache Flink access the Internet? sections in this topic.

  2. Check whether the CIDR block of the vSwitch to which the Realtime Compute for Apache Flink workspace belongs is added to the whitelists of the upstream and downstream storage services. For more information, see How do I configure a whitelist?

  3. If a network timeout error persists, the connection timeout period may be too short. In this case, increase the value of the connect.timeout parameter in the WITH clause. The default value of this parameter is 30 seconds.

How do I use the network detection feature?

Realtime Compute for Apache Flink supports the network detection feature. To use the network detection feature, perform the following steps in the development console of Realtime Compute for Apache Flink:

  1. Log on to the management console of Realtime Compute for Apache Flink.

  2. Find the workspace that you want to manage and click Console in the Actions column.

  3. In the top navigation bar of the Overview page, click the Network detection icon.

  4. In the Network detection dialog box, configure the Host parameter to specify an IP address or endpoint to check whether the running environment of a Realtime Compute for Apache Flink deployment is connected to the upstream and downstream storage services.

    Important

    If you specify an endpoint, remove :<port> from the end of the endpoint and enter <port> in the Port field in the Network detection dialog box.

    If the error message "connect timed out" appears, check whether the endpoint that you access is a public endpoint or an endpoint in another VPC. By default, Realtime Compute for Apache Flink can access only services that are deployed in the same VPC as Realtime Compute for Apache Flink. If you want to access resources across VPCs or access the Internet from Realtime Compute for Apache Flink, see the How does Realtime Compute for Apache Flink access a service across VPCs? and How does Realtime Compute for Apache Flink access the Internet? sections in this topic.
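
The endpoint handling and the connection test described above can be approximated outside the console. The following Python code is a minimal sketch, not the implementation of the network detection feature: the function names split_endpoint and can_connect are hypothetical, and the result reflects the network of the machine that runs the script. It is only meaningful for Realtime Compute for Apache Flink if the script runs in the same VPC and vSwitch as the workspace.

```python
import socket
from typing import Tuple


def split_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split '<host>:<port>' into the values for the Host and Port fields."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected <host>:<port>, got {endpoint!r}")
    return host, int(port)


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False
```

For example, split_endpoint("db.internal.example:3306") yields the host db.internal.example and the port 3306, which can then be passed to can_connect. The host name in this example is a placeholder.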

How does Realtime Compute for Apache Flink access the Internet?

  • Background information

    By default, Realtime Compute for Apache Flink cannot access the Internet. Alibaba Cloud provides NAT gateways to enable communication between virtual private clouds (VPCs) and the Internet. With a NAT gateway, Realtime Compute for Apache Flink can access the Internet from user-defined extensions (UDXs) or DataStream code.

  • Solution

    Create a NAT gateway in the VPC. Then, create an SNAT entry to bind the vSwitch that is associated with Realtime Compute for Apache Flink to an elastic IP address (EIP). This way, the service can access the Internet by using the EIP. To access the Internet by using the EIP, perform the following steps:

    1. Create a NAT gateway. For more information, see the "Create a NAT gateway" section of the Purchase an Internet NAT gateway topic.

    2. Create an SNAT entry and bind the vSwitch that is associated with Realtime Compute for Apache Flink to an EIP. For more information, see the "Create an SNAT entry" section of the Create and manage SNAT entries topic.

How do I view the public bandwidth?

If the metric values of the deployment are normal and no backpressure exists in the deployment during data reading or writing over the Internet, you can view the public bandwidth to check whether a bottleneck issue occurs. To view the public bandwidth, perform the following steps:

  1. Log on to the management console of Realtime Compute for Apache Flink. Find the desired workspace and choose More > Workspace Details in the Actions column. In the Workspace Details message, view the VPC ID.

  2. Log on to the VPC console. In the left-side navigation pane, click VPC. On the VPC page, find the desired VPC and click its ID.

  3. On the Resource Management tab of the details page of the VPC, click the value of Internet NAT Gateway in the Access to Internet section.

    Note

    If the value of Internet NAT Gateway in the Access to Internet section is 0, you must create an Internet NAT gateway. For more information, see Create and manage an Internet NAT gateway.

  4. On the Internet NAT Gateway page, click the ID of the Internet NAT gateway.

  5. On the Associated EIP tab, click the ID of the EIP.

  6. On the page that appears, click the Monitoring and O&M tab to view the public bandwidth.

How does Realtime Compute for Apache Flink access a service across VPCs?

You can use one of the following methods to allow Realtime Compute for Apache Flink to access a service across VPCs:

  • Submit a ticket to establish connections between VPCs by using services such as Express Connect. When you create the ticket, select Virtual Private Cloud (VPC) as the product name. You are charged when you use this method.

  • Connect network instances to a Cloud Enterprise Network (CEN) instance to enable network communication among the network instances. For more information, see Scenario-based networking for multi-VPC interconnection.

  • Use VPN Gateway to establish a VPN connection between VPCs. For more information, see Establish IPsec-VPN connections between two VPCs.

  • Release the resource that resides in a different VPC from Realtime Compute for Apache Flink. Then, purchase the same resource that resides in the same VPC as Realtime Compute for Apache Flink.

  • Release the Realtime Compute for Apache Flink workspace. Then, purchase another Realtime Compute for Apache Flink workspace that is in the same VPC as the service that you want Realtime Compute for Apache Flink to access.

  • Enable Internet access for Realtime Compute for Apache Flink so that it can access other services over the Internet. By default, Realtime Compute for Apache Flink cannot access the Internet. For more information about how to allow Realtime Compute for Apache Flink to access the Internet, see the How does Realtime Compute for Apache Flink access the Internet? section in this topic.

    Note

    The Internet has a longer latency than internal networks. If you have high performance requirements, we recommend that you do not use this method.

How do I configure a whitelist?

In most cases, the upstream and downstream storage services that are supported by Realtime Compute for Apache Flink do not allow access from external systems. Therefore, you must perform the following steps to add the CIDR block of the vSwitch of Realtime Compute for Apache Flink to the whitelist of the storage service that Realtime Compute for Apache Flink needs to access.

  1. Log on to the management console of Realtime Compute for Apache Flink.

  2. Find the workspace that you want to manage and choose More > Workspace Details in the Actions column.

  3. In the Workspace Details dialog box, view the CIDR block of the vSwitch to which the Realtime Compute for Apache Flink workspace belongs.

  4. Add the CIDR block of the vSwitch of Realtime Compute for Apache Flink to the whitelist of the storage service that Realtime Compute for Apache Flink needs to access.

    For example, you must configure a whitelist for an ApsaraDB RDS for MySQL database. For more information, see Configure an IP address whitelist.

    Note
    • If you add a vSwitch later, you must also add the CIDR block of the new vSwitch to the whitelist of the storage service that Realtime Compute for Apache Flink needs to access.

    • If your vSwitch is not in the same zone as the upstream and downstream storage services, the vSwitch can still connect to the services after you add the CIDR block of the vSwitch to the whitelist.
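
Before you edit a whitelist, it can help to check whether its existing entries already cover the CIDR block of the vSwitch. The following Python code is a minimal sketch that uses the standard ipaddress module. The function name is_covered is hypothetical, and the sketch assumes that whitelist entries are plain IP addresses or CIDR blocks. The exact whitelist syntax depends on the storage service.

```python
import ipaddress
from typing import Iterable


def is_covered(vswitch_cidr: str, whitelist: Iterable[str]) -> bool:
    """Return True if one whitelist entry contains the whole vSwitch CIDR block."""
    block = ipaddress.ip_network(vswitch_cidr, strict=True)
    for entry in whitelist:
        # strict=False accepts entries such as 192.168.1.5/24 with host bits set.
        allowed = ipaddress.ip_network(entry.strip(), strict=False)
        # subnet_of raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if block.version == allowed.version and block.subnet_of(allowed):
            return True
    return False
```

For example, is_covered("192.168.1.0/24", ["10.0.0.0/8", "192.168.0.0/16"]) returns True because 192.168.0.0/16 contains the vSwitch block, whereas a whitelist that contains only 192.168.1.0/25 covers half of the block and returns False.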

How do I resolve the domain name of the service on which a Realtime Compute for Apache Flink deployment depends?

If your Realtime Compute for Apache Flink deployment depends on the domain name of a service, a domain name resolution failure may be reported after you migrate the service data to Realtime Compute for Apache Flink. To resolve this issue, use one of the following methods based on your scenario:

  • You have a self-managed DNS. Realtime Compute for Apache Flink can connect to the self-managed DNS service over a VPC, and the self-managed DNS can resolve domain names as expected.

    In this case, you can perform DNS resolution by using the deployment template of Realtime Compute for Apache Flink. For example, the IP address of your self-managed DNS is 192.168.0.1. Perform the following steps:

    1. Log on to the management console of Realtime Compute for Apache Flink.

    2. Find the workspace that you want to manage and click Console in the Actions column.

    3. In the left-side navigation pane, click Configurations. On the Deployment Defaults tab, add the following code to the Other Configuration field:

      env.java.opts: >-
        -Dsun.net.spi.nameservice.provider.1=default
        -Dsun.net.spi.nameservice.provider.2=dns,sun
        -Dsun.net.spi.nameservice.nameservers=192.168.0.1

      Note

      If your self-managed DNS has multiple IP addresses, we recommend that you separate the IP addresses with commas (,).

    4. Click Save Changes.

    5. Create a draft and run the deployment for the draft in the development console of Realtime Compute for Apache Flink.

      • If the UnknownHostException error persists, domain names cannot be resolved. In this case, contact Alibaba Cloud for technical support.

      • If the deployment frequently fails after self-managed DNS is configured and the error message "JobManager heartbeat timeout" appears, see What do I do if the error message "JobManager heartbeat timeout" appears?

  • You do not deploy self-managed DNS or Realtime Compute for Apache Flink cannot connect to self-managed DNS over a VPC.

    In this case, you must use Alibaba Cloud DNS PrivateZone to resolve domain names. For example, the VPC in which Realtime Compute for Apache Flink resides is named vpc-flinkxxxxxxx, and the domain names that your Realtime Compute for Apache Flink deployment needs to access and their IP addresses are aaa.test.com (127.0.0.1), bbb.test.com (127.0.0.2), and ccc.test.com (127.0.0.3). To resolve the domain names, perform the following steps:

    1. Activate Alibaba Cloud DNS PrivateZone. For more information, see Activate Alibaba Cloud DNS PrivateZone.

    2. Add a zone and use the common suffix of the service that your Realtime Compute for Apache Flink deployment needs to access as the zone name. For more information, see Add a zone.

    3. Associate the zone with the VPC in which Realtime Compute for Apache Flink resides. For more information, see Associate a zone with a VPC or disassociate a zone from a VPC.

    4. Add DNS records to the zone. For more information, see Add DNS records.

    5. In the development console of Realtime Compute for Apache Flink, create and run a deployment or stop and rerun an existing deployment.

      If the UnknownHostException error persists, domain names cannot be resolved. In this case, contact Alibaba Cloud for technical support.
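
For the self-managed DNS scenario, the configuration text for the Other Configuration field follows a fixed pattern in which only the list of DNS server addresses changes. The following Python code is a minimal sketch that renders this text and joins multiple addresses with commas. The function name build_dns_java_opts is hypothetical, and the sketch assumes that every DNS server is specified as an IP address.

```python
import ipaddress
from typing import Sequence


def build_dns_java_opts(nameservers: Sequence[str]) -> str:
    """Render the env.java.opts entry for one or more self-managed DNS servers."""
    if not nameservers:
        raise ValueError("at least one DNS server IP address is required")
    cleaned = [server.strip() for server in nameservers]
    for server in cleaned:
        # Raises ValueError if an entry is not a valid IP address.
        ipaddress.ip_address(server)
    lines = [
        "env.java.opts: >-",
        "  -Dsun.net.spi.nameservice.provider.1=default",
        "  -Dsun.net.spi.nameservice.provider.2=dns,sun",
        "  -Dsun.net.spi.nameservice.nameservers=" + ",".join(cleaned),
    ]
    return "\n".join(lines)
```

For example, print(build_dns_java_opts(["192.168.0.1", "192.168.0.2"])) produces the same three JVM options as the sample configuration, with the last option ending in 192.168.0.1,192.168.0.2. The second address is a placeholder.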

What do I do if the error message "JobManager heartbeat timeout" appears?

  • Problem description

    After self-managed DNS is configured, the deployment frequently fails, and the error message "JobManager heartbeat timeout" appears.

  • Cause

    The network latency to self-managed DNS is high.

  • Solution

    Change the value of jobmanager.retrieve-taskmanager-hostname to false in the deployment code to disable DNS for the TaskManagers of the deployment. After the configuration is changed, the deployment can still be connected to external services by using the domain name. For more information about how to configure this parameter, see How do I configure runtime parameters for a deployment by using code?

Why does the "timeout expired while fetching topic metadata" error message appear even if a network connection is established between Realtime Compute for Apache Flink and Kafka?

Realtime Compute for Apache Flink may be unable to read data from Kafka even if a network connection is established between the two systems. To ensure that the services are connected and data can be read from Kafka, you must use the endpoint that is described in the cluster metadata returned by Kafka brokers during bootstrapping. For more information, visit Kafka network connection issues. To check the network connection, perform the following steps:

  1. Use zkCli.sh or zookeeper-shell.sh to log on to the ZooKeeper service that is used by the Kafka cluster.

  2. Run the ls /brokers/ids command to obtain the IDs of all Kafka brokers.

  3. Run the get /brokers/ids/{your_broker_id} command to view the metadata information of Kafka brokers.

    The endpoint is displayed in listener_security_protocol_map.

  4. Check whether Realtime Compute for Apache Flink can connect to the endpoint.

    If the endpoint contains a domain name, configure the DNS service for Realtime Compute for Apache Flink. For more information, see the How do I resolve the domain name of the service on which a Realtime Compute for Apache Flink deployment depends? section in this topic.
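
The check in the preceding steps can be partly scripted once you have copied the output of the get command. The following Python code is a minimal sketch that extracts the listener name, host, and port from the broker metadata. The function name advertised_endpoints is hypothetical. The sketch assumes that the metadata is a JSON document that lists addresses in an endpoints array next to listener_security_protocol_map, in the form LISTENER://host:port. Verify this against the actual output of your Kafka version.

```python
import json
from typing import List, Tuple


def advertised_endpoints(broker_metadata_json: str) -> List[Tuple[str, str, int]]:
    """Return (listener, host, port) for each entry of the assumed "endpoints" array."""
    metadata = json.loads(broker_metadata_json)
    result = []
    for endpoint in metadata.get("endpoints", []):
        # Split manually to keep the listener name and host case unchanged.
        listener, _, address = endpoint.partition("://")
        host, _, port = address.rpartition(":")
        result.append((listener, host, int(port)))
    return result
```

Each returned host and port is an address that Realtime Compute for Apache Flink must be able to reach. A host that is a domain name rather than an IP address indicates that DNS resolution must also work for the deployment.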