This topic describes how to create a custom resource group for Data Integration and use this resource group to run batch synchronization nodes.
Prerequisites
Background information
If the shared resource group cannot connect to the data source that you want to access, you can create a custom resource group and run synchronization nodes on it, which can also speed up data transmission.
- When you register an Elastic Compute Service (ECS) instance to host a custom resource group, the available network types depend on the region of the ECS instance. You can set the network type to Classic Network only if the ECS instance is in the China (Shanghai) region. If you select Classic Network, you must enter the hostname of the ECS instance. For ECS instances in other regions, VPC is the only available network type. If you select VPC, you must enter the universally unique identifier (UUID) of the ECS instance to be registered. We recommend that you set the network type to VPC.
- Only an administrator has permissions to access specific files on the ECS instance that hosts a custom resource group. For example, a workspace administrator can call shell or Structured Query Language (SQL) files on the purchased ECS instance when the workspace administrator runs a shell node.
- Resource groups for scheduling are used to schedule nodes. These resource groups have limited resources and are not suitable for computing nodes. Therefore, we recommend that you do not create a custom resource group on the ECS instances that host a resource group for scheduling. MaxCompute can process large amounts of data. We recommend that you use MaxCompute for big data computing.
- The time on the ECS instance that hosts a custom resource group for Data Integration must be within 2 minutes of the current Internet time. Otherwise, service requests may time out, and nodes that run on the custom resource group for Data Integration may fail.
- You can add only one custom resource group for Data Integration to one ECS instance. You can select only one network type for each custom resource group for Data Integration.
- A custom resource group for Data Integration that you added on the Custom Resource Groups page of Data Integration can only run synchronization nodes in the current workspace. The custom resource group for Data Integration does not appear in the resource group list on the Resource Groups page. A custom resource group for Data Integration that you added on the Custom Resource Groups page cannot run synchronization nodes in a manually triggered workflow.
- If the timeout error message "response code is not 200" appears in the log file of alisatasknode, the custom resource group for Data Integration was not accessible within a specific period of time. This usually occurs because the service request API is unstable during that period. The ECS instance that hosts the custom resource group for Data Integration can continue to work if the error persists for no more than 10 minutes. To find the error details, view the heartbeat.log file in the /home/admin/alisatasknode/logs directory.
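The 2-minute limit on clock difference listed above can be expressed as a small check. The following is a minimal sketch, not part of DataWorks: the function name clock_skew_ok and the way the reference time is obtained (for example, from an NTP server) are assumptions made for illustration, and the limit is treated as inclusive.

```python
from datetime import datetime, timedelta, timezone

# Tolerance from the limit above: the ECS clock must be within 2 minutes
# of the current Internet time.
MAX_SKEW = timedelta(minutes=2)


def clock_skew_ok(local_time, reference_time, max_skew=MAX_SKEW):
    """Return True if the two timestamps differ by no more than max_skew.

    clock_skew_ok is a hypothetical helper; how reference_time is fetched
    (NTP, an HTTP Date header, and so on) is left to the caller.
    """
    return abs(local_time - reference_time) <= max_skew


# Fixed timestamps are used here so that the example is deterministic.
reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(clock_skew_ok(reference + timedelta(seconds=90), reference))  # True
print(clock_skew_ok(reference - timedelta(minutes=3), reference))   # False
```

If the check fails on a real instance, synchronize the system clock before you run nodes on the custom resource group.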
Purchase an ECS instance
- CentOS V6, CentOS V7, or AliOS is recommended.
- If the added ECS instance needs to run MaxCompute nodes or synchronization nodes, verify that the current Python version of the ECS instance is 2.6 or 2.7. The Python version of CentOS V5 is 2.4, whereas the Python version for other operating systems is later than 2.6.
- Make sure that the ECS instance can access the Internet. Ping www.alibabacloud.com on the ECS instance to check whether the URL can be pinged.
- We recommend that you purchase an ECS instance with 8 vCPUs and 16 GiB of memory.
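The Python version requirement above can be verified from the text that python -V prints, such as "Python 2.7.5". The following is a minimal sketch under that assumption. The helper name python_version_supported is hypothetical, and the script itself runs on both Python 2 and Python 3.

```python
import re

# Versions named in the requirement above: Python 2.6 or 2.7.
SUPPORTED = {(2, 6), (2, 7)}


def python_version_supported(version_output):
    """Check the text printed by `python -V` against the supported versions.

    python_version_supported is a hypothetical helper for illustration.
    """
    match = re.search(r"Python (\d+)\.(\d+)", version_output)
    if match is None:
        return False
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor in SUPPORTED


print(python_version_supported("Python 2.7.5"))  # True
print(python_version_supported("Python 2.4.3"))  # False
```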
View the hostname and internal IP address of the ECS instance
Enable port 8000
To enable port 8000 for reading logs, perform the following steps:
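After port 8000 has been enabled, you can confirm from another machine that the port accepts TCP connections. The following is a minimal sketch that uses only the Python standard library. The function name port_is_open is hypothetical, and the check only tests reachability. It does not verify that logs can actually be read.

```python
import socket


def port_is_open(host, port=8000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    port_is_open is a hypothetical helper for illustration.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # refused connections) if the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("192.0.2.10") checks port 8000 on a placeholder address. Replace the address with the IP address of your ECS instance.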
Create a custom resource group for Data Integration
Configure the resource group for Data Integration
Manage custom resource groups for Data Integration
- Manage: allows you to view the IP address, status, and usage of the ECS instance that hosts the resource group. You can also change or delete the ECS instance that hosts the resource group or add an ECS instance for the resource group. You can check how to add an ECS instance in the Add Resource Group wizard.
Note
- If the value of Resource Usage for an ECS instance is not 0%, nodes are running on the ECS instance that hosts the resource group.
- After you add an ECS instance for a resource group, you must initialize the ECS instance.
- Initialize Server: After you add an ECS instance for the resource group, you must initialize the ECS instance.
Click Initialize Server and perform the steps that are described in the following figure to initialize the ECS instance.
- Delete: allows you to delete a custom resource group for Data Integration.
Note DataWorks does not allow you to delete a resource group on which nodes are run. Before you delete a resource group, make sure that no nodes that are in the Running state exist in the resource group.
To view the status of a node, perform the following steps: Choose View auto triggered nodes, filter nodes by resource group name, and then view the status of the nodes. For more information, see