
Lindorm: Data import methods

Last Updated: Mar 20, 2024

DataWorks is an end-to-end big data development and governance platform released by Alibaba Cloud. It provides a series of features such as data integration, data development, and data O&M. You can configure data import tasks in DataWorks to import full data from MySQL, PolarDB, PostgreSQL, Oracle, SQL Server, and Cassandra databases to LindormTable. This topic describes how to configure tasks in DataWorks to import data to Lindorm.

Prerequisites

The IP address of your client is added to the whitelist of the Lindorm instance. For more information, see Configure whitelists.

Usage notes

  • If you connect to a Lindorm instance over the Internet, or if the Lindorm instance that you want to access is a single-node instance, you must upgrade your SDK and modify its configurations before you perform the operations described in this topic. For more information, see Step 1 in Use the ApsaraDB for HBase API for Java to connect to and use LindormTable.

  • If your application is deployed on an Elastic Compute Service (ECS) instance, make sure that your Lindorm instance and the ECS instance meet the following requirements in advance to ensure network connectivity:

    • Your Lindorm instance and ECS instance are deployed in the same region. We recommend that you also deploy the two instances in the same zone to reduce network latency.

    • Your Lindorm instance and ECS instance are deployed in the same VPC.
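The following Python sketch encodes the requirements above as a simple pre-flight check. The InstanceInfo class, its fields, and the region, zone, and VPC IDs in the usage example are hypothetical placeholders; in practice, you would read these values from the ECS and Lindorm consoles. A zone mismatch is reported as a warning rather than an error because deploying both instances in the same zone is only recommended.

```python
from dataclasses import dataclass


@dataclass
class InstanceInfo:
    """Hypothetical descriptor of where an instance is deployed."""
    name: str
    region: str
    zone: str
    vpc_id: str


def check_connectivity(ecs, lindorm):
    """Return (errors, warnings) for an ECS instance and a Lindorm instance."""
    errors, warnings = [], []
    # Requirement: both instances must be deployed in the same region.
    if ecs.region != lindorm.region:
        errors.append(f"region mismatch: {ecs.region} != {lindorm.region}")
    # Requirement: both instances must be deployed in the same VPC.
    if ecs.vpc_id != lindorm.vpc_id:
        errors.append(f"VPC mismatch: {ecs.vpc_id} != {lindorm.vpc_id}")
    # Recommendation only: the same zone reduces network latency.
    if ecs.zone != lindorm.zone:
        warnings.append(f"different zones ({ecs.zone}, {lindorm.zone}) may add latency")
    return errors, warnings


# Usage example with placeholder values.
ecs = InstanceInfo("app-ecs", "cn-shanghai", "cn-shanghai-e", "vpc-example")
lindorm = InstanceInfo("ld-example", "cn-shanghai", "cn-shanghai-f", "vpc-example")
errors, warnings = check_connectivity(ecs, lindorm)
print(errors)  # []
print(warnings)
```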

Step 1: Create a workspace

Before you configure the data import task, you must create a workspace in DataWorks for data development and task management. For more information, see Create a workspace.

Step 2: Create a resource group

You can create resource groups to allocate resources within your account and manage user permissions.

The following table describes the resource groups that you can create.

| Resource group | References | Usage notes |
| --- | --- | --- |
| Exclusive resource group | Exclusive resource group mode | Exclusive resources cannot be shared across regions. For example, the exclusive resources in the China (Shanghai) region can be used only by the workspace in the China (Shanghai) region. The resources cannot be allocated to virtual private clouds (VPCs) in other regions. Exclusive resources can access only Lindorm instances that are attached to the same vSwitch as the exclusive resources. |
| Default resource group | None | When the ECS instances access a Lindorm instance over the Internet, you are charged additional fees for DataWorks. |
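To make the constraints in the table concrete, the following sketch checks whether a resource group can be used with a given Lindorm instance. The Placement class, the resource group type strings, the vSwitch IDs, and the returned notes are illustrative assumptions, not a DataWorks API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Placement:
    """Hypothetical location of a resource group or Lindorm instance."""
    region: str
    vswitch_id: Optional[str] = None


def evaluate_resource_group(group_type, group, lindorm):
    """Return (usable, note) for a resource group and a Lindorm instance."""
    if group_type == "exclusive":
        # Exclusive resources cannot be shared across regions.
        if group.region != lindorm.region:
            return False, "exclusive resources cannot be used across regions"
        # Exclusive resources reach only Lindorm instances on the same vSwitch.
        if group.vswitch_id != lindorm.vswitch_id:
            return False, "exclusive resources must share the Lindorm instance's vSwitch"
        return True, "OK"
    if group_type == "default":
        # Access over the Internet works but incurs additional DataWorks fees.
        return True, "access over the Internet incurs additional DataWorks fees"
    raise ValueError(f"unknown resource group type: {group_type!r}")


lindorm = Placement("cn-shanghai", "vsw-lindorm")
print(evaluate_resource_group("exclusive", Placement("cn-shanghai", "vsw-lindorm"), lindorm))  # (True, 'OK')
print(evaluate_resource_group("exclusive", Placement("cn-beijing", "vsw-lindorm"), lindorm))
```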

Step 3: Configure networks

Before you configure data import tasks, you must configure networks based on the resource group type to ensure that DataWorks is connected to the Lindorm instance.

Configure networks for an exclusive resource group

  1. On the Details page of the Lindorm instance, obtain the VPC in which the Lindorm instance is deployed.

  2. Bind the exclusive resource group to the VPC in which the Lindorm instance is deployed. For more information, see Exclusive resource group mode.

  3. In the VPC console, obtain the IPv4 CIDR block of the VPC and vSwitch to which the exclusive resource group is bound.

  4. Add the IPv4 CIDR block to the whitelist of the Lindorm instance. For more information, see Configure whitelists.
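After you update the whitelist, you can verify that every CIDR block used by the resource group is fully contained in some whitelist entry. This sketch uses the standard ipaddress module; the CIDR values in the example are placeholders, and the whitelist is assumed to be available as a list of IP addresses or CIDR blocks.

```python
import ipaddress


def uncovered_blocks(required, whitelist):
    """Return the required blocks that no whitelist entry fully contains."""
    allowed = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for block in required:
        net = ipaddress.ip_network(block, strict=False)
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed,
        # so compare the address versions first.
        covered = any(net.version == a.version and net.subnet_of(a) for a in allowed)
        if not covered:
            missing.append(str(net))
    return missing


vswitch_cidr = ["192.168.10.0/24"]  # placeholder vSwitch CIDR block
print(uncovered_blocks(vswitch_cidr, ["192.168.0.0/16"]))  # []
print(uncovered_blocks(vswitch_cidr, ["10.0.0.0/8"]))      # ['192.168.10.0/24']
```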

Configure networks for the default resource group

Obtain the CIDR block of the default resource group. For more information, see Configure an IP address whitelist. Then add the CIDR block to the whitelist of the Lindorm instance. For more information, see Configure whitelists.
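If the default resource group uses several CIDR blocks, merging overlapping or adjacent blocks keeps the Lindorm whitelist short. This sketch relies on ipaddress.collapse_addresses from the Python standard library. The input blocks are documentation-only addresses (RFC 5737) used as placeholders, and joining the result with commas is an assumption about how you enter multiple whitelist entries; check Configure whitelists for the accepted format.

```python
import ipaddress


def merge_cidrs(blocks):
    """Deduplicate and merge IPv4 CIDR blocks into the smallest equivalent list."""
    nets = []
    for block in blocks:
        net = ipaddress.ip_network(block, strict=False)
        if net.version != 4:
            raise ValueError(f"expected an IPv4 block, got {block!r}")
        nets.append(net)
    # collapse_addresses() sorts the networks and merges overlapping or adjacent ones.
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Placeholder blocks; substitute the CIDR blocks listed for your region.
default_group_blocks = [
    "198.51.100.0/25",
    "198.51.100.128/25",
    "203.0.113.7",
    "203.0.113.7/32",
]
merged = merge_cidrs(default_group_blocks)
print(",".join(merged))  # 198.51.100.0/24,203.0.113.7/32
```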

Step 4: Create a synchronization task

For more information about how to create a batch synchronization task, see Configure a batch synchronization task by using the code editor.
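In the code editor, a batch synchronization task is described as a JSON job. The following sketch assembles a minimal reader-to-writer job skeleton in Python. The top-level layout (type, version, steps, setting, order) and the stepType value "lindorm" are assumptions about the general shape of DataWorks code-editor jobs, and the reader parameters are hypothetical placeholders; fill in the real fields from the Reader and Writer documentation for your source and for Lindorm.

```python
import json


def build_batch_sync_job(reader_type, reader_params, writer_params):
    """Assemble a hypothetical code-editor job: one reader step feeding a Lindorm writer."""
    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {"stepType": reader_type, "category": "reader",
             "name": "Reader", "parameter": reader_params},
            # Assumed stepType for Lindorm Writer; verify against the Lindorm Writer topic.
            {"stepType": "lindorm", "category": "writer",
             "name": "Writer", "parameter": writer_params},
        ],
        "setting": {},
        # Data flows from the step named "Reader" to the step named "Writer".
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


# "datasource" is a hypothetical placeholder key, not a documented parameter.
job = build_batch_sync_job("mysql", {"datasource": "my_mysql_source"}, {})
print(json.dumps(job, indent=2))
```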

Step 5: Modify task configurations

  • If you use the Lindorm SQL mode to access Lindorm, configure the task based on the TableService model described in Lindorm Reader and Lindorm Writer.

  • If you use the HBase mode to access Lindorm, configure the task based on the WideColumn model described in Lindorm Reader and Lindorm Writer.

Important

The lindorm.client.seedserver parameter specifies the LindormTable endpoint for HBase. For more information about how to obtain the endpoint, see View the endpoints of LindormTable.
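The choice between the two models and the seedserver setting can be captured in a small helper. This is a sketch under stated assumptions: the access-mode keys, the host:port endpoint format, and the example hostname and port are hypothetical. Use the endpoint shown on the LindormTable endpoints page, and place the returned entry wherever the Lindorm Reader and Lindorm Writer topics specify.

```python
def model_for_access_mode(mode):
    """Map an access mode to the Reader/Writer model described in this topic."""
    models = {
        "lindorm_sql": "TableService",  # Lindorm SQL mode
        "hbase": "WideColumn",          # HBase mode
    }
    try:
        return models[mode]
    except KeyError:
        raise ValueError(f"unknown access mode: {mode!r}") from None


def seedserver_setting(endpoint):
    """Validate an endpoint assumed to be host:port and return the seedserver entry."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return {"lindorm.client.seedserver": endpoint}


print(model_for_access_mode("hbase"))  # WideColumn
# Hypothetical endpoint; copy the real one from View the endpoints of LindormTable.
print(seedserver_setting("lindorm-host.example.com:30020"))
```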

Step 6: Commit and deploy the synchronization task

If you want to periodically run the batch synchronization task, you must deploy the task to the production environment. For more information, see Deploy nodes.