MaxCompute: Use DataHub to migrate log data to MaxCompute

Last Updated: Jun 17, 2024

This topic describes how to use DataHub to migrate log data to MaxCompute.

Prerequisites

The following permissions are granted to the account authorized to access MaxCompute:

  • CreateInstance permission on MaxCompute projects

  • Permissions to view, modify, and update MaxCompute tables

For more information, see MaxCompute permissions.

Background information

DataHub is a platform for processing streaming data. After data is uploaded to DataHub, it is stored in a table for real-time processing. DataHub then runs scheduled tasks that synchronize the data to a MaxCompute table for offline computing, by default within five minutes.

To periodically archive streaming data in DataHub to MaxCompute, you only need to create and configure a DataHub connector.
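The synchronization trigger can be pictured with a small sketch. The connector details step later in this topic gives the defaults: data is migrated at five-minute intervals or when the amount of data reaches 60 MB. The helper `should_sync` and the constant names are hypothetical, and reading 60 MB as 60 * 1024 * 1024 bytes is an assumption; this is not DataHub's actual scheduler.

```python
# Default thresholds as described in this topic.
# Assumption: "60 MB" is interpreted as 60 * 1024 * 1024 bytes.
SYNC_INTERVAL_SECONDS = 5 * 60
SYNC_SIZE_BYTES = 60 * 1024 * 1024


def should_sync(seconds_since_last_sync: float, buffered_bytes: int) -> bool:
    """Hypothetical helper: True when either default trigger is reached."""
    return (
        seconds_since_last_sync >= SYNC_INTERVAL_SECONDS
        or buffered_bytes >= SYNC_SIZE_BYTES
    )


MB = 1024 * 1024
print(should_sync(120, 10 * MB))  # neither threshold reached
print(should_sync(300, 10 * MB))  # interval reached
print(should_sync(120, 64 * MB))  # size reached
```

Either condition alone is enough, which is why low-volume topics still reach MaxCompute within roughly one interval.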

Data type mappings

MaxCompute    DataHub
----------    -------------
BIGINT        BIGINT
STRING        STRING
BOOLEAN       BOOLEAN
DOUBLE        DOUBLE
DATETIME      TIMESTAMP
DECIMAL       DECIMAL
TINYINT       TINYINT
SMALLINT      SMALLINT
INT           INTEGER
FLOAT         FLOAT
BLOB          STRING
MAP           Not supported
ARRAY         Not supported
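For scripts that prepare a topic schema from an existing table definition, the mappings above can be captured as a lookup. The following is a minimal sketch, not part of any DataHub or MaxCompute SDK: the names `MAXCOMPUTE_TO_DATAHUB` and `datahub_type_for` are hypothetical, and the DataHub side of the TINYINT row is assumed to be spelled TINYINT.

```python
# MaxCompute type -> DataHub type, copied from the mapping table.
# None marks types that the table lists as not supported.
MAXCOMPUTE_TO_DATAHUB = {
    "BIGINT": "BIGINT",
    "STRING": "STRING",
    "BOOLEAN": "BOOLEAN",
    "DOUBLE": "DOUBLE",
    "DATETIME": "TIMESTAMP",
    "DECIMAL": "DECIMAL",
    "TINYINT": "TINYINT",  # assumed spelling on the DataHub side
    "SMALLINT": "SMALLINT",
    "INT": "INTEGER",
    "FLOAT": "FLOAT",
    "BLOB": "STRING",
    "MAP": None,
    "ARRAY": None,
}


def datahub_type_for(maxcompute_type: str) -> str:
    """Hypothetical helper: look up the DataHub type for a MaxCompute type."""
    key = maxcompute_type.strip().upper()
    if key not in MAXCOMPUTE_TO_DATAHUB:
        raise KeyError(f"type not listed in the mapping table: {maxcompute_type}")
    datahub_type = MAXCOMPUTE_TO_DATAHUB[key]
    if datahub_type is None:
        raise ValueError(f"type is listed as not supported: {maxcompute_type}")
    return datahub_type


print(datahub_type_for("datetime"))
print(datahub_type_for("int"))
```

Parameterized type names such as `map<string,string>` are not parsed by this sketch; only the bare type names from the table are recognized.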

Procedure

  1. On the odpscmd client, create a table to store the data synchronized from DataHub. For example, execute the following statement:

     CREATE TABLE test (f1 STRING, f2 STRING, f3 DOUBLE) PARTITIONED BY (ds STRING);
  2. Create a project in the DataHub console.

    1. Log on to the DataHub console. In the upper-left corner, select a region.

    2. In the left-side navigation pane, click Projects.

    3. In the upper-right corner of the Projects page, click Create Project.

    4. In the Create Project panel, configure Name and Description, and click Create.

  3. Create a topic.

    1. On the Projects page, find the desired project and click View in the Actions column.

    2. On the project details page, click Create Topic in the upper-right corner.

    3. In the Create Topic panel, select Import MaxCompute Tables for Creation Type and configure other parameters.

      Figure: Create Topic

    4. Click Next Step to complete topic configurations.

      Note
      • Schema corresponds to a MaxCompute table. The field names, data types, and field sequence specified by Schema must be consistent with those of the MaxCompute table. You can create a DataConnector only if all three conditions are met.

      • Topics of the TUPLE and BLOB types can be migrated to MaxCompute tables.

      • A maximum of 20 topics can be created by default. If you require more topics, submit a ticket.

      • The owner of a DataHub topic or the Creator account has the permissions to manage a DataConnector, for example, to create or delete it.

  4. Create a DataConnector for the newly created topic so that data written to the topic is migrated to MaxCompute.

    1. On the Topic List tab of the project details page, find the newly created topic and click View in the Actions column.

    2. On the topic details page, click Connector in the upper-right corner.

    3. In the Create Connector panel, click MaxCompute, configure the parameters, and then click Create.

  5. View DataConnector details.

    1. In the left-side navigation pane, click Projects.

    2. On the Projects page, find the desired project and click View in the Actions column.

    3. On the Topic List tab, find the topic and click View in the Actions column.

    4. On the topic details page, click the Connector tab.

    5. Find the newly created DataConnector and click View to view DataConnector details.

      By default, DataHub migrates data to MaxCompute tables at five-minute intervals or when the amount of data reaches 60 MB. Sync Offset indicates the number of migrated data entries.

      Figure: DataConnector details

  6. Execute the following statement to check whether the log data is migrated to MaxCompute:

    SELECT * FROM test;

    If the result shown in the following figure is displayed, the log data has been migrated to MaxCompute.

    Figure: Test result
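If the DataConnector cannot be created, one thing to revisit is the Schema note in step 3: field names, data types, and field sequence must all match the MaxCompute table. The sketch below shows what such a comparison could look like. The helper `schema_mismatches` and the small `TYPE_MAP` are hypothetical, names are compared case-insensitively as an assumption, and the partition column `ds` is left out because this topic does not state how partition columns are compared.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # (field name, data type)


def schema_mismatches(
    topic_fields: List[Field],
    table_columns: List[Field],
    type_map: Dict[str, str],
) -> List[str]:
    """Hypothetical helper: list differences between a topic schema and a table.

    Fields are compared position by position, so a reordered field shows up
    as a name mismatch. An empty result means the schemas are consistent.
    """
    problems: List[str] = []
    if len(topic_fields) != len(table_columns):
        problems.append(
            f"field count differs: topic={len(topic_fields)}, table={len(table_columns)}"
        )
    pairs = zip(topic_fields, table_columns)
    for position, ((t_name, t_type), (c_name, c_type)) in enumerate(pairs, start=1):
        # Assumption: field names are compared case-insensitively.
        if t_name.lower() != c_name.lower():
            problems.append(f"position {position}: name {t_name!r} != {c_name!r}")
        # Translate the MaxCompute type to the DataHub type before comparing.
        expected = type_map.get(c_type.upper())
        if expected != t_type.upper():
            problems.append(
                f"position {position}: type {t_type!r} does not match {c_type!r}"
            )
    return problems


# Subset of the data type mappings needed for the example table `test`.
TYPE_MAP = {"STRING": "STRING", "DOUBLE": "DOUBLE"}
table_columns = [("f1", "string"), ("f2", "string"), ("f3", "double")]
topic_fields = [("f1", "STRING"), ("f2", "STRING"), ("f3", "DOUBLE")]

print(schema_mismatches(topic_fields, table_columns, TYPE_MAP))

# Same fields in a different sequence are reported as mismatches.
reordered = [("f2", "STRING"), ("f1", "STRING"), ("f3", "DOUBLE")]
for problem in schema_mismatches(reordered, table_columns, TYPE_MAP):
    print(problem)
```

Comparing by position rather than by name is deliberate: it covers the field sequence condition without a separate ordering check.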