DataHub Reader reads data from DataHub in real time by using the DataHub SDK.
Background information
DataHub Reader keeps running after it is started and reads data from DataHub whenever new data is written to DataHub. DataHub Reader provides the following features:
Reads data in real time.
Reads data in parallel, with the degree of parallelism based on the number of shards in the DataHub topic.
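To illustrate the two features above, the following Python sketch models a reader that starts one polling worker per shard and forwards records as soon as they appear. InMemoryTopic, read_shard, and run_reader are hypothetical names invented for this example. The sketch does not use the DataHub SDK, and the actual scheduling inside DataHub Reader may differ.

```python
import threading


# Hypothetical in-memory stand-in for a DataHub topic: each shard is an
# append-only list of records. This is NOT the DataHub SDK.
class InMemoryTopic:
    def __init__(self, shard_count):
        self.shards = {str(i): [] for i in range(shard_count)}
        self._lock = threading.Lock()

    def write(self, shard_id, record):
        with self._lock:
            self.shards[shard_id].append(record)

    def read(self, shard_id, cursor, limit):
        # Return up to `limit` records starting at `cursor`, plus the next cursor.
        with self._lock:
            batch = self.shards[shard_id][cursor:cursor + limit]
        return batch, cursor + len(batch)


def read_shard(topic, shard_id, sink, stop, poll_interval=0.01, limit=100):
    """Poll one shard and forward new records to `sink` until stopped and drained."""
    cursor = 0
    while True:
        stopping = stop.is_set()  # checked before reading so no record is skipped
        batch, cursor = topic.read(shard_id, cursor, limit)
        for record in batch:
            sink.append((shard_id, record))
        if not batch:
            if stopping:
                return  # shard fully drained after a stop request
            stop.wait(poll_interval)  # no new data yet: wait, then poll again


def run_reader(topic, stop):
    """Start one reader thread per shard, so parallelism equals the shard count."""
    sink = []  # list.append is atomic in CPython, so threads can share it
    threads = [
        threading.Thread(target=read_shard, args=(topic, sid, sink, stop))
        for sid in topic.shards
    ]
    for t in threads:
        t.start()
    return threads, sink


# Usage: three shards, so three reader threads; records arrive after the reader starts.
topic = InMemoryTopic(shard_count=3)
stop = threading.Event()
threads, sink = run_reader(topic, stop)
for i in range(9):
    topic.write(str(i % 3), {"id": i})
stop.set()
for t in threads:
    t.join()
print(len(threads), sorted(r["id"] for _, r in sink))
```

Tying the number of workers to the number of shards keeps each shard read in order by a single worker while different shards are read concurrently.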
Procedure
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the Scheduled Workflow pane of the DataStudio page, move the pointer over the icon and choose .
Alternatively, find the desired workflow in the Scheduled Workflow pane, right-click the workflow name, and then choose .
In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.
Important: The node name cannot exceed 128 characters in length and can contain only letters, digits, underscores (_), and periods (.).
Click Confirm.
On the configuration tab of the real-time synchronization node, drag DataHub from the Input section to the canvas on the right.
Click the DataHub node. In the configuration panel that appears, configure the parameters.
Data source: The name of the DataHub data source that you added to DataWorks. Only DataHub data sources can be selected. If no data source is available, click New data source on the right to go to the Data Sources page in Management Center and add a DataHub data source. For more information, see Add a DataHub data source.
Topic: The name of the DataHub topic from which you want to synchronize data. You can click Data preview on the right to preview the data in the selected topic.
Use Subscription Feature: If you turn on Use Subscription Feature, the system automatically generates a subscription ID, and data in DataHub is read through the subscription identified by that ID. This improves stability and performance. Do not delete a subscription ID from DataHub while it is in use. If you do, the related task fails.
Output Fields: The fields whose data you want to synchronize.
In the top toolbar of the configuration tab of the real-time synchronization node, click the save icon to save the node.
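The Use Subscription Feature parameter described above keeps read progress on the DataHub side. The following Python sketch assumes that a subscription acts as a store of committed per-shard offsets: a restarted reader resumes from the last committed offset, and reading through a deleted subscription fails. SubscriptionStore and consume are hypothetical names for illustration only and are not the DataHub subscription API.

```python
# Hypothetical store of committed read offsets, keyed by subscription ID and shard.
# This only illustrates the idea; it is NOT the DataHub subscription API.
class SubscriptionStore:
    def __init__(self):
        self._offsets = {}

    def create(self, sub_id, shard_ids):
        self._offsets[sub_id] = {sid: 0 for sid in shard_ids}

    def delete(self, sub_id):
        del self._offsets[sub_id]

    def _require(self, sub_id):
        if sub_id not in self._offsets:
            raise KeyError(f"subscription {sub_id!r} does not exist")
        return self._offsets[sub_id]

    def get(self, sub_id, shard_id):
        return self._require(sub_id)[shard_id]

    def commit(self, sub_id, shard_id, offset):
        self._require(sub_id)[shard_id] = offset


def consume(shard_records, store, sub_id, shard_id, limit):
    """Read up to `limit` records after the committed offset, then commit the new offset."""
    start = store.get(sub_id, shard_id)  # resume point lives in the store, not the reader
    batch = shard_records[start:start + limit]
    store.commit(sub_id, shard_id, start + len(batch))
    return batch


shard = ["r0", "r1", "r2", "r3", "r4"]
store = SubscriptionStore()
store.create("sub-1", ["0"])

first = consume(shard, store, "sub-1", "0", limit=3)   # first run reads r0..r2
second = consume(shard, store, "sub-1", "0", limit=3)  # a restarted run resumes at r3

store.delete("sub-1")  # deleting a subscription that is still in use...
try:
    consume(shard, store, "sub-1", "0", limit=3)
except KeyError:
    failed = True  # ...makes the next read fail
else:
    failed = False
print(first, second, failed)
```

In this sketch the offset is committed as soon as a batch is read. A real consumer would typically commit only after the batch has been processed, so that a crash leads to re-reading records rather than losing them.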