LogHub Reader reads data from LogHub topics in real time by using the LogHub SDK and supports shard merging and splitting. After shards are merged or split, duplicate data records may exist but no data is lost.
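Because duplicate records may appear after shards are merged or split, a downstream consumer may need to tolerate repeats. The following is a minimal sketch of one way to do this, not part of LogHub Reader. It assumes that a duplicate is a record whose fields are all identical to a record already seen. The function names `record_key` and `deduplicate` are hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator


def record_key(record: Dict[str, str]) -> str:
    """Build a stable fingerprint from all fields of a record (assumption:
    duplicates produced by shard merging or splitting are field-identical)."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each distinct record once, in first-seen order."""
    seen = set()  # grows without bound; a real job would cap or expire it
    for record in records:
        key = record_key(record)
        if key in seen:
            continue
        seen.add(key)
        yield record


# Hypothetical sample data: the second record repeats the first.
sample = [
    {"__time__": "1700000000", "__source__": "192.0.2.10", "msg": "start"},
    {"__time__": "1700000000", "__source__": "192.0.2.10", "msg": "start"},
    {"__time__": "1700000001", "__source__": "192.0.2.10", "msg": "stop"},
]
unique = list(deduplicate(sample))
print(len(unique))
```

Note that this approach also drops two genuinely distinct logs that happen to have identical fields, so it fits only pipelines where that trade-off is acceptable.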
Background information
The following table describes the metadata fields that LogHub Reader for real-time synchronization provides.
Field provided by LogHub Reader for real-time synchronization | Data type | Description |
__time__ | STRING | A reserved field of Simple Log Service. The field specifies the time when logs are written to Simple Log Service. The field value is a UNIX timestamp in seconds. |
__source__ | STRING | A reserved field of Simple Log Service. The field specifies the source device from which logs are collected. |
__topic__ | STRING | A reserved field of Simple Log Service. The field specifies the name of the topic for logs. |
__tag__:__receive_time__ | STRING | The time when logs arrive at the server. If you enable the public IP address recording feature, this field is added to each raw log when the server receives the logs. The field value is a UNIX timestamp in seconds. |
__tag__:__client_ip__ | STRING | The public IP address of the source device. If you enable the public IP address recording feature, this field is added to each raw log when the server receives the logs. |
__tag__:__path__ | STRING | The path of the log file collected by Logtail. Logtail automatically adds this field to logs. |
__tag__:__hostname__ | STRING | The hostname of the device from which Logtail collects data. Logtail automatically adds this field to logs. |
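The time fields in the preceding table are UNIX timestamps in seconds delivered as STRING values, so a consumer typically converts them before use. The following sketch shows one possible conversion. The record and its values are made up for illustration, and the helper name `parse_epoch_seconds` is hypothetical.

```python
from datetime import datetime, timezone


def parse_epoch_seconds(value: str) -> datetime:
    """Convert a STRING UNIX timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


# Hypothetical record: field names follow the table, values are invented.
record = {
    "__time__": "1700000000",
    "__source__": "192.0.2.10",
    "__topic__": "app-log",
    "__tag__:__receive_time__": "1700000002",
    "__tag__:__hostname__": "host-01",
}

log_time = parse_epoch_seconds(record["__time__"])
receive_time = parse_epoch_seconds(record["__tag__:__receive_time__"])
print(log_time.isoformat())
print(receive_time.isoformat())
```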
Procedure
Go to the DataStudio page.
Log on to the DataWorks console.
In the left-side navigation pane, click Workspaces.
In the top navigation bar, select the region in which the workspace that you want to manage resides. On the Workspaces page, find the workspace and click in the Actions column.
In the Scheduled Workflow pane, move the pointer over the icon and choose .
Alternatively, right-click the required workflow, and then choose .
In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.
Important: The node name cannot exceed 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
Click Confirm.
On the configuration tab of the real-time synchronization node, drag the LogHub node to the canvas on the right.
Click the LogHub node. In the panel that appears, configure the parameters.
Parameter | Description |
Data source | The LogHub data source that you have configured. You can select only a LogHub data source. If no data source is available, click New data source on the right to add one on the Data Source page. For more information, see Add a LogHub (SLS) data source. |
Logstore | The name of the Logstore from which you want to read data. You can click Preview Data to preview data in the selected Logstore. |
Advanced configuration | Specifies whether to split data in the Logstore. If you select Split for Split tasks, you must specify Split rules. You can specify a sharding rule in the format of shardId % X = Y. The equation takes the remainder of shardId divided by X. shardId indicates the ID of a sharding task, X indicates the total number of shards, and Y indicates the ID of the shard on which the sharding task takes effect. The value of Y must be in the range [0, X-1]. For example, shardId % 5 = 3 indicates that the source data that you want to synchronize is divided into five shards, and the sharding task takes effect on the shard whose ID is 3. |
Output field | The fields that you want to synchronize. For descriptions of the fields, see Background information. |
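To make the sharding rule format shardId % X = Y more concrete, the following sketch parses a rule string and evaluates the equation literally against a range of candidate shardId values. It is an illustration only and does not reflect how DataWorks implements the rule. The names `parse_rule` and `matching_shard_ids` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches rules such as "shardId % 5 = 3", allowing flexible whitespace.
RULE_PATTERN = re.compile(r"^\s*shardId\s*%\s*(\d+)\s*=\s*(\d+)\s*$")


def parse_rule(rule: str) -> Tuple[int, int]:
    """Return (X, Y) from a rule string, checking that Y is in [0, X-1]."""
    match = RULE_PATTERN.match(rule)
    if match is None:
        raise ValueError(f"not in the form 'shardId % X = Y': {rule!r}")
    x, y = int(match.group(1)), int(match.group(2))
    if x <= 0 or not 0 <= y < x:
        raise ValueError(f"Y must be in [0, X-1], got X={x}, Y={y}")
    return x, y


def matching_shard_ids(rule: str, shard_ids: Iterable[int]) -> List[int]:
    """Return the shardId values for which shardId % X equals Y."""
    x, y = parse_rule(rule)
    return [shard_id for shard_id in shard_ids if shard_id % x == y]


# With ten candidate IDs (0 to 9), the rule from the example selects 3 and 8.
selected = matching_shard_ids("shardId % 5 = 3", range(10))
print(selected)  # [3, 8]
```

A rule such as shardId % 5 = 5 is rejected by `parse_rule` because a remainder of 5 can never occur when dividing by 5.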
Click the icon in the top toolbar.
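The node naming rule stated in the Create Node step (at most 128 characters, consisting of letters, digits, underscores, and periods) can be checked before you create nodes in bulk. The following sketch is one possible check. It assumes that "letters" means ASCII letters and that an empty name is invalid, neither of which the rule states explicitly. The function name `is_valid_node_name` is hypothetical.

```python
import re

# Assumption: only ASCII letters count as letters, and a name needs at least one character.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if the whole name matches the documented naming rule."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_node_name("loghub_sync.v1"))
print(is_valid_node_name("bad name"))
print(is_valid_node_name("a" * 129))
```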