DataWorks provides LogHub Reader and LogHub Writer for you to read data from and write data to Simple Log Service data sources. This topic describes the capabilities of synchronizing data from or to Simple Log Service data sources.
Limits
When you use DataWorks Data Integration to run batch synchronization tasks to write data to Simple Log Service, Simple Log Service does not ensure idempotence. If you rerun a failed task, redundant data may be generated.
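Because a rerun can write the same records twice, downstream consumers may need to deduplicate. A minimal Python sketch, assuming each record carries a caller-assigned unique key (the field name `record_id` is hypothetical, not part of Simple Log Service):

```python
def dedupe_records(records, key="record_id"):
    """Keep the first occurrence of each unique key, dropping rerun duplicates."""
    seen = set()
    unique = []
    for rec in records:
        rid = rec.get(key)
        if rid in seen:
            continue  # duplicate produced by a rerun of the task
        seen.add(rid)
        unique.append(rec)
    return unique

records = [
    {"record_id": "a1", "msg": "login"},
    {"record_id": "a2", "msg": "click"},
    {"record_id": "a1", "msg": "login"},  # duplicate written by a rerun
]
print(len(dedupe_records(records)))  # 2
```

The key must be assigned by the producer before the write, because Simple Log Service itself does not provide one for this purpose.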
Data types
The following table lists the support status of the main data types in Simple Log Service.
| Data type | LogHub Reader for batch data read | LogHub Writer for batch data write | LogHub Reader for real-time data read |
| --- | --- | --- | --- |
| STRING | Supported | Supported | Supported |
LogHub Writer for batch data write
LogHub Writer converts the data types supported by Data Integration to STRING before data is written to Simple Log Service. The following table lists the data type mappings based on which LogHub Writer converts data types.
| Data Integration data type | Simple Log Service data type |
| --- | --- |
| LONG | STRING |
| DOUBLE | STRING |
| STRING | STRING |
| DATE | STRING |
| BOOLEAN | STRING |
| BYTES | STRING |
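The mapping above can be illustrated with a short sketch of how values of each Data Integration type might be rendered as strings. This is only an illustration of the mapping, with assumed string formats; the actual conversion is internal to LogHub Writer:

```python
import base64
from datetime import datetime

def to_sls_string(value):
    """Render a Data Integration value as a STRING, mirroring the mapping table.
    Output formats (base64 for BYTES, lowercase booleans, a fixed date format)
    are assumptions for illustration, not LogHub Writer's documented behavior."""
    if isinstance(value, bytes):        # BYTES
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, bool):         # BOOLEAN (must be checked before int)
        return str(value).lower()
    if isinstance(value, datetime):     # DATE
        return value.strftime("%Y-%m-%d %H:%M:%S")
    return str(value)                   # LONG, DOUBLE, STRING

print(to_sls_string(True))   # true
print(to_sls_string(42))     # 42
```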
LogHub Reader for real-time data read
The following table describes the metadata fields that LogHub Reader for real-time data synchronization provides.
| Field provided by LogHub Reader for real-time data synchronization | Data type | Description |
| --- | --- | --- |
| __time__ | STRING | A reserved field of Simple Log Service. The field specifies the time when logs are written to Simple Log Service. The field value is a UNIX timestamp in seconds. |
| __source__ | STRING | A reserved field of Simple Log Service. The field specifies the source device from which logs are collected. |
| __topic__ | STRING | A reserved field of Simple Log Service. The field specifies the name of the topic for logs. |
| __tag__:__receive_time__ | STRING | The time when logs arrive at the server. If you enable the public IP address recording feature, this field is added to each raw log when the server receives the logs. The field value is a UNIX timestamp in seconds. |
| __tag__:__client_ip__ | STRING | The public IP address of the source device. If you enable the public IP address recording feature, this field is added to each raw log when the server receives the logs. |
| __tag__:__path__ | STRING | The path of the log file collected by Logtail. Logtail automatically adds this field to logs. |
| __tag__:__hostname__ | STRING | The hostname of the device from which Logtail collects data. Logtail automatically adds this field to logs. |
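Because `__time__` and `__tag__:__receive_time__` are UNIX timestamps delivered as STRING values, downstream code typically converts them before use. A minimal sketch, with a plain dict standing in for one synchronized row:

```python
from datetime import datetime, timezone

def parse_log_metadata(record):
    """Convert the string UNIX-timestamp metadata fields of a synchronized
    record into timezone-aware datetimes; other fields pass through unchanged."""
    out = dict(record)
    for field in ("__time__", "__tag__:__receive_time__"):
        if field in out:
            out[field] = datetime.fromtimestamp(int(out[field]), tz=timezone.utc)
    return out

row = {"__time__": "1700000000", "__source__": "10.0.0.1", "__topic__": "app"}
parsed = parse_log_metadata(row)
print(parsed["__time__"].year)  # 2023
```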
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
For information about where to configure a synchronization task and the configuration procedure, see the following configuration guides.
When you configure a data synchronization task that synchronizes data from a Simple Log Service data source, the data source allows you to filter data by using the query syntax of Simple Log Service and SLS Processing Language (SPL) statements. Simple Log Service uses SPL to process logs. For more information, see Appendix 2: SPL syntax.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix 1: Code and parameters.
Configure a real-time synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Create a real-time synchronization task to synchronize incremental data from a single table and Configure a real-time synchronization task in DataStudio.
Configure synchronization settings to implement batch synchronization of all data in a database, real-time synchronization of full data or incremental data in a database, and real-time synchronization of data from sharded tables in a sharded database
For more information about the configuration procedure, see Configure a synchronization task in Data Integration.
FAQ
For more information, see FAQ about Data Integration.
Appendix 1: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Code for LogHub Reader
Parameters in code for LogHub Reader
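As a rough illustration of the shape of a LogHub Reader parameter block, the sketch below shows typical settings: the data source name, the Logstore, a time range, a batch size, and a column list. The parameter names and value formats are assumptions based on common LogHub Reader configurations, not an authoritative reference; the code editor in the DataWorks console shows the actual script:

```python
# Hypothetical LogHub Reader parameter block (names and formats assumed),
# expressed as a Python dict; in the code editor the script is JSON.
loghub_reader_parameters = {
    "datasource": "my_sls_datasource",   # name of the data source added in DataWorks
    "logstore": "my_logstore",           # Logstore to read from
    "beginDateTime": "20240101000000",   # start of the time range (assumed format)
    "endDateTime": "20240102000000",     # end of the time range (assumed format)
    "batchSize": 256,                    # entries pulled per request
    "column": ["__time__", "__source__", "level", "message"],
}
print(sorted(loghub_reader_parameters))
```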
Code for LogHub Writer
Parameters in code for LogHub Writer
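Similarly, a LogHub Writer parameter block typically names the destination data source and Logstore, an optional topic, a batch size, and the columns to write. Again, the parameter names below are assumptions for illustration, not an authoritative reference:

```python
# Hypothetical LogHub Writer parameter block (names assumed),
# expressed as a Python dict; in the code editor the script is JSON.
loghub_writer_parameters = {
    "datasource": "my_sls_datasource",  # name of the data source added in DataWorks
    "logstore": "my_logstore",          # Logstore to write to
    "topic": "app_topic",               # value written to the __topic__ reserved field
    "batchSize": 1024,                  # entries written per request
    "column": ["level", "message"],     # source columns, written as STRING values
}
print(sorted(loghub_writer_parameters))
```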
Appendix 2: SPL syntax
When you configure a data synchronization task that synchronizes data from a Simple Log Service data source, the data source allows you to filter data by using the query syntax of Simple Log Service and SPL statements. Simple Log Service uses SPL to process logs. The following table describes the SPL syntax in different scenarios.
For more information about SPL, see SPL overview.
| Scenario | SQL statement | SPL statement |
| --- | --- | --- |
| Data filtering | | |
| Field processing and filtering | Search for a field in exact mode and rename the field. | |
| Data standardization (SQL function calls) | Convert a data type and parse time. | Convert a data type and parse time. |
| Field extraction | Extract data by using a regular expression. Extract JSON data. | |
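An SPL statement is a pipeline of instructions separated by `|`. As hedged illustrations of the scenarios above, the sketch below collects a few example queries as Python strings; the instruction names (`where`, `project`, `extend`, `parse-regexp`, `parse-json`) follow the published SPL syntax, but these exact statements are illustrative assumptions, not the statements from the original table:

```python
# Hedged SPL examples, one per scenario; field names are hypothetical.
spl_examples = {
    "Data filtering": "* | where status = '200'",
    "Field processing and filtering": "* | project level, msg_text = message",
    "Data standardization": "* | extend latency_ms = cast(latency as bigint)",
    "Field extraction (regular expression)":
        r"* | parse-regexp message, '(\d+\.\d+\.\d+\.\d+)' as client_ip",
    "Field extraction (JSON)": "* | parse-json payload",
}
for scenario, stmt in spl_examples.items():
    print(f"{scenario}: {stmt}")
```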