DataWorks provides LogHub Reader and LogHub Writer for you to read data from and write data to Simple Log Service data sources. This topic describes the capabilities of synchronizing data from or to Simple Log Service data sources.
Limits
When you use DataWorks Data Integration to run batch synchronization tasks to write data to Simple Log Service, Simple Log Service does not ensure idempotence. If you rerun a failed task, redundant data may be generated.
Data types
The following table provides the support status of main data types in Simple Log Service.
Data type | LogHub Reader for batch data read | LogHub Writer for batch data write | LogHub Reader for real-time data read |
STRING | Supported | Supported | Supported |
LogHub Writer for batch data write
LogHub Writer converts the data types supported by Data Integration to STRING before data is written to Simple Log Service. The following table lists the data type mappings based on which LogHub Writer converts data types.
Data Integration data type | Simple Log Service data type |
LONG | STRING |
DOUBLE | STRING |
STRING | STRING |
DATE | STRING |
BOOLEAN | STRING |
BYTES | STRING |
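For example (the field values are illustrative, not taken from a real task): a record that a reader produces with the LONG value 123, the DOUBLE value 3.14, the BOOLEAN value true, and a DATE value of 2024-05-01 00:00:00 arrives in Simple Log Service as the strings "123", "3.14", "true", and "2024-05-01 00:00:00". The exact string form of a DATE value depends on how the reader serializes it.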
LogHub Reader for real-time data read
The following table describes the metadata fields that LogHub Reader for real-time data synchronization provides.
Field provided by LogHub Reader for real-time data synchronization | Data type | Description |
__time__ | STRING | A reserved field of Simple Log Service. The field specifies the time when logs are written to Simple Log Service. The field value is a UNIX timestamp in seconds. |
__source__ | STRING | A reserved field of Simple Log Service. The field specifies the source device from which logs are collected. |
__topic__ | STRING | A reserved field of Simple Log Service. The field specifies the name of the topic for logs. |
__tag__:__receive_time__ | STRING | The time when logs arrive at the server. If you enable the public IP address recording feature, this field is added to each raw log when the server receives the logs. The field value is a UNIX timestamp in seconds. |
__tag__:__client_ip__ | STRING | The public IP address of the source device. If you enable the public IP address recording feature, this field is added to each raw log when the server receives the logs. |
__tag__:__path__ | STRING | The path of the log file collected by Logtail. Logtail automatically adds this field to logs. |
__tag__:__hostname__ | STRING | The hostname of the device from which Logtail collects data. Logtail automatically adds this field to logs. |
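For reference, the following is a hypothetical example of a single log record, serialized as JSON, that a real-time synchronization task might produce. All values are illustrative; status is a user-defined log field, and the remaining keys are the metadata fields listed in the preceding table.
{
    "__time__": "1700000000",
    "__source__": "192.168.0.10",
    "__topic__": "nginx_access",
    "__tag__:__receive_time__": "1700000001",
    "__tag__:__client_ip__": "203.0.113.5",
    "__tag__:__path__": "/var/log/nginx/access.log",
    "__tag__:__hostname__": "web-01",
    "status": "200"
}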
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Note
When you configure a data synchronization task that synchronizes data from a Simple Log Service data source, the data source allows you to filter data by using the query syntax of Simple Log Service and SLS Processing Language (SPL) statements. Simple Log Service uses SPL to process logs. For more information, see Appendix 2: SPL syntax.
Appendix 1: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Code for LogHub Reader
{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"LogHub",
            "parameter":{
                "datasource":"",
                "column":[
                    "col0",
                    "col1",
                    "col2",
                    "col3",
                    "col4",
                    "C_Category",
                    "C_Source",
                    "C_Topic",
                    "C_MachineUUID",
                    "C_HostName",
                    "C_Path",
                    "C_LogTime"
                ],
                "beginDateTime":"",
                "batchSize":"",
                "endDateTime":"",
                "fieldDelimiter":",",
                "logstore":""
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"
        },
        "speed":{
            "throttle":true,
            "concurrent":1,
            "mbps":"12"
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
Parameters in code for LogHub Reader
Parameter | Description | Required | Default Value |
endPoint | The endpoint of Simple Log Service. The endpoint is a URL that you can use to access the project and the log data in the project. The endpoint varies based on the project name and the Alibaba Cloud region where the project resides. For more information about the endpoints of Simple Log Service in each region, see Endpoints. | Yes | No default value |
accessId | The AccessKey ID of the Alibaba Cloud account that is used to access the Simple Log Service project. | Yes | No default value |
accessKey | The AccessKey secret of the Alibaba Cloud account that is used to access the Simple Log Service project. | Yes | No default value |
project | The name of the Simple Log Service project. A project is the basic unit for managing resources in Simple Log Service. Projects are used to isolate resources and control access to the resources. | Yes | No default value |
logstore | The name of the Logstore. A Logstore is a basic unit that you can use to collect, store, and query log data in Simple Log Service. | Yes | No default value |
batchSize | The number of data entries to read from Simple Log Service at a time. | No | 128 |
column | The names of the columns. You can also specify Simple Log Service metadata fields as columns. Supported metadata includes the log topic, unique identifier of the host, hostname, path, and log time. Note Column names are case-sensitive. For more information about column names in Simple Log Service, see Introduction. | Yes | No default value |
beginDateTime | The start time of data consumption. The value is the time at which log data arrives at Simple Log Service. This parameter defines the left boundary of a left-closed, right-open interval in the format of yyyyMMddHHmmss, such as 20180111013000. This parameter can work with the scheduling parameters in DataWorks. For example, if you enter beginDateTime=${yyyymmdd-1} in the Parameters field on the Properties tab, you can set Start Timestamp to ${beginDateTime}000000 on the task configuration tab to consume logs that are generated from 00:00:00 of the data timestamp. For more information, see Supported formats of scheduling parameters. Note The beginDateTime and endDateTime parameters must be used in pairs. | Yes | No default value |
endDateTime | The end time of data consumption. This parameter defines the right boundary of a left-closed, right-open interval in the format of yyyyMMddHHmmss, such as 20180111013010. This parameter can work with the scheduling parameters in DataWorks. For example, if you enter endDateTime=${yyyymmdd} in the Parameters field on the Properties tab, you can set End Timestamp to ${endDateTime}000000 on the task configuration tab to consume logs that are generated until 00:00:00 of the next day of the data timestamp. For more information, see Supported formats of scheduling parameters. Note The time that is specified by the endDateTime parameter of the previous interval cannot be earlier than the time that is specified by the beginDateTime parameter of the current interval. Otherwise, data in some time ranges may not be read. For an example that combines these parameters with scheduling parameters, see the sketch after this table. | Yes | No default value |
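The following snippet is a minimal sketch of the reader parameter block that uses the scheduling parameters described above. It assumes that beginDateTime=${yyyymmdd-1} and endDateTime=${yyyymmdd} are defined in the Parameters field on the Properties tab; the data source name, Logstore name, column list, and batchSize are placeholders.
"parameter":{
    "datasource":"my_sls_datasource",
    "logstore":"my_logstore",
    "column":["col0","C_Topic","C_LogTime"],
    "batchSize":"256",
    "beginDateTime":"${beginDateTime}000000",
    "endDateTime":"${endDateTime}000000",
    "fieldDelimiter":","
}
With this configuration, each run reads only the logs that arrived at Simple Log Service between 00:00:00 of the data timestamp and 00:00:00 of the next day.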
Code for LogHub Writer
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "LogHub",
            "parameter": {
                "datasource": "",
                "column": [
                    "col0",
                    "col1",
                    "col2",
                    "col3",
                    "col4",
                    "col5"
                ],
                "topic": "",
                "batchSize": "1024",
                "logstore": ""
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": ""
        },
        "speed": {
            "throttle": true,
            "concurrent": 3,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
Parameters in code for LogHub Writer
Note
LogHub Writer obtains data from a reader and converts the data types supported by Data Integration into STRING. When the number of buffered data records reaches the value specified for the batchSize parameter, LogHub Writer sends the buffered records to Simple Log Service in a single request by using Simple Log Service SDK for Java.
Parameter | Description | Required | Default Value |
endpoint | The endpoint of Simple Log Service. The endpoint is a URL that you can use to access the project and the log data in the project. The endpoint varies based on the project name and the Alibaba Cloud region where the project resides. For more information about the endpoints of Simple Log Service in each region, see Endpoints. | Yes | No default value |
accessKeyId | The AccessKey ID of the Alibaba Cloud account that is used to access the Simple Log Service project. | Yes | No default value |
accessKeySecret | The AccessKey secret of the Alibaba Cloud account that is used to access the Simple Log Service project. | Yes | No default value |
project | The name of the Simple Log Service project. | Yes | No default value |
logstore | The name of the Logstore. A Logstore is a basic unit that you can use to collect, store, and query log data in Simple Log Service. | Yes | No default value |
topic | The name of the topic. | No | An empty string |
batchSize | The number of data records to write to Simple Log Service at a time. Default value: 1024. Maximum value: 4096. Note The size of the data to write to Simple Log Service at a time cannot exceed 5 MB. You can change the value of this parameter based on the size of a single data record, as shown in the sizing example after this table. | No | 1024 |
column | The names of columns in each data record. | Yes | No default value |
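As a rough sizing example (the record sizes below are assumptions, not measured values): because a single request cannot exceed 5 MB, records that average about 4 KB allow a batchSize of up to roughly 1,280 (5,120 KB / 4 KB), so the default of 1024 is safe; for records that average about 10 KB, a value of about 500 keeps each request under the limit.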
Appendix 2: SPL syntax
When you configure a data synchronization task that synchronizes data from a Simple Log Service data source, the data source allows you to filter data by using the query syntax of Simple Log Service and SPL statements. Simple Log Service uses SPL to process logs. The following examples describe the SPL syntax in different scenarios and compare it with the equivalent SQL statements.
Data filtering
SQL statement:
SELECT * WHERE Type='write'
SPL statement:
| where Type='write'
Field processing and filtering
SQL statement: search for a field in exact mode and rename the field.
SELECT "__tag__:node" AS node, path
SPL statements:
Search for a field in exact mode and rename the field.
| project node="__tag__:node", path
Search for fields by mode.
| project -wildcard "__tag__:*"
Rename a field without affecting other fields.
| project-rename node="__tag__:node"
Remove fields by mode.
| project-away -wildcard "__tag__:*"
Data standardization (SQL function calls)
SQL statement: convert a data type and parse time.
SELECT
    CAST(Status AS BIGINT) AS Status,
    date_parse(Time, '%Y-%m-%d %H:%i') AS Time
SPL statements: convert a data type and parse time.
| extend Status=cast(Status as BIGINT)
| extend Time=date_parse(Time, '%Y-%m-%d %H:%i')
Field extraction
SPL statements:
Extract data by using a regular expression based on one-time matching.
| parse-regexp protocol, '(\w+)/(\d+)' as scheme, version
Extract JSON data based on full expansion.
| parse-json -path='$.0' content
Extract data from a CSV file.
| parse-csv -delim='^_^' content as ip, time, host