Simple Log Service provides data transformation (new version) as a managed, scalable, and highly available service. The service helps you organize data, extract information, and cleanse, filter, and distribute data to Logstores.
How it works
The data transformation (new version) feature processes log data in real time by running hosted data consumption jobs that apply Simple Log Service Processing Language (SPL) rules. For more information about SPL rules, see SPL syntax. For more information about the scenarios that involve SPL, see Overview of real-time consumption.
Data transformation relies on the real-time consumption API of Simple Log Service and does not depend on the index configuration of a source Logstore.
Scheduling mechanism
The transformation service has a scheduler that starts one or more instances to concurrently process a transformation job. Each running instance works as a consumer to consume one or more shards of the source Logstore. The scheduler dynamically scales instances based on the instance resource usage and consumption progress. The maximum concurrency of a single job is the number of shards in the source Logstore.
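The scaling behavior described above can be sketched in a few lines of code. This is a minimal, hypothetical model, not the service's actual scheduler: it round-robins shards across instances and caps concurrency at the shard count, mirroring the rule that a job's maximum concurrency equals the number of shards in the source Logstore.

```python
# Conceptual sketch (not the actual service implementation): how a
# scheduler might assign source-Logstore shards to running instances.
# Shard IDs and instance counts here are hypothetical examples.

def assign_shards(shard_ids, instance_count):
    """Round-robin shards across instances; effective concurrency is
    capped at the number of shards."""
    instance_count = min(instance_count, len(shard_ids))
    assignment = {i: [] for i in range(instance_count)}
    for idx, shard in enumerate(shard_ids):
        assignment[idx % instance_count].append(shard)
    return assignment

# Four shards, six requested instances -> only four instances are used.
print(assign_shards([0, 1, 2, 3], 6))
# -> {0: [0], 1: [1], 2: [2], 3: [3]}
```

A real scheduler would also rebalance shards when it scales instances up or down based on resource usage and consumption progress; this sketch only shows the static assignment and the concurrency cap.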
Running instances
Each running instance consumes source log data from the shards that the data transformation service allocates to it, processes the data based on the SPL rules of the job, and writes the results to the destination Logstores specified by the rules. While an instance runs, the consumer offsets of its shards are automatically saved. This ensures that if the job is stopped and then restarted, consumption resumes from the point at which it was interrupted.
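The checkpoint behavior can be illustrated with a small sketch. This is a simplified model under stated assumptions, not the service's implementation: the real service persists checkpoints on the server side, and the in-memory records and offsets here are hypothetical.

```python
# Conceptual sketch: saving per-shard consumer offsets (checkpoints)
# so that consumption resumes where it stopped after a restart.

class ShardConsumer:
    def __init__(self, checkpoints=None):
        # checkpoints maps shard ID -> next offset to consume.
        self.checkpoints = dict(checkpoints or {})

    def consume(self, shard, records):
        start = self.checkpoints.get(shard, 0)
        processed = records[start:]
        # Save the new offset after processing, so a restart resumes here.
        self.checkpoints[shard] = start + len(processed)
        return processed

consumer = ShardConsumer()
consumer.consume(0, ["a", "b"])               # processes "a" and "b"
# Simulate a stop and restart that reloads the saved checkpoints.
restarted = ShardConsumer(consumer.checkpoints)
print(restarted.consume(0, ["a", "b", "c"]))  # -> ['c']
```

Because the offset is saved after each batch, the restarted consumer skips the records that were already processed and continues with only the new record.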
Stop a job
The lifecycle and state of a data transformation job depend on the job configurations and the operations that you perform. For more information, see ETL.
Scenarios
The data transformation feature is used in scenarios such as data standardization, forwarding, desensitization, and filtering. The following items describe the scenarios:
Data standardization and information extraction: extracts fields from logs in various formats and converts the data into structured form for efficient stream processing and analysis by downstream applications and data warehouses.
Data forwarding and distribution:
Collects logs of different types into a single Logstore and then distributes the logs to downstream Logstores based on the source service module or business component. This achieves data isolation and scenario-specific computing.
Collects logs from the regions where a business is deployed and then aggregates the logs to a central region by using an acceleration service. This enables centralized, global log management.
Data cleansing and filtering: removes invalid log records and unused log fields, extracts key fields, and writes the key fields to downstream Logstores for focused analysis.
Data desensitization: masks sensitive information, such as passwords, mobile phone numbers, and IP addresses, in data records.
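The scenarios above can be combined into one conceptual pipeline. The sketch below assumes simplified dict-shaped log records and hypothetical field names (including the `__destination__` routing field); in the actual feature, these steps would be expressed as SPL rules rather than Python.

```python
# Conceptual sketch of standardization, cleansing, filtering,
# desensitization, and distribution on a simplified log record.
import re

def transform(record):
    # Standardization: extract structured fields from a raw text field.
    m = re.match(r"(?P<level>\w+) user=(?P<user>\S+) ip=(?P<ip>\S+)",
                 record["content"])
    if not m:
        return None                      # cleansing: drop invalid records
    fields = m.groupdict()
    if fields["level"] == "DEBUG":
        return None                      # filtering: drop unneeded records
    # Desensitization: mask the last octet of the IP address.
    fields["ip"] = re.sub(r"\.\d+$", ".*", fields["ip"])
    # Distribution: route the record to a destination Logstore by level.
    fields["__destination__"] = ("error-logstore"
                                 if fields["level"] == "ERROR"
                                 else "app-logstore")
    return fields

# Masks the IP address and routes the record to the error Logstore.
print(transform({"content": "ERROR user=alice ip=10.0.0.15"}))
```

Each step corresponds to one of the scenarios listed above; a production job would implement the same logic declaratively in SPL and let the service handle routing to the destination Logstores.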
Benefits
You can use the unified SPL syntax of Simple Log Service to collect, query, and consume data.
Line-by-line debugging and code completion are supported during SPL development, which provides an experience similar to coding in an integrated development environment (IDE).
Data is processed in real time, so results are visible within seconds. In addition, computing resources scale elastically to provide high throughput.
Data processing directives and SQL functions for log analysis are provided out of the box.
A dashboard with real-time metrics helps you monitor the status of your jobs.
The service is fully managed and maintenance-free, and integrates with Alibaba Cloud big data services and open source ecosystems.
Billing
If a Logstore is billed based on the amount of data written, the data transformation (new version) feature does not incur fees. However, if data is pulled or written over the public endpoint of Simple Log Service, Internet read traffic is charged based on the compressed data volume. For more information, see Billable items of pay-by-ingested-data.
If a Logstore is billed based on the features that you use, you are charged for the computing and network resources that the data transformation (new version) feature consumes. For more information, see Billable items of pay-by-feature.