This topic describes how to import data from Kafka to Simple Log Service. After you import data to Simple Log Service, you can query, analyze, and transform data in Simple Log Service.
Prerequisites
A Kafka cluster is available.
A project and a Logstore are created. For more information, see Create a project and Create a Logstore.
Supported versions
Only Kafka 2.2.0 and later are supported.
Create a data import configuration
Log on to the Simple Log Service console.
In the Quick Data Import section, click Import Data. On the Data Import tab of the dialog box that appears, click Kafka - Data Import.
Select the project and Logstore. Then, click Next.
In the Import Configuration step, configure the following parameters.
Parameter
Description
Job Name
The ID of the import job.
Display Name
The name of the import job.
Job Description
The description of the import job.
Endpoint
The address that is used to connect to the Kafka cluster. You can obtain the address from the bootstrap.servers field that is configured for the Kafka cluster. Separate multiple addresses with commas (,).
If you use a Kafka cluster that is provided by an Alibaba Cloud ApsaraMQ for Kafka instance, you must enter the IP address or domain name of the instance endpoint.
If you use a Kafka cluster that is deployed on an Alibaba Cloud Elastic Compute Service (ECS) instance, you must enter an IP address of the ECS instance.
If you use other Kafka clusters, you must enter the public IP address or domain name of a broker in the Kafka cluster.
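For example, the Endpoint value for a hypothetical three-broker cluster that listens on the default Kafka port 9092 (the addresses below are placeholders):
192.168.XX.XX:9092,192.168.XX.XX:9092,192.168.XX.XX:9092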
Topics
The Kafka topics. Separate multiple topics with commas (,).
Consumer Group
If you use a Kafka cluster that is provided by an Alibaba Cloud ApsaraMQ for Kafka instance and do not enable the flexible group creation feature, you must select a consumer group. For more information about the feature, see Use the flexible group creation feature. For more information about how to create a consumer group, see Create a consumer group.
Starting Position
The position from which you want the system to start importing data. Valid values:
Earliest: The system starts to import data from the first Kafka data entry that exists.
Latest: The system starts to import data from the most recent Kafka data entry that is generated.
Data Format
The format of the data that you want to import. Valid values:
Simple Mode: If the data that you want to import is in the single-line format, you can select Simple Mode.
JSON String: If the data that you want to import is in the JSON format, you can select JSON String. The import job parses the imported data into key-value pairs and parses only the first layer of the data.
Parse Array Elements
After you turn on Parse Array Elements, the system splits data in the JSON array format into multiple pieces of data based on array elements and then imports the data.
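For example (an illustrative message, not taken from a real cluster), in JSON String mode only the first layer is parsed into fields, and nested objects are kept as string values:

Input Kafka message:
{"status": 200, "client": {"ip": "192.168.XX.XX", "agent": "curl"}}

Parsed log fields:
status: 200
client: {"ip": "192.168.XX.XX", "agent": "curl"}

If Parse Array Elements is turned on, a message such as [{"status": 200}, {"status": 404}] is split into two log entries, one per array element.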
Encoding Format
The encoding format or character set of the data that you want to import. Valid values: UTF-8 and GBK.
VPC-based Instance ID
If your ApsaraMQ for Kafka instance or ECS instance resides in a virtual private cloud (VPC), you can specify the ID of the VPC to allow Simple Log Service to read data from the Kafka cluster over an internal network of Alibaba Cloud.
Data read over an internal network of Alibaba Cloud provides higher security and network stability.
Important: Make sure that the Kafka cluster can be accessed from the 100.104.0.0/16 CIDR block.
Time Configuration
Time Field
The time field that is used to record the log time. You can enter the name of the column that represents time in the Kafka data.
Regular Expression to Extract Time
If you set Data Format to Simple Mode, you must specify a regular expression to extract time from the Kafka data.
For example, if a Kafka data entry is message with time 2022-08-08 14:20:20, you can set Regular Expression to Extract Time to \d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d.
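To check such a regular expression before you create the import job, you can test it locally. The following minimal Java sketch (illustrative only; it assumes standard regular expression behavior and reuses the sample message and expression above) prints the extracted time substring:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractTime {
    public static void main(String[] args) {
        // Sample Kafka message and the regular expression from the example above.
        String message = "message with time 2022-08-08 14:20:20";
        Pattern pattern = Pattern.compile("\\d\\d\\d\\d-\\d\\d-\\d\\d \\d\\d:\\d\\d:\\d\\d");
        Matcher matcher = pattern.matcher(message);
        if (matcher.find()) {
            System.out.println(matcher.group()); // 2022-08-08 14:20:20
        }
    }
}
```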
Time Field Format
The time format that is used to parse the value of the time field.
You can specify a time format that is supported by Java SimpleDateFormat. Example: yyyy-MM-dd HH:mm:ss. For more information about the time format syntax, see Class SimpleDateFormat. For more information about the common time formats, see Time formats.
You can specify an epoch time format. Valid values: epoch, epochMillis, epochMacro, and epochNano.
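Because the syntax is that of Java SimpleDateFormat, you can verify a format string locally before you configure it. A minimal sketch that uses the sample values from above:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class VerifyTimeFormat {
    public static void main(String[] args) throws ParseException {
        // Parse the sample time string with the format yyyy-MM-dd HH:mm:ss.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date date = format.parse("2022-08-08 14:20:20");
        System.out.println(date.getTime() / 1000); // log time as epoch seconds
    }
}
```

If the string parses without an exception, the format matches your data.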
Time Zone
The time zone of the time field.
If you set Time Field Format to an epoch time format, you do not need to configure Time Zone.
Default Time Source
If no time extraction information is provided or time extraction fails, the system uses the time source that you specify. Valid values: Current System Time and Kafka Message Timestamp.
Advanced Settings
Log Context
After you turn on Log Context, you can use the contextual query feature. You can view the context of the data that you want to import in a source Kafka partition.
Communication Protocol
The information about the communication protocol that is used to connect to the Kafka cluster. If you want to import data over the Internet, we recommend that you encrypt your connections between Simple Log Service and the Kafka cluster and implement user authentication. The following sample code provides an example.
The protocol field supports the following values: plaintext, ssl, sasl_plaintext, and sasl_ssl. The recommended value is sasl_ssl, which requires connection encryption and user authentication.
If you set protocol to sasl_plaintext or sasl_ssl, you must also configure the sasl node. The mechanism field below the sasl node supports the following values: PLAIN, SCRAM-SHA-256, and SCRAM-SHA-512. This field specifies a username-password authentication mechanism.
{ "protocol":"sasl_plaintext", "sasl":{ "mechanism":"PLAIN", "username":"xxx", "password":"yyy" } }
Private Domain Resolution
If you use a Kafka cluster that is deployed on an ECS instance and the brokers in the cluster are connected to each other over an internal endpoint, you must specify the mapping between the internal hostname of each broker and the IP address of its ECS instance. Example:
{ "hostname#1":"192.168.XX.XX", "hostname#2":"192.168.XX.XX", "hostname#3":"192.168.XX.XX" }
Click Preview to preview the import result.
After you confirm the result, click Next.
Preview data, configure indexes, and then click Next.
By default, full-text indexing is enabled in Simple Log Service. You can also configure field indexes based on collected logs in manual mode or automatic mode. To configure field indexes in automatic mode, click Automatic Index Generation. This way, Simple Log Service automatically creates field indexes. For more information, see Create indexes.
Important: If you want to query and analyze logs, you must enable full-text indexing or field indexing. If you enable both full-text indexing and field indexing, the system uses only field indexes.
Click Query Log. On the query and analysis page, check whether Kafka data is imported.
Wait approximately 1 minute. If the expected Kafka data appears, the import is successful.
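If no data appears, you can write a test message to one of the topics and check again. The following minimal Java producer sketch assumes the open source Kafka client library (org.apache.kafka:kafka-clients) and uses placeholder endpoint and topic values:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SendTestMessage {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder endpoint; use the same value as the Endpoint parameter above.
        props.put("bootstrap.servers", "192.168.XX.XX:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Send one JSON test message that matches the JSON String data format.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic",
                    "{\"status\": 200, \"message\": \"import test\"}"));
            producer.flush();
        }
    }
}
```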
View a data import configuration
After you create a data import configuration, you can view the configuration details and related statistical reports in the Simple Log Service console.
In the Projects section, click the project to which the data import configuration belongs.
Find and click the Logstore to which the data import configuration belongs, and then click the name of the data import configuration.
On the Import Configuration Overview page, view the basic information and statistical reports of the data import configuration.
What to do next
On the Import Configuration Overview page, you can perform the following operations on the data import configuration:
Modify the data import configuration
To modify the data import configuration, click Edit Configurations. For more information, see Create a data import configuration.
Delete the data import configuration
To delete the data import configuration, click Delete Configuration.
Warning: After the data import configuration is deleted, it cannot be restored.
Stop an import job
To stop a data import job, click Stop.
FAQ
Problem description | Possible cause | Solution |
--- | --- | --- |
A broker connection error occurs during preview. Error code: Broker transport failure. | The endpoint that is specified in the data import configuration is invalid, or the Kafka cluster cannot be reached over the network. | Check the Endpoint parameter against the bootstrap.servers setting of the Kafka cluster. If data is imported over an internal network of Alibaba Cloud, make sure that the Kafka cluster can be accessed from the 100.104.0.0/16 CIDR block. |
A timeout error occurs during preview. Error code: preview request timed out. | The Kafka topics that are specified in the data import configuration do not contain data. | Write data to the topics and preview the data again. |
Garbled characters exist in the imported data. | The encoding format that is specified in the data import configuration does not match the actual encoding of the Kafka data. | Update the data import configuration based on the actual encoding format of the Kafka data. To handle the data that is already imported as garbled characters, create a new Logstore and a new data import configuration. |
The log time displayed in Simple Log Service is different from the actual time of the imported data. | No time field is specified in the data import configuration, or the specified time format or time zone is invalid. | Specify a time field or specify a valid time format and time zone. For more information, see Create a data import configuration. |
After data is imported, the data cannot be queried or analyzed. | No indexes are configured for the Logstore, or the query time range does not cover the log time of the imported data. | Enable full-text indexing or field indexing. For more information, see Create indexes. If indexes are already configured, check whether the query time range covers the log time of the imported data. |
The number of imported data entries is less than expected. | The size of some Kafka messages exceeds 3 MB. You can check the sizes of Kafka messages on the Data Processing Insight dashboard. | Make sure that the size of each Kafka message does not exceed 3 MB. |
A large latency exists during the import. | The number of shards in the Logstore is too small for the import traffic. | Increase the number of shards in the Logstore. |
Error handling
Item | Description |
--- | --- |
A network connection error occurs. | The import job is periodically retried. After the network connection is restored, the import job continues to consume data from the offset of the previous data import interruption. |
A Kafka topic does not exist. | If a Kafka topic that contains data to import does not exist, the import job skips the topic. This does not affect the import of data from other topics. After the topic is re-created, the import job resumes consuming data from the topic after a latency of approximately 10 minutes. |