This topic describes the limits on data import from Kafka to Simple Log Service.
Limits on collection
| Item | Description |
| --- | --- |
| Compression format | Data that a Kafka producer compresses in the gzip, zstd, lz4, or snappy format can be imported. Data in other compression formats is discarded during import. You can view the number of discarded data entries in the Deliver Failed chart on the Data Processing Insight dashboard. For more information, see View the data import configuration. A producer-side sketch follows this table. |
| Maximum number of topics | A maximum of 10,000 topics can be specified in a data import configuration. |
| Size of a single log | A single log can be up to 3 MB in size. Logs that exceed this limit are discarded. You can view the number of discarded logs in the Deliver Failed chart on the Data Processing Insight dashboard. For more information, see View the data import configuration. |
| Starting position | When you configure the Starting Position parameter for a data import configuration, you can select only Earliest or Latest. You cannot specify a point in time as the starting position for data import. |
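The compression and log-size limits above can be handled on the producer side. The following is a minimal sketch, assuming the open source kafka-python client, a placeholder broker address, and a hypothetical topic name; it writes data in a supported compression format and skips records that exceed the 3 MB per-log limit.

```python
# Minimal sketch: produce data in a supported compression format (lz4) and
# skip records larger than the 3 MB per-log import limit.
# Assumptions: kafka-python client, placeholder broker address and topic name.
from kafka import KafkaProducer

MAX_LOG_SIZE = 3 * 1024 * 1024  # single logs above 3 MB are discarded during import

producer = KafkaProducer(
    bootstrap_servers="your-kafka-broker:9092",  # placeholder address
    compression_type="lz4",                      # gzip, zstd, lz4, and snappy are supported
)

def send_log(topic: str, payload: bytes) -> None:
    """Send one log record, skipping records that would exceed the size limit."""
    if len(payload) > MAX_LOG_SIZE:
        # Oversized logs would be discarded during import, so handle them here
        # instead, for example by splitting or truncating the payload.
        return
    producer.send(topic, value=payload)

send_log("nginx-access-log", b'{"status": 200, "path": "/index.html"}')
producer.flush()
```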
Limits on configuration
| Item | Description |
| --- | --- |
| Number of data import configurations | You can create a maximum of 100 data import configurations in a single project, regardless of the configuration type. If you want to increase the limit, submit a ticket. |
| Bandwidth | When a data import task reads data from an Alibaba Cloud Kafka cluster over a virtual private cloud (VPC), the maximum network bandwidth allowed for the task is 128 MB/s by default. If you require a higher bandwidth, submit a ticket. |
Limits on performance
| Item | Description |
| --- | --- |
| Number of concurrent subtasks | Simple Log Service creates multiple subtasks in the backend to concurrently import data based on the number of Kafka topics. Each subtask can process decompressed data at a maximum rate of 50 MB/s. If you want to increase the limit, submit a ticket. |
| Number of partitions for a topic | If a Kafka topic has a large number of partitions, additional subtasks can be created to improve the throughput of data import. If a Kafka topic has a large amount of data, you can increase the number of partitions for the topic. We recommend that a topic have no fewer than 16 partitions. A sketch of how to add partitions follows this table. |
| Number of shards in a Logstore | The write performance of Simple Log Service varies based on the number of shards in a Logstore. A single shard supports a write speed of up to 5 MB/s. If an import task writes a large amount of data to Simple Log Service, we recommend that you increase the number of shards in the Logstore. For more information, see Manage shards. |
| Data compression | If you want to import a large amount of data from Kafka to Simple Log Service, we recommend that you compress the data when you write it to Kafka. This significantly reduces the amount of data that is read over the network. Network transmission is more time-consuming than data decompression, especially when data is imported over the Internet. |
| Network | If your Alibaba Cloud Kafka cluster is deployed in a VPC, you can read data from the cluster over the VPC. This reduces Internet traffic and accelerates data transmission. In this scenario, the data read bandwidth can exceed 100 MB/s. When you import data over the Internet, the performance and bandwidth of the network cannot be guaranteed, which may cause import latency. |
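If a topic holds a large amount of data, raising its partition count allows more concurrent import subtasks. The following is a minimal sketch, assuming the open source kafka-python admin client, a placeholder broker address, and a hypothetical topic name; it creates a topic with 16 partitions, or raises the partition count of an existing topic to 16.

```python
# Minimal sketch: ensure a Kafka topic has at least 16 partitions.
# Assumptions: kafka-python admin client, placeholder broker address and topic name.
from kafka.admin import KafkaAdminClient, NewTopic, NewPartitions
from kafka.errors import TopicAlreadyExistsError

admin = KafkaAdminClient(bootstrap_servers="your-kafka-broker:9092")  # placeholder

try:
    # Create a new topic with 16 partitions.
    admin.create_topics(
        [NewTopic(name="nginx-access-log", num_partitions=16, replication_factor=3)]
    )
except TopicAlreadyExistsError:
    # The topic already exists, so increase its partition count instead.
    # Note that the partition count of a Kafka topic can only be increased, not decreased.
    admin.create_partitions({"nginx-access-log": NewPartitions(total_count=16)})
```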
Other limits
| Item | Description |
| --- | --- |
| Metadata synchronization latency | An import task synchronizes the metadata of your Kafka cluster with Simple Log Service at 10-minute intervals. If a topic or partition is newly created, its metadata is synchronized after a delay of approximately 10 minutes. Note: If you set the Starting Position parameter to Latest in a data import configuration, the data that is initially written to a new topic may be skipped (for a period of up to 10 minutes). |
| Validity period of an offset for a topic | An offset for a Kafka topic is valid for up to seven days. If no data is read from a topic within seven days, the offset is discarded. If new data is written to the topic after seven days, Simple Log Service determines which offset to use based on the starting position that is specified in the data import configuration. |