This topic describes the tuning parameters for the enterprise-level state backend, GeminiStateBackend.
Background information
In most scenarios, GeminiStateBackend uses its adaptive tuning feature to adjust parameters automatically, so you do not need to configure them manually. You only need to adjust some basic configurations based on your business scenario. For more information, see Basic configurations. In specific scenarios, you can enable certain configurations for targeted tuning. For example:
To balance memory resources and performance, you can use memory configurations. For more information, see Memory configurations.
If you run out of local disk space, you can use compute-storage separation configurations. For more information, see Compute-storage separation configurations.
If you encounter performance bottlenecks with Join operators, you can use key-value separation configurations. For more information, see Key-value separation configurations.
For an overview of the enterprise-level state backend and its configuration methods, see Introduction to the enterprise-level state backend and Set State-related parameters.
Basic configurations
Parameter | Description | Data type | Default value | Notes |
table.exec.state.ttl | The time-to-live (TTL) for the state of an SQL job. | Long | | The unit is milliseconds. For example, a value of 129600000 sets the TTL to 1.5 days. This parameter cannot be used together with state.backend.gemini.ttl.ms. Note: Set a short TTL where your business logic allows it, to keep the state size small. |
state.backend.gemini.ttl.ms | The TTL for the state of a DataStream or Python job. | Long | (none) | The unit is milliseconds. For example, a value of 129600000 sets the TTL to 1.5 days. This parameter cannot be used together with table.exec.state.ttl. Note: Set a short TTL where your business logic allows it, to keep the state size small. |
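state.backend.gemini.savepoint.external-sort.local-storage.enabled | Specifies whether to store the temporary data generated during a standard-format savepoint (job snapshot) on the local disk. | Boolean | false | Valid values: true: The temporary data is stored on the local disk. false: The temporary data is not stored on the local disk. |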
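The following Python sketch shows the millisecond arithmetic behind the TTL example (1.5 days × 24 h × 3,600 s × 1,000 ms = 129,600,000 ms) and returns exactly one of the two mutually exclusive TTL keys based on the job type. The helper names and the job-type strings are hypothetical and are not part of GeminiStateBackend or Flink.

```python
from typing import Dict

# Hypothetical helpers for illustration only; not a GeminiStateBackend or Flink API.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms


def ttl_days_to_ms(days: float) -> int:
    """Convert a TTL expressed in days to the millisecond value the parameters expect."""
    if days <= 0:
        raise ValueError("TTL must be positive")
    return round(days * MS_PER_DAY)


def ttl_setting(job_type: str, days: float) -> Dict[str, str]:
    """Return a single TTL entry, because the two TTL keys cannot be combined."""
    ms = str(ttl_days_to_ms(days))
    if job_type == "sql":
        return {"table.exec.state.ttl": ms}
    if job_type in ("datastream", "python"):
        return {"state.backend.gemini.ttl.ms": ms}
    raise ValueError(f"unknown job type: {job_type!r}")


print(ttl_setting("sql", 1.5))  # {'table.exec.state.ttl': '129600000'}
```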
Memory configurations
The following memory configuration parameters are supported in VVR 4.0 and later.
Parameter | Description | Data type | Default value | Notes |
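state.backend.gemini.memory.managed | Specifies whether GeminiStateBackend automatically allocates memory based on the managed memory. | Boolean | true | Valid values: true: GeminiStateBackend calculates its memory sizes automatically from the managed memory. false: You set the memory sizes manually by using state.backend.gemini.total.writebuffer.size and state.backend.gemini.offheap.size. |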
state.backend.gemini.total.writebuffer.size | The total memory size occupied by WriteBuffer. | String | 128 MB | This parameter takes effect only when state.backend.gemini.memory.managed is set to false. Otherwise, the total WriteBuffer size is calculated automatically from the managed memory. The value must include a unit suffix: B, KB, MB, or GB. |
state.backend.gemini.offheap.size | The size of the off-heap memory used by GeminiStateBackend. Note: This value does not include the memory used by WriteBuffer. | String | (none) | This parameter takes effect only when state.backend.gemini.memory.managed is set to false. Otherwise, the off-heap memory size is calculated automatically from the managed memory. The value must include a unit suffix: B, KB, MB, or GB. |
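The following Python sketch models how the memory parameters interact: the explicit sizes matter only when state.backend.gemini.memory.managed is false, and every size needs a unit suffix. The helper functions are hypothetical. Treating 1 KB as 1,024 B follows the convention of Apache Flink's MemorySize type; this topic does not state how GeminiStateBackend interprets the units, so that part is an assumption.

```python
import re
from typing import Dict, Optional

# Assumption: binary multiples (1 KB = 1024 B), as in Apache Flink's MemorySize.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
SIZE_PATTERN = re.compile(r"^\s*(\d+)\s*(B|KB|MB|GB)\s*$", re.IGNORECASE)

MANAGED_KEY = "state.backend.gemini.memory.managed"
WRITEBUFFER_KEY = "state.backend.gemini.total.writebuffer.size"
OFFHEAP_KEY = "state.backend.gemini.offheap.size"


def parse_size(value: str) -> int:
    """Convert a size such as '128 MB' to bytes; a bare number is rejected."""
    match = SIZE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"{value!r}: expected a number followed by B, KB, MB, or GB")
    return int(match.group(1)) * UNIT_BYTES[match.group(2).upper()]


def manual_memory_settings(conf: Dict[str, str]) -> Optional[Dict[str, int]]:
    """Hypothetical helper: explicit sizes in bytes, or None if managed memory decides."""
    if conf.get(MANAGED_KEY, "true").strip().lower() == "true":
        return None  # both size parameters are ignored in managed mode
    sizes = {WRITEBUFFER_KEY: parse_size(conf.get(WRITEBUFFER_KEY, "128 MB"))}
    if OFFHEAP_KEY in conf:  # no documented default, so report it only if set
        sizes[OFFHEAP_KEY] = parse_size(conf[OFFHEAP_KEY])
    return sizes


print(manual_memory_settings({MANAGED_KEY: "false", OFFHEAP_KEY: "1 GB"}))
```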
The basic checkpoint and state backend configurations of Apache Flink also apply to GeminiStateBackend. For more information, see Checkpoints and State Backends.
Compute-storage separation configurations
The following compute-storage separation configuration parameters are supported in VVR 4.0.11 and later.
Parameter | Description | Data type | Default value | Notes |
state.backend.gemini.file.cache.type | The compute-storage separation mode. | String | | Valid values: |
state.backend.gemini.file.cache.preserved-space | The minimum free local disk space to preserve for state data on a single TaskManager. | String | 2 GB | When the actual free disk space falls below this value, GeminiStateBackend stores state data on a distributed file system (DFS) to overcome the local storage limit. The value must include a unit suffix: B, KB, MB, or GB. |
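The following Python sketch restates the preserved-space rule as a function: state data stays on the local disk while the free space is at least the preserved space, and is stored on the DFS once the free space falls below it. It is a hypothetical model for reasoning about the threshold, not the GeminiStateBackend implementation; the function names and the binary unit interpretation (1 KB = 1,024 B) are assumptions.

```python
import re

# Assumption: binary multiples (1 KB = 1024 B), as in Apache Flink's MemorySize.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
SIZE_PATTERN = re.compile(r"^\s*(\d+)\s*(B|KB|MB|GB)\s*$", re.IGNORECASE)


def parse_size(value: str) -> int:
    """Convert a size such as '2 GB' to bytes; a bare number is rejected."""
    match = SIZE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"{value!r}: expected a number followed by B, KB, MB, or GB")
    return int(match.group(1)) * UNIT_BYTES[match.group(2).upper()]


def state_location(free_disk_bytes: int, preserved_space: str = "2 GB") -> str:
    """Hypothetical model of the documented rule, not GeminiStateBackend code.

    State stays local until the free space drops below the preserved space;
    below that point, state data is stored on the DFS.
    """
    return "dfs" if free_disk_bytes < parse_size(preserved_space) else "local"


print(state_location(5 * 1024 ** 3))           # local
print(state_location(1 * 1024 ** 3))           # dfs
print(state_location(8 * 1024 ** 3, "10 GB"))  # dfs
```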
When you use Object Storage Service (OSS) as the distributed file system, the OSS client software development kit (SDK) writes an entire file to the local disk before it uploads the file, which can consume local disk space unexpectedly. For example, when Flink creates a savepoint, each state backend instance generates a single large, uncompressed file on the local disk. In this scenario, compute-storage separation does not reduce local disk usage. Use other methods instead, such as increasing the job parallelism, to reduce the state size on each node.
Key-value separation configurations
The following key-value separation configuration parameters are supported in VVR 4.0.12 and later.
Parameter | Description | Data type | Default value | Notes |
state.backend.gemini.kv.separate.mode | The key-value separation mode. | String | | Valid values: |
state.backend.gemini.kv.separate.value.size.threshold | The value size threshold that triggers key-value separation when key-value separation is enabled. | Integer | 200 | The unit is bytes. If the size of a record's value reaches this threshold, the key and the value of the record are stored separately. The recommended value range is 150 to 1000. Adjust this parameter based on the join success rate: the higher the join success rate, the larger the value you can set. Note: In VVR 6.0.1 and later, if adaptive tuning is enabled, the engine adjusts this parameter dynamically based on data characteristics, so you do not need to configure it explicitly. |
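To reason about the threshold, the following Python sketch computes the share of values that would be stored apart from their keys at several thresholds within the recommended range. It assumes that a value "reaches" the threshold when its size is greater than or equal to it, and the sample value sizes are hypothetical; the sketch does not model how GeminiStateBackend performs the separation.

```python
from typing import Iterable

DEFAULT_THRESHOLD = 200  # bytes; the documented default value


def is_separated(value_size: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Assumed rule: a value whose size reaches the threshold is stored apart from its key."""
    return value_size >= threshold


def separated_fraction(value_sizes: Iterable[int], threshold: int = DEFAULT_THRESHOLD) -> float:
    """Share of records whose key and value would be stored separately."""
    sizes = list(value_sizes)
    if not sizes:
        return 0.0
    return sum(is_separated(size, threshold) for size in sizes) / len(sizes)


sample = [120, 180, 250, 400, 900]  # hypothetical value sizes in bytes
for threshold in (150, 200, 500, 1000):
    print(threshold, separated_fraction(sample, threshold))
# 150 0.8
# 200 0.6
# 500 0.2
# 1000 0.0
```

A larger threshold separates fewer records, which matches the guidance that a higher join success rate allows a larger value.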
Adaptive tuning configurations
The following adaptive tuning configuration parameters are supported in VVR 4.0.12 and later.
Parameter | Description | Data type | Default value | Notes |
state.backend.gemini.auto-tune.mode | The adaptive tuning mode. | String | ACTIVE | Valid values: |
state.backend.gemini.auto-tune.burst.start.x | The start time of a time period during which adaptive tuning uses the performance-first mode. | String | (none) | The `x` in the parameter name can be any number, and each `start.x` parameter pairs with the `end.x` parameter that has the same `x`. You can configure multiple pairs to define multiple time periods. The value format is yyyy-MM-dd HH:mm:ss. Configure these parameters if you need higher transactions per second (TPS) rather than better overall efficiency. During each specified period, GeminiStateBackend applies a TPS-first policy, which consumes more resources, such as CPU and memory, to achieve a higher TPS. |
state.backend.gemini.auto-tune.burst.end.x | The end time of a time period during which adaptive tuning uses the performance-first mode. | String | (none) | See state.backend.gemini.auto-tune.burst.start.x. |
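The following Python sketch shows one way to pair `start.x` and `end.x` parameters into time periods and check whether a given time falls within a performance-first period. The helper names and the sample dates are hypothetical. Treating each period as including its start time but excluding its end time is an assumption, because this topic does not specify boundary behavior or time zones.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

PREFIX = "state.backend.gemini.auto-tune.burst."
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss
KEY_PATTERN = re.compile(re.escape(PREFIX) + r"(start|end)\.(\d+)$")

Window = Tuple[datetime, datetime]


def burst_windows(conf: Dict[str, str]) -> List[Window]:
    """Hypothetical helper: pair each start.x with the end.x that has the same x."""
    starts: Dict[str, datetime] = {}
    ends: Dict[str, datetime] = {}
    for key, value in conf.items():
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # not a burst parameter
        side, x = match.groups()
        target = starts if side == "start" else ends
        target[x] = datetime.strptime(value.strip(), TIME_FORMAT)
    unpaired = starts.keys() ^ ends.keys()
    if unpaired:
        raise ValueError(f"start/end without a partner for x in {sorted(unpaired)}")
    windows = []
    for x in sorted(starts, key=int):
        if ends[x] <= starts[x]:
            raise ValueError(f"period {x} does not end after it starts")
        windows.append((starts[x], ends[x]))
    return windows


def in_burst(now: datetime, windows: List[Window]) -> bool:
    """Assumed half-open periods: the start is included, the end is not."""
    return any(start <= now < end for start, end in windows)


conf = {
    PREFIX + "start.1": "2024-11-11 00:00:00",  # hypothetical dates
    PREFIX + "end.1": "2024-11-11 02:00:00",
    PREFIX + "start.2": "2024-12-12 20:00:00",
    PREFIX + "end.2": "2024-12-12 22:00:00",
}
windows = burst_windows(conf)
print(in_burst(datetime(2024, 11, 11, 1, 30), windows))  # True
print(in_burst(datetime(2024, 11, 11, 3, 0), windows))   # False
```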