This topic describes the tuning parameters that you can configure when you use the enterprise-level state backend storage GeminiStateBackend.
Background information
In most scenarios, the adaptive parameter tuning feature allows GeminiStateBackend to adjust its parameters automatically, so manual configuration is not required. You need to adjust only specific basic configurations based on your business scenarios. For more information, see Basic parameters. In specific scenarios, you can configure additional parameters to further optimize performance. This topic describes the parameter configurations for the following scenarios:
If you want to coordinate memory resources and performance, configure memory-related parameters. For more information, see Memory-related parameters.
If your local disk space is insufficient, configure the parameters for compute-storage separation. For more information, see Parameters for compute-storage separation.
If a JOIN operator has a performance bottleneck, configure the parameters for key-value separation. For more information, see Parameters for key-value separation.
For more information about enterprise-level state storage, see GeminiStateBackend. For more information about how to configure enterprise-level state storage, see Configure the parameters that are related to state backends.
Basic parameters
Parameter | Description | Data type | Default value | Remarks |
table.exec.state.ttl | The TTL of state data in SQL deployments. | LONG | | Unit: milliseconds. For example, if you set this parameter to 129600000, the TTL of the state data is 1.5 days. This parameter cannot be used together with the state.backend.gemini.ttl.ms parameter. Note: We recommend that you set this parameter to a small value based on your business requirements. |
state.backend.gemini.ttl.ms | The TTL of state data in DataStream deployments or Python deployments. | LONG | (none) | Unit: milliseconds. For example, if you set this parameter to 129600000, the TTL of the state data is 1.5 days. This parameter cannot be used together with the table.exec.state.ttl parameter. Note: We recommend that you set this parameter to a small value based on your business requirements. |
state.backend.gemini.savepoint.external-sort.local-storage.enabled | Specifies whether the temporary data generated during the savepoint creation is stored on a local disk. | BOOLEAN | false | Valid values:
Note
|
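The TTL parameters take a plain number of milliseconds, which is easy to get wrong by a factor of 1000. The following is a minimal sketch of the conversion, not part of GeminiStateBackend. The helper name ttl_millis is hypothetical, and the `key: value` output form is an assumption about how the setting is entered in the deployment configuration.

```python
from datetime import timedelta


def ttl_millis(ttl: timedelta) -> int:
    """Convert a duration to the whole number of milliseconds used by the TTL parameters."""
    one_ms = timedelta(milliseconds=1)
    # Reject non-positive durations and durations that are not whole milliseconds.
    if ttl <= timedelta(0) or ttl % one_ms:
        raise ValueError("TTL must be a positive whole number of milliseconds")
    return ttl // one_ms


# 1.5 days, the example used in the table above.
value = ttl_millis(timedelta(days=1, hours=12))
print(f"table.exec.state.ttl: {value}")  # prints "table.exec.state.ttl: 129600000"
```

For DataStream or Python deployments, the same number would be used with state.backend.gemini.ttl.ms instead. Only one of the two parameters can be set.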
Memory-related parameters
The following table describes the memory-related parameters that can be configured only in VVR 4.0 and later.
Parameter | Description | Data type | Default value | Remarks |
state.backend.gemini.memory.managed | Specifies whether GeminiStateBackend automatically allocates memory based on the managed memory. | BOOLEAN | true | Valid values:
Note
|
state.backend.gemini.total.writebuffer.size | The total size of memory that is occupied by WriteBuffer. | STRING | 128 MB | This parameter takes effect when the state.backend.gemini.memory.managed parameter is set to false. Otherwise, the total size of memory that is occupied by WriteBuffer is automatically calculated based on the managed memory. When you configure this parameter, you must add a unit to the value of this parameter. The unit can be B, KB, MB, or GB. Note
|
state.backend.gemini.offheap.size | The size of the off-heap memory that is used by GeminiStateBackend. Note: The off-heap memory that is used by GeminiStateBackend does not include the memory that is occupied by WriteBuffer. | STRING | (none) | This parameter takes effect when the state.backend.gemini.memory.managed parameter is set to false. Otherwise, the size of the off-heap memory that is used by GeminiStateBackend is automatically calculated based on the managed memory. When you configure this parameter, you must add a unit to the value of this parameter. The unit can be B, KB, MB, or GB. Note
|
The basic configurations of checkpoints and state backends in Apache Flink also apply to GeminiStateBackend. For more information, see Checkpoints and State Backends.
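The two size parameters in the memory table take effect only when state.backend.gemini.memory.managed is false, and their values must carry a unit. The following sketch checks both conditions before producing the settings. The helper names, the optional space between number and unit, and the 1 GB off-heap value are assumptions made for illustration, not documented behavior.

```python
import re

# Units listed in the table above. Allowing optional whitespace before the
# unit is an assumption of this sketch.
_SIZE = re.compile(r"\d+\s*(B|KB|MB|GB)")


def check_size(value: str) -> str:
    """Return the trimmed value if it is a number followed by B, KB, MB, or GB."""
    trimmed = value.strip()
    if not _SIZE.fullmatch(trimmed):
        raise ValueError(f"size must be a number with a unit (B, KB, MB, GB): {value!r}")
    return trimmed


def manual_memory_config(writebuffer: str, offheap: str) -> dict:
    """Build settings for manual memory sizing (hypothetical helper)."""
    return {
        # The size parameters are ignored unless managed allocation is disabled.
        "state.backend.gemini.memory.managed": "false",
        "state.backend.gemini.total.writebuffer.size": check_size(writebuffer),
        "state.backend.gemini.offheap.size": check_size(offheap),
    }


for key, value in manual_memory_config("128 MB", "1 GB").items():
    print(f"{key}: {value}")
```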
Parameters for compute-storage separation
The following table describes the parameters for compute-storage separation that can be configured only in VVR 4.0.11 and later.
Parameter | Description | Data type | Default value | Remarks |
state.backend.gemini.file.cache.type | The compute-storage separation mode. | STRING |
| Valid values:
|
state.backend.gemini.file.cache.preserved-space | The disk space that is available for the state data on a TaskManager. | STRING | 2 GB | If the actual available disk space is less than the value of this parameter, GeminiStateBackend stores the state data in a DFS to eliminate the limit on the local storage. When you configure this parameter, you must add a unit to the value of this parameter. The unit can be B, KB, MB, or GB. Note
|
The Object Storage Service (OSS) client SDK writes a file to the local disk before it uploads the file. Therefore, if OSS is used as the DFS, local disk space may be consumed unexpectedly. When Flink creates a savepoint, each state backend generates only one file. This file can be large and is not compressed, and it occupies local disk space. In this scenario, the compute-storage separation feature fails. To resolve this issue, increase the parallelism to reduce the size of the state data on a single node.
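The preserved-space rule and the parallelism advice above can be modeled as follows. This is a rough sketch of the documented rule, not GeminiStateBackend's implementation. The function names are hypothetical, 1 GB is assumed to be 1024^3 bytes, state is assumed to be spread evenly across parallel instances, and the 100 GB and 8 GB figures are made-up inputs.

```python
import shutil

GB = 1024 ** 3  # assumption: binary gigabytes


def spills_to_dfs(available_bytes: int, preserved_bytes: int = 2 * GB) -> bool:
    """True when free local disk space is below the preserved-space value (default 2 GB)."""
    return available_bytes < preserved_bytes


def min_parallelism(total_state_bytes: int, max_bytes_per_instance: int) -> int:
    """Smallest parallelism that keeps each instance's share of state under the limit.

    Assumes an even distribution of state, which real workloads may not have.
    """
    return -(-total_state_bytes // max_bytes_per_instance)  # ceiling division


free = shutil.disk_usage(".").free
print("state data goes to the DFS:", spills_to_dfs(free))
print("parallelism needed:", min_parallelism(100 * GB, 8 * GB))
```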
Parameters for key-value separation
The following table describes the parameters for key-value separation that can be configured only in VVR 4.0.12 and later.
Parameter | Description | Data type | Default value | Remarks |
state.backend.gemini.kv.separate.mode | The key-value separation mode. | STRING |
| Valid values:
Note
|
state.backend.gemini.kv.separate.value.size.threshold | The value size threshold that triggers key-value separation after key-value separation is enabled. | INTEGER | 200 | Unit: bytes. If the value of a record reaches this threshold, the key and the value of the record are stored separately. The recommended value ranges from 150 to 1000. You can adjust the value of this parameter based on the success rate of JOIN operations. If the success rate of JOIN operations is high, you can set this parameter to a large value. Note: In Realtime Compute for Apache Flink that uses VVR 6.0.1 or later, if the adaptive parameter tuning feature is enabled, the engine dynamically adjusts the value of this parameter based on the data characteristics, and you do not need to explicitly configure this parameter. |
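To get a feel for a candidate threshold, you can estimate how many records in a sample would have their values stored separately. The sketch below assumes that "reaches this threshold" means a value size greater than or equal to the threshold. The function names and the sample sizes are hypothetical.

```python
def is_separated(value_size: int, threshold: int = 200) -> bool:
    """True if a value of this size (in bytes) would be stored separately from its key."""
    return value_size >= threshold


def separated_fraction(value_sizes: list, threshold: int) -> float:
    """Fraction of sampled values that meet the threshold."""
    if not value_sizes:
        return 0.0
    return sum(is_separated(size, threshold) for size in value_sizes) / len(value_sizes)


# Made-up sample of serialized value sizes in bytes.
sample = [64, 120, 200, 350, 800, 1500]
for threshold in (150, 200, 500, 1000):  # candidates within the recommended range
    print(threshold, separated_fraction(sample, threshold))
```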
Parameters for adaptive parameter tuning
The following table describes the parameters for adaptive parameter tuning that can be configured only in VVR 4.0.12 and later.
Parameter | Description | Data type | Default value | Remarks |
state.backend.gemini.auto-tune.mode | The adaptive parameter tuning mode. | STRING | ACTIVE | Valid values:
Note
|
state.backend.gemini.auto-tune.burst.start.x | The period of time during which the performance-first mode is used when adaptive parameter tuning is enabled. | STRING | (none) | Replace the letter x in the parameter names with a number. Each start.x parameter is paired with the end.x parameter that has the same number, and you can configure multiple pairs to specify multiple time periods. The values of the parameters are in the yyyy-MM-dd HH:mm:ss format. If you need higher transactions per second (TPS) during specific periods and can accept higher resource consumption, configure the two parameters. GeminiStateBackend uses the TPS-first policy during the specified periods to achieve higher TPS, but consumes more resources (CPU cores and memory). Note
|
state.backend.gemini.auto-tune.burst.end.x |
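Because each start.x must be paired with an end.x and both use the yyyy-MM-dd HH:mm:ss format, it can help to validate the pairs before you deploy. The following is a minimal sketch under stated assumptions: the helper burst_windows is hypothetical, the dates are made up, and the requirement that the end time is later than the start time is an assumption rather than a documented rule.

```python
from datetime import datetime

PREFIX = "state.backend.gemini.auto-tune.burst."
FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss


def burst_windows(config: dict) -> list:
    """Pair each start.x with its end.x and return sorted (start, end) tuples."""
    start_prefix = PREFIX + "start."
    windows = []
    for key, value in config.items():
        if not key.startswith(start_prefix):
            continue
        index = key[len(start_prefix):]
        end_key = f"{PREFIX}end.{index}"
        if end_key not in config:
            raise ValueError(f"{key} has no matching {end_key}")
        # strptime raises ValueError if a value does not match the format.
        start = datetime.strptime(value, FORMAT)
        end = datetime.strptime(config[end_key], FORMAT)
        if end <= start:  # assumption: a period must have positive length
            raise ValueError(f"period {index} ends before it starts")
        windows.append((start, end))
    return sorted(windows)


config = {
    PREFIX + "start.1": "2024-11-11 00:00:00",
    PREFIX + "end.1": "2024-11-11 02:00:00",
    PREFIX + "start.2": "2024-12-12 00:00:00",
    PREFIX + "end.2": "2024-12-12 02:00:00",
}
for start, end in burst_windows(config):
    print(start, "->", end)
```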