Parameter | Description | Type | Required | Default value | Remarks |
---|---|---|---|---|---|
connector | The type of the table. | String | No | No default value | If you create an Apache Paimon table in an Apache Paimon catalog, you do not need to specify this parameter. If you create a temporary Apache Paimon table in a catalog of another storage type, set the value to paimon (see the first example after this table). |
path | The storage path of the table. | String | No | No default value | If you create an Apache Paimon table in an Apache Paimon catalog, you do not need to specify this parameter. If you create a temporary Apache Paimon table in a catalog of another storage type, set this parameter to the HDFS or OSS directory in which you want to store the table. |
auto-create | Specifies whether to automatically create an Apache Paimon table file if no Apache Paimon table file exists in the specified path when you create a temporary Apache Paimon table. | Boolean | No | false | Valid values: false (default): if no Apache Paimon table file exists in the specified path, an error is returned. true: if the specified path does not exist, the system automatically creates an Apache Paimon table file. |
bucket | The number of buckets in each partition. | Integer | No | 1 | Data that is written to the Apache Paimon table is distributed to buckets based on the columns that are specified by the bucket-key parameter. Note: We recommend that you keep the data in each bucket smaller than 5 GB. |
bucket-key | The bucket key columns. | String | No | No default value | The columns based on which data written to the Apache Paimon table is distributed to buckets. Separate column names with commas (,). For example, 'bucket-key' = 'order_id,cust_id' distributes data to buckets based on the order_id and cust_id columns. Note: If you do not specify this parameter, data is distributed based on the primary key. If the Apache Paimon table has no primary key, data is distributed based on the values of all columns. |
changelog-producer | The incremental data generation mechanism. | String | No | none | Apache Paimon can generate complete incremental data for any input data stream, in which each UPDATE_AFTER record is paired with an UPDATE_BEFORE record, to facilitate downstream consumption. Valid values: none (default): no incremental data is generated; downstream consumers can still read the table in streaming mode, but the data they read contains only UPDATE_AFTER records and no UPDATE_BEFORE records. input: input data streams are written to an incremental data file in dual-write mode. full-compaction: complete incremental data is generated each time a full compaction is performed. lookup: complete incremental data is generated before each savepoint commit. For more information about how to select a mechanism, see the "Incremental data generation mechanism" section of this topic. A usage sketch is included in the second example after this table. |
full-compaction.delta-commits | The maximum interval at which full compaction is performed. | Integer | No | No default value | A full compaction is guaranteed to be triggered each time the number of savepoint commits reaches the value of this parameter. |
lookup.cache-max-memory-size | The memory cache size of the Apache Paimon dimension table. | String | No | 256 MB | The value of this parameter determines the cache sizes of both the dimension table and the lookup changelog producer. |
merge-engine | The mechanism for merging data that has the same primary key. | String | No | deduplicate | Valid values: deduplicate: only the latest data record is retained. partial-update: the latest data overwrites the existing data that has the same primary key only in the non-null columns; data in the other columns remains unchanged. aggregation: a specified aggregate function performs pre-aggregation. For more information about the data merging mechanism, see the "Data merging mechanism" section of this topic. |
partial-update.ignore-delete | Specifies whether to ignore delete messages. | Boolean | No | false | Valid values: true: delete messages are ignored. false: delete messages are not ignored; you must specify how the sink handles delete messages by configuring sequence.field or other parameters, otherwise errors such as IllegalStateException or IllegalArgumentException occur. Note: In Realtime Compute for Apache Flink that uses VVR 8.0.6 or earlier, this parameter takes effect only when merge-engine = partial-update is configured. In Realtime Compute for Apache Flink that uses VVR 8.0.7 or later, this parameter also applies to non-partial-update scenarios, where it is functionally equivalent to the ignore-delete parameter; in this case, we recommend that you use ignore-delete instead. Whether delete messages need to be ignored depends on the actual scenario. Configure this parameter based on your business requirements. |
ignore-delete | Specifies whether to ignore delete messages. | Boolean | No | false | The valid values are the same as those of partial-update.ignore-delete. Note: Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 8.0.7 or later supports this parameter. This parameter is functionally equivalent to the partial-update.ignore-delete parameter. Use ignore-delete instead of partial-update.ignore-delete, and do not configure both parameters at the same time. |
partition.default-name | The default name of the partition. | String | No | __DEFAULT_PARTITION__ | If the value of a partition key column is null or an empty string, the value of this parameter is used as the partition name. |
partition.expiration-check-interval | The interval at which the system checks partition expiration. | String | No | 1h | For more information, see How do I configure automatic partition expiration? |
partition.expiration-time | The validity period of a partition. | String | No | No default value | If a partition has existed for longer than the value of this parameter, the partition expires. By default, a partition never expires. The period for which a partition exists is calculated based on the partition value. For more information, see How do I configure automatic partition expiration? A usage sketch is included in the fourth example after this table. |
partition.timestamp-formatter | The pattern that is used to convert a time string into a timestamp. | String | No | No default value | This parameter specifies the pattern that is used to extract the period of time for which a partition exists from the partition value. For more information, see How do I configure automatic partition expiration? |
partition.timestamp-pattern | The pattern that is used to convert a partition value into a time string. | String | No | No default value | This parameter specifies the pattern that is used to extract the period of time for which a partition exists from the partition value. For more information, see How do I configure automatic partition expiration? |
scan.bounded.watermark | The end condition for bounded streaming mode. | Long | No | No default value | Data generation in the Apache Paimon source table ends when the watermark of the data exceeds the value of this parameter. |
scan.mode | The consumer offset of the Apache Paimon source table. | String | No | default | For more information, see How do I specify the consumer offset for an Apache Paimon source table? A usage sketch is included in the third example after this table. |
scan.snapshot-id | The ID of the savepoint from which the Apache Paimon source table starts to consume data. | Integer | No | No default value | For more information, see How do I specify the consumer offset for an Apache Paimon source table? |
scan.timestamp-millis | The point in time from which the Apache Paimon source table starts to consume data, in milliseconds. | Integer | No | No default value | For more information, see How do I specify the consumer offset for an Apache Paimon source table? |
snapshot.num-retained.max | The maximum number of the latest savepoints that can be retained. | Integer | No | 2147483647 | Savepoint expiration is triggered only if the condition specified by the snapshot.num-retained.min parameter is met and the condition specified by the snapshot.num-retained.max parameter or the snapshot.time-retained parameter is met. |
snapshot.num-retained.min | The minimum number of the latest savepoints that can be retained. | Integer | No | 10 | N/A |
snapshot.time-retained | The duration for which savepoints can be retained. | String | No | 1h | Savepoint expiration is triggered only if the condition specified by the snapshot.num-retained.min parameter is met and the condition specified by the snapshot.num-retained.max parameter or the snapshot.time-retained parameter is met. |
write-mode | The write mode of the Apache Paimon table. | String | No | change-log | Valid values: change-log: data is inserted into, deleted from, and updated in the Apache Paimon table based on the primary key. append-only: the Apache Paimon table allows only data insertion and does not support primary key-based operations; this mode is more efficient than the change-log mode. For more information about write modes, see Write modes. |
scan.infer-parallelism | Specifies whether to automatically infer the degree of parallelism of the Apache Paimon source table. | Boolean | No | false | Valid values: true: the parallelism of the Apache Paimon source table is automatically inferred based on the number of buckets. false: the default degree of parallelism that is configured in Ververica Platform (VVP) is used; if the resource configuration is in expert mode, the configured degree of parallelism is used. |
scan.parallelism | The degree of parallelism of the Apache Paimon source table. | Integer | No | No default value | Note: This parameter does not take effect in expert mode. Check the mode parameter value by navigating to . |
sink.parallelism | The degree of parallelism of the Apache Paimon sink table. | Integer | No | No default value | Note: This parameter does not take effect in expert mode. Check the mode parameter value by navigating to . |
sink.clustering.by-columns | The clustering columns that are used to write data to the Apache Paimon sink table. | String | No | No default value | For an Apache Paimon append-only table without primary keys, you can specify this parameter in a batch deployment to enable the clustering feature for data writing. After this feature is enabled, written data is clustered by the specified columns, which improves the query speed of the table. Separate multiple column names with commas (,). Example: 'col1,col2'. For more information about the clustering feature, see Clustering. |
sink.delete-strategy | Specifies the verification strategy for the system to handle retraction (delete and update-before) messages. | Enum | No | NONE | Valid values: NONE (default): no verification is performed. IGNORE_DELETE: the sink operator is expected to ignore retraction messages. NON_PK_FIELD_TO_NULL: the sink operator is expected to ignore update-before messages; when it receives a delete message, it retains the corresponding primary key values and deletes the values in the non-primary-key columns. This value is suitable for partial-update scenarios in which multiple sinks write data to a single table. DELETE_ROW_ON_PK: the sink operator is expected to ignore update-before messages and delete the data records that correspond to delete messages. CHANGELOG_STANDARD: the sink operator is expected to delete the data records that correspond to the received update-before and delete messages. Note: Only Realtime Compute for Apache Flink that uses VVR 8.0.8 or later supports this parameter. You configure parameters such as ignore-delete and merge-engine to manage how the Paimon sink processes retraction messages; this parameter only verifies that the actual retraction behavior matches your expectation. If the behavior deviates from the expectation, the Paimon sink prevents the problematic operation and reports an error that guides you to adjust parameters such as ignore-delete and merge-engine. |
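The following sketch shows how the table-creation parameters fit together in a DDL statement. It is a minimal, hypothetical example: the table name, column names, and the OSS path are placeholders, and it assumes a temporary table created in a catalog of a storage other than Apache Paimon, so connector and path must be set.

```sql
-- Minimal sketch of a temporary Apache Paimon table in a non-Paimon catalog.
-- The table name, columns, and the OSS path are placeholders.
CREATE TEMPORARY TABLE paimon_orders (
  order_id BIGINT,
  cust_id  BIGINT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector'   = 'paimon',                                  -- required for temporary tables
  'path'        = 'oss://my-bucket/warehouse/paimon_orders', -- placeholder directory
  'auto-create' = 'true',              -- create the table file if the path does not exist
  'bucket'      = '4',                 -- 4 buckets per partition
  'bucket-key'  = 'order_id,cust_id'   -- distribute data to buckets by these columns
);
```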
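The next sketch, also hypothetical, combines merge-engine and changelog-producer on a primary-key table: rows with the same primary key are merged with partial-update semantics, and complete incremental data is generated with the lookup mechanism. All names and the path are placeholders.

```sql
-- Hypothetical primary-key table: non-null columns of incoming rows overwrite
-- existing values (partial-update), and complete incremental data is generated
-- before each savepoint commit (lookup).
CREATE TEMPORARY TABLE paimon_user_profile (
  user_id   BIGINT,
  user_name VARCHAR,
  city      VARCHAR,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector'          = 'paimon',
  'path'               = 'oss://my-bucket/warehouse/paimon_user_profile',  -- placeholder
  'merge-engine'       = 'partial-update',
  'changelog-producer' = 'lookup',
  'full-compaction.delta-commits' = '10'  -- guarantee a full compaction every 10 commits
);
```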
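For the source-side scan parameters, the following hypothetical sketch starts a streaming read from a given point in time. It assumes that from-timestamp is an accepted scan.mode value; see the consumer offset FAQ referenced in the table for the authoritative list. The timestamp and all names are placeholders.

```sql
-- Hypothetical source table that starts consuming from a point in time.
CREATE TEMPORARY TABLE paimon_orders_src (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector'             = 'paimon',
  'path'                  = 'oss://my-bucket/warehouse/paimon_orders',  -- placeholder
  'scan.mode'             = 'from-timestamp',  -- assumed value; see the consumer offset FAQ
  'scan.timestamp-millis' = '1700000000000',   -- placeholder epoch milliseconds
  'scan.parallelism'      = '4'                -- ignored in expert mode
);
```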
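Finally, a hypothetical sketch of the automatic partition expiration parameters: day partitions older than 7 days expire, and expiration is checked hourly. The dt value format and all names are placeholders; see the partition expiration FAQ referenced in the table for details.

```sql
-- Hypothetical partitioned table whose day partitions expire after 7 days.
CREATE TEMPORARY TABLE paimon_events (
  event_id BIGINT,
  dt       VARCHAR
) PARTITIONED BY (dt) WITH (
  'connector' = 'paimon',
  'path'      = 'oss://my-bucket/warehouse/paimon_events',  -- placeholder
  'partition.expiration-time'           = '7 d',      -- partitions older than 7 days expire
  'partition.expiration-check-interval' = '1h',       -- check for expired partitions hourly
  'partition.timestamp-formatter'       = 'yyyyMMdd', -- assumes dt values such as 20240101
  'partition.timestamp-pattern'         = '$dt'       -- build the time string from the dt value
);
```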