All Products
Search
Document Center

Realtime Compute for Apache Flink:System tables

Last Updated:Jun 21, 2024

Apache Paimon (Paimon) uses system tables to store the metadata and data consumption information of each Paimon table. This topic describes the columns in different system tables.

System tables related to table metadata

Snapshots table

A snapshots table allows you to query information about each snapshot file of a Paimon table, such as the ID and creation time of the snapshot file.

For example, you can use the following SQL statement to query the snapshots table that is created for the Paimon table named mycat.mydb.mytbl:

SELECT * FROM mycat.mydb.`mytbl$snapshots`;

The following table describes the columns in a snapshots table.

Column

Data type

Description

snapshot_id

Long

The ID of the snapshot file.

schema_id

Long

The ID of the schema file used by the snapshot file. You can obtain the schema information from the schemas table.

commit_time

Timestamp

The time when the snapshot file was created.

total_record_count

Long

The total number of records in the data files to which the snapshot file points.

Note

Values in this column do not indicate the number of data records logically stored in the Paimon table. This is because data files are compacted in memory before they are captured in a snapshot file.

delta_record_count

Long

The number of added data records compared with the previous snapshot file.

changelog_record_count

Long

The number of generated changelog records compared with the previous snapshot file.

Schemas table

A schemas table allows you to query the current and historical schema information of a Paimon table. If you modify the schema of a Paimon table by using the ALTER TABLE, CREATE TABLE AS (CTAS), or CREATE DATABASE AS(CDAS) statement, each modification generates a record in the schemas table. For example, you can use the following SQL statement to query the schemas table created for the Paimon table named mycat.mydb.mytbl:

SELECT * FROM mycat.mydb.`mytbl$schemas`;

The following table describes the columns in a schemas table.

Column

Data type

Description

schema_id

Long

The ID of the schema.

fields

String

The name and data type of each column.

partition_keys

String

The name of the partition key.

primary_keys

String

The name of the primary key.

options

String

The values of the table options.

comment

String

The comment added to the table to provide additional information.

update_time

Timestamp

The time when the schema was last modified.

Options table

An options table allows you to query the current configuration of table options.

For example, you can use the following SQL statement to query the options table created for the Paimon table named mycat.mydb.mytbl:

SELECT * FROM mycat.mydb.`mytbl$options`;

The following table describes the columns in an options table.

Column

Data type

Description

key

String

The name of the table option.

value

String

The value of the table option.

Note

If a table option is not included in the table, the option is set to the default value.

Partitions table

A partitions table allows you to query the partitions in a Paimon table, the total number of data records in each partition, and the total size of files in each partition.

For example, you can use the following SQL statement to query the partitions table created for the Paimon table named mycat.mydb.mytbl:

SELECT * FROM mycat.mydb.`mytbl$partitions`;

The following table describes the columns in a partitions table.

Column

Data type

Description

partition

String

The partition in the [partition value 1, partition value 2, ...] format.

record_count

Long

The total number of data records in the partition.

Note

Values in this column do not indicate the number of data records logically stored in the partition. This is because data files are compacted in memory before they are captured in a snapshot file.

file_size_in_bytes

Long

The total size of files in the partition. Unit: bytes.

Only data files to which the current snapshot file points are counted.

Files table

A files table allows you to query all data files to which a snapshot file points, including the file format, the number of records in the file, and the file size.

For example, you can use the following SQL statement to query the files table created for the most recent snapshot of the Paimon table named mycat.mydb.mytbl:

SELECT * FROM mycat.mydb.`mytbl$files`;

You can also query the files table created for a specific snapshot. In the following example, the snapshot ID is set to 5 for the Paimon table named mycat.mydb.mytbl.

SELECT * FROM mycat.mydb.`mytbl$files` /*+ OPTIONS('scan.snapshot-id'='5') */;

The following table describes the columns in a files table.

Column

Data type

Description

partition

String

The partition that contains the file. Format: [partition value 1, partition value 2, ...].

bucket

Integer

The bucket that contains the file. This column applies only to primary key tables that use the fixed bucket mode.

file_path

String

The path of the file.

file_format

String

The format of the file.

schema_id

Long

The ID of the schema file used by the snapshot file. You can obtain the schema information from the schemas table.

level

Integer

The log-structured merge-tree (LSM) level of the file. This column applies only to primary key tables.

A value of 0 indicates a small file. You can query the number of small files in a bucket to monitor the compaction progress in the bucket.

record_count

Long

The number of records in the file.

file_size_in_bytes

Long

The size of the file. Unit: bytes.

Note

The files table does not contain information about historical data files that are not pointed to by the queried snapshot file.

Tags table

A tags table allows you to query information about tags of a Paimon table, such as the name of a tag and the snapshot associated with the tag.

For example, you can use the following SQL statement to query the tags table created for the Paimon table named mycat.mydb.mytbl:

SELECT * FROM mycat.mydb.`mytbl$tags`;

The following table describes the columns in a tags table.

Column

Data type

Description

tag_name

String

The name of the tag.

snapshot_id

Long

The ID of the snapshot associated with the tag.

schema_id

Long

The ID of the schema used by the tag. You can obtain the schema information from the schemas table.

commit_time

Timestamp

The time when the snapshot associated with the tag was created.

record_count

Long

The number of records in the data files.

System tables related to data consumption

Read-optimized table

The data files of a Paimon table must be compacted in the memory before they are available for consumption, which affects the read efficiency. If you use a primary key table and want to improve the efficiency of batch reading or ad hoc OLAP queries, you can consume data from the corresponding read-optimized table. This way, only files that do not require compaction are read, which eliminates the compaction process and improves query efficiency. Take note that the read-optimized table generates data with increased latency because small files are compacted less frequently. You can configure the following parameter to compact small files at a regular interval and strike a balance between write efficiency, read efficiency, and data latency.

Parameter

Description

Data Type

Default value

Notes

compaction.optimization-interval

The interval at which small files are compacted.

Duration

None

Frequent compaction of small files may affect the write efficiency. We recommend that you set this parameter to a value greater than 30 min.

For example, you can use the following SQL statement to query the read-optimized table created for the most recent snapshot of the Paimon table named mycat.mydb.mytbl:

SELECT * FROM mycat.mydb.`mytbl$ro`;

Audit log table

If you need to know the specific operation of each record in a Paimon table, you can consume data from the corresponding audit log table. Compared with the original Paimon table, the audit log table inserts a column named rowkind at the beginning of each record to store the operation type. Valid values in the rowkind column are +I (insert), -U (update before), +U (update after), and -D (delete). For each record in the audit log table, the operation type is +I (insert).

For example, you can use the following SQL statement to query the audit log table created for the most recent snapshot of the Paimon table named mycat.mydb.mytbl:

SELECT * FROM mycat.mydb.`mytbl$audit_log`;

References

For information about the complete structure of each system table, see Apache Paimon official documentation.