System tables - Realtime Compute for Apache Flink - Alibaba Cloud Documentation Center

Apache Paimon (Paimon) uses system tables to store the metadata and data consumption information of each Paimon table. This topic describes the columns in different system tables.

System tables related to table metadata

Snapshots table

A snapshots table allows you to query information about each snapshot file of a Paimon table, such as the ID and creation time of the snapshot file.

For example, you can use the following SQL statement to query the snapshots table of the mycatalog.mydb.mytbl Paimon table.

SELECT * FROM mycatalog.mydb.`mytbl$snapshots`;

The following table describes the columns in a snapshots table.

Column	Data type	Description
snapshot_id	Long	The ID of the snapshot file.
schema_id	Long	The ID of the schema file used by the snapshot file. You can obtain the schema information from the schemas table.
commit_time	Timestamp	The time when the snapshot file was created.
total_record_count	Long	The total number of records in the data files to which the snapshot file points. Note Values in this column do not indicate the number of data records logically stored in the Paimon table. This is because data files are compacted in memory before they are captured in a snapshot file.
delta_record_count	Long	The number of added data records compared with the previous snapshot file.
changelog_record_count	Long	The number of generated changelog records compared with the previous snapshot file.

Schemas table

A schemas table allows you to query the current and historical schema information of a Paimon table. If you modify the schema of a Paimon table by using the ALTER TABLE, CREATE TABLE AS, or CREATE DATABASE AS statement, each modification generates a record in the schemas table. For example, you can use the following SQL statement to query the schemas table of the mycatalog.mydb.mytbl Paimon table:

SELECT * FROM mycatalog.mydb.`mytbl$schemas`;

The following table describes the columns in a schemas table.

Column	Data type	Description
schema_id	Long	The ID of the schema.
fields	String	The name and data type of each column.
partition_keys	String	The name of the partition key.
primary_keys	String	The name of the primary key.
options	String	The values of the table options.
comment	String	The comment added to the table to provide additional information.
update_time	Timestamp	The time when the schema was last modified.

Options table

An options table allows you to query the current configuration of table options.

For example, you can use the following SQL statement to query the options table of the mycatalog.mydb.mytbl Paimon table.

SELECT * FROM mycatalog.mydb.`mytbl$options`;

The following table describes the columns in an options table.

Column	Data type	Description
key	String	The name of the table option.
value	String	The value of the table option.

Note

If a table option is not included in the table, the option is set to the default value.

Partitions table

A partitions table allows you to query the partitions in a Paimon table, the total number of data records in each partition, and the total size of files in each partition.

For example, you can use the following SQL statement to query the partitions table of the mycatalog.mydb.mytbl Paimon table:

SELECT * FROM mycatalog.mydb.`mytbl$partitions`;

The following table describes the columns in a partitions table.

Column	Data type	Description
partition	String	The partition in the `[partition value 1, partition value 2, ...]` format.
record_count	Long	The total number of data records in the partition. Note Values in this column do not indicate the number of data records logically stored in the partition. This is because data files are compacted in memory before they are captured in a snapshot file.
file_size_in_bytes	Long	The total size of files in the partition. Unit: bytes. Only data files to which the current snapshot file points are counted.

Files table

A files table allows you to query all data files to which a snapshot file points, including the file format, the number of records in the file, and the file size.

For example, you can use the following SQL statement to query the files table created for the most recent snapshot of the mycatalog.mydb.mytbl Paimon table:

SELECT * FROM mycatalog.mydb.`mytbl$files`;

You can also query the files table created for a specific snapshot of the mycatalog.mydb.mytbl Paimon table. You can use the following SQL statement to query the files table created for the snapshot whose ID is 5.

SELECT * FROM mycatalog.mydb.`mytbl$files` /*+ OPTIONS('scan.snapshot-id'='5') */;

The following table describes the columns in a files table.

Column	Data type	Description
partition	String	The partition that contains the file. Format: `[partition value 1, partition value 2, ...]`.
bucket	Integer	The bucket that contains the file. This column applies only to primary key tables that use the fixed bucket mode.
file_path	String	The path of the file.
file_format	String	The format of the file.
schema_id	Long	The ID of the schema file used by the snapshot file. You can obtain the schema information from the schemas table.
level	Integer	The log-structured merge-tree (LSM) level of the file. This column applies only to primary key tables. A value of 0 indicates a small file. You can query the number of small files in a bucket to monitor the compaction progress in the bucket.
record_count	Long	The number of records in the file.
file_size_in_bytes	Long	The size of the file. Unit: bytes.

Note

The files table does not contain information about historical data files that are not pointed to by the queried snapshot file.

Tags table

A tags table allows you to query information about tags of a Paimon table, such as the name of a tag and the snapshot associated with the tag.

For example, you can use the following SQL statement to query the tags table of the mycatalog.mydb.mytbl Paimon table.

SELECT * FROM mycatalog.mydb.`mytbl$tags`;

The following table describes the columns in a tags table.

Column	Data type	Description
tag_name	String	The name of the tag.
snapshot_id	Long	The ID of the snapshot associated with the tag.
schema_id	Long	The ID of the schema used by the tag. You can obtain the schema information from the schemas table.
commit_time	Timestamp	The time when the snapshot associated with the tag was created.
record_count	Long	The number of records in the data files.

System tables related to data consumption

Read-optimized table

The data files of a Paimon table must be compacted in the memory before they are available for consumption, which affects the read efficiency. If you use a primary key table and want to improve the efficiency of batch reading or ad hoc OLAP queries, you can consume data from the corresponding read-optimized table. This way, only files that do not require compaction are read, which eliminates the compaction process and improves query efficiency. Take note that the read-optimized table generates data with increased latency because small files are compacted less frequently. You can configure the following parameter to compact small files at a regular interval and strike a balance between write efficiency, read efficiency, and data latency.

Parameter	Description	Data type	Default value	Remarks
compaction.optimization-interval	The interval at which small files are compacted.	Duration	None	Frequent compaction of small files may affect the write efficiency. We recommend that you set this parameter to a value greater than `30 min`.

For example, you can use the following SQL statement to query the read-optimized table created for the most recent snapshot of the mycatalog.mydb.mytbl Paimon table:

SELECT * FROM mycatalog.mydb.`mytbl$ro`;

Audit log table

If you need to know the specific operation of each record in a Paimon table, you can consume data from the corresponding audit log table. Compared with the original Paimon table, the audit log table inserts a column named rowkind at the beginning of each record to store the operation type. Valid values in the rowkind column are +I (insert), -U (update before), +U (update after), and -D (delete). For each record in the audit log table, the operation type is +I (insert).

For example, you can use the following SQL statement to query the audit log table created for the most recent snapshot of the mycatalog.mydb.mytbl Paimon table:

SELECT * FROM mycatalog.mydb.`mytbl$audit_log`;

References

For information about the complete structure of each system table, see Apache Paimon official documentation.