Apache Paimon (Paimon) uses system tables to store the metadata and data consumption information of each Paimon table. This topic describes the columns in different system tables.
System tables related to table metadata
Snapshots table
A snapshots table allows you to query information about each snapshot file of a Paimon table, such as the ID and creation time of the snapshot file.
For example, you can use the following SQL statement to query the snapshots table that is created for the Paimon table named mycat.mydb.mytbl
:
SELECT * FROM mycat.mydb.`mytbl$snapshots`;
The following table describes the columns in a snapshots table.
Column | Data type | Description |
snapshot_id | Long | The ID of the snapshot file. |
schema_id | Long | The ID of the schema file used by the snapshot file. You can obtain the schema information from the schemas table. |
commit_time | Timestamp | The time when the snapshot file was created. |
total_record_count | Long | The total number of records in the data files to which the snapshot file points. Note Values in this column do not indicate the number of data records logically stored in the Paimon table. This is because data files are compacted in memory before they are captured in a snapshot file. |
delta_record_count | Long | The number of added data records compared with the previous snapshot file. |
changelog_record_count | Long | The number of generated changelog records compared with the previous snapshot file. |
Schemas table
A schemas table allows you to query the current and historical schema information of a Paimon table. If you modify the schema of a Paimon table by using the ALTER TABLE, CREATE TABLE AS (CTAS), or CREATE DATABASE AS(CDAS) statement, each modification generates a record in the schemas table. For example, you can use the following SQL statement to query the schemas table created for the Paimon table named mycat.mydb.mytbl
:
SELECT * FROM mycat.mydb.`mytbl$schemas`;
The following table describes the columns in a schemas table.
Column | Data type | Description |
schema_id | Long | The ID of the schema. |
fields | String | The name and data type of each column. |
partition_keys | String | The name of the partition key. |
primary_keys | String | The name of the primary key. |
options | String | The values of the table options. |
comment | String | The comment added to the table to provide additional information. |
update_time | Timestamp | The time when the schema was last modified. |
Options table
An options table allows you to query the current configuration of table options.
For example, you can use the following SQL statement to query the options table created for the Paimon table named mycat.mydb.mytbl
:
SELECT * FROM mycat.mydb.`mytbl$options`;
The following table describes the columns in an options table.
Column | Data type | Description |
key | String | The name of the table option. |
value | String | The value of the table option. |
If a table option is not included in the table, the option is set to the default value.
Partitions table
A partitions table allows you to query the partitions in a Paimon table, the total number of data records in each partition, and the total size of files in each partition.
For example, you can use the following SQL statement to query the partitions table created for the Paimon table named mycat.mydb.mytbl
:
SELECT * FROM mycat.mydb.`mytbl$partitions`;
The following table describes the columns in a partitions table.
Column | Data type | Description |
partition | String | The partition in the |
record_count | Long | The total number of data records in the partition. Note Values in this column do not indicate the number of data records logically stored in the partition. This is because data files are compacted in memory before they are captured in a snapshot file. |
file_size_in_bytes | Long | The total size of files in the partition. Unit: bytes. Only data files to which the current snapshot file points are counted. |
Files table
A files table allows you to query all data files to which a snapshot file points, including the file format, the number of records in the file, and the file size.
For example, you can use the following SQL statement to query the files table created for the most recent snapshot of the Paimon table named mycat.mydb.mytbl
:
SELECT * FROM mycat.mydb.`mytbl$files`;
You can also query the files table created for a specific snapshot. In the following example, the snapshot ID is set to 5 for the Paimon table named mycat.mydb.mytbl
.
SELECT * FROM mycat.mydb.`mytbl$files` /*+ OPTIONS('scan.snapshot-id'='5') */;
The following table describes the columns in a files table.
Column | Data type | Description |
partition | String | The partition that contains the file. Format: |
bucket | Integer | The bucket that contains the file. This column applies only to primary key tables that use the fixed bucket mode. |
file_path | String | The path of the file. |
file_format | String | The format of the file. |
schema_id | Long | The ID of the schema file used by the snapshot file. You can obtain the schema information from the schemas table. |
level | Integer | The log-structured merge-tree (LSM) level of the file. This column applies only to primary key tables. A value of 0 indicates a small file. You can query the number of small files in a bucket to monitor the compaction progress in the bucket. |
record_count | Long | The number of records in the file. |
file_size_in_bytes | Long | The size of the file. Unit: bytes. |
The files table does not contain information about historical data files that are not pointed to by the queried snapshot file.
Tags table
A tags table allows you to query information about tags of a Paimon table, such as the name of a tag and the snapshot associated with the tag.
For example, you can use the following SQL statement to query the tags table created for the Paimon table named mycat.mydb.mytbl
:
SELECT * FROM mycat.mydb.`mytbl$tags`;
The following table describes the columns in a tags table.
Column | Data type | Description |
tag_name | String | The name of the tag. |
snapshot_id | Long | The ID of the snapshot associated with the tag. |
schema_id | Long | The ID of the schema used by the tag. You can obtain the schema information from the schemas table. |
commit_time | Timestamp | The time when the snapshot associated with the tag was created. |
record_count | Long | The number of records in the data files. |
System tables related to data consumption
Read-optimized table
The data files of a Paimon table must be compacted in the memory before they are available for consumption, which affects the read efficiency. If you use a primary key table and want to improve the efficiency of batch reading or ad hoc OLAP queries, you can consume data from the corresponding read-optimized table. This way, only files that do not require compaction are read, which eliminates the compaction process and improves query efficiency. Take note that the read-optimized table generates data with increased latency because small files are compacted less frequently. You can configure the following parameter to compact small files at a regular interval and strike a balance between write efficiency, read efficiency, and data latency.
Parameter | Description | Data Type | Default value | Notes |
compaction.optimization-interval | The interval at which small files are compacted. | Duration | None | Frequent compaction of small files may affect the write efficiency. We recommend that you set this parameter to a value greater than |
For example, you can use the following SQL statement to query the read-optimized table created for the most recent snapshot of the Paimon table named mycat.mydb.mytbl
:
SELECT * FROM mycat.mydb.`mytbl$ro`;
Audit log table
If you need to know the specific operation of each record in a Paimon table, you can consume data from the corresponding audit log table. Compared with the original Paimon table, the audit log table inserts a column named rowkind at the beginning of each record to store the operation type. Valid values in the rowkind column are +I (insert), -U (update before), +U (update after), and -D (delete). For each record in the audit log table, the operation type is +I (insert).
For example, you can use the following SQL statement to query the audit log table created for the most recent snapshot of the Paimon table named mycat.mydb.mytbl
:
SELECT * FROM mycat.mydb.`mytbl$audit_log`;
References
For information about the complete structure of each system table, see Apache Paimon official documentation.