Hologres provides the binary logging feature to allow you to consume Hologres binary logs. This helps improve the data replication capability and shorten end-to-end latency of data processing. You can use the binary logging feature if you want to capture Hologres database events for data replication or synchronization, or if you want to forward Hologres database events to consumers as messages. This topic describes how to subscribe to Hologres binary logs and related operations.
Introduction to binary logs
Similar to traditional MySQL databases, Hologres uses binary logs to log all data change events of databases. Hologres binary logs make data replication and synchronization convenient and flexible. However, Hologres binary logs are used only for data synchronization in most cases. Binary logs of traditional databases are also used in high-availability scenarios such as synchronization between primary and secondary instances and data restoration scenarios. The implementation of Hologres binary logs and that of traditional database binary logs differ in the following aspects:
Hologres binary logs do not record data definition language (DDL) operations.
You can enable and disable the binary logging feature for a Hologres table based on your business requirements. You can also configure different time to live (TTL) settings for different tables.
Binary logs also support the distribution configuration of Hologres.
Hologres binary logs can be easily queried.
In big data scenarios, you can use Flink to directly consume Hologres binary logs. Compared with traditional data warehouse layering, Flink and Hologres binary logs can implement end-to-end real-time processing jobs from the Operational Data Store (ODS) layer to the Data Warehouse Detail (DWD) layer and from the DWD layer to the Data Warehouse Summary (DWS) layer. This meets the requirement of unified storage with layered governance. This also helps improve data reuse capabilities, shorten the end-to-end latency of data processing, and provide users with a one-stop real-time data warehouse solution.
Limits
Before you subscribe to binary logs in Hologres, take note of the following items:
Only Hologres V0.9 and later support binary log subscription. If the version of your Hologres instance is earlier than V0.9, join the Hologres DingTalk group for technical support. For more information about how to join the Hologres DingTalk group, see Obtain online support for Hologres.
In Hologres V0.9 and V0.10, you cannot enable binary logging for an existing table by modifying table properties. To enable binary logging for a table, you must enable the feature when you create the table. In Hologres V1.1 and later, you can enable binary logging for existing tables based on your business requirements.
For Hologres instances of minor versions earlier than V1.3.14 or Hologres instances of all versions earlier than V1.1.82, only a superuser can consume Hologres binary logs. If you use an account with lower permissions to consume Hologres binary logs, the error message
"permission denied for table hg_replication_slot_properties"
is returned. In Hologres V1.3.14 and all later versions or V1.1.82 and later minor versions, if Flink is used to consume Hologres binary logs, you can use accounts with query permissions. If you use the Java Database Connectivity (JDBC) driver to consume Hologres binary logs, you must use accounts that have the permissions of the replication role.In Hologres, you can subscribe to binary logs of a single row-oriented or column-oriented table. The following table describes the support for different subscription modes.
Flink type
Subscription to binary logs of a Hologres row-oriented table
Subscription to binary logs of a Hologres column-oriented table
Subscription to binary logs of a Hologres row-column hybrid table (supported in Hologres V1.1 and later)
Blink-based Realtime Compute
Supported
Supported
Supported
Fully managed Flink
Supported
Supported
Supported
Open source Flink
Not supported
Not supported
Not supported
JDBC
Supported from Hologres V1.1
Supported from Hologres V1.1
Supported from Hologres V1.1
Blink-based Realtime Compute does not support Hologres binary logs that contain data of the TIMESTAMP type. When you create a table in Hologres, specify the TIMESTAMPTZ data type for related fields. In addition, Blink-based Realtime Compute does not support Hologres binary logs that contain data of special types such as SMALLINT.
You cannot subscribe to the binary logs of a parent table. You can subscribe to the binary logs of a child table or a non-partitioned table. In Hologres V1.3.24 and later, you can configure the TTL of binary logs for child tables based on your business requirements. If you do not explicitly specify the TTL for the binary logs of a child table, the TTL for the binary logs of the parent table is used as that for the binary logs of the child table. The TTL of binary logs is not accurate. The system does not ensure that binary logs are deleted immediately after they expire, but deletes binary logs at a specific time after they expire.
Theoretically, the overhead of binary logging for column-oriented tables is higher than that for row-oriented tables. If your tables are frequently updated, we recommend that you enable binary logging for row-oriented tables.
You can enable binary logging only for Hologres internal tables. You cannot enable binary logging for foreign tables.
Binary log format and principle
Fields in binary logs consist of system fields and user-defined table fields. The following table describes the fields.
Field | Data type | Description |
hg_binlog_lsn | BIGINT | The ordinal number of the current binary log. This field is a system field. The values of this field in binary logs in the same shard monotonically increase and may be discontinuous. The values of this field in binary logs in different shards may be identical and out of order. |
hg_binlog_event_type | BIGINT | The type of operation recorded in the current binary log. This field is a system field.
|
hg_binlog_timestamp_us | BIGINT | The timestamp of the system. Unit: microseconds. This field is a system field. |
user_table_column_1 | User-defined | A user-defined table field. |
... | ... | ... |
user_table_column_n | User-defined | A user-defined table field. |
An UPDATE operation generates two records in binary logs, which indicate a record before the update and a record after the update. Binary logging ensures that the ordinal numbers of the two records are continuous and in the right order. The record that is generated after the update comes after the record that is generated before the update.
The same event of an UPDATE operation is recorded as a different value if you perform the UPDATE operation by using a Hologres connector, such as Holo Client, the Flink connector, or Data Integration of DataWorks. For example, if you use a Hologres connector, the value BEFORE_UPDATE for the hg_binlog_event_type field is changed to DELETE, and the value AFTER_UPDATE for the field is changed to INSERT. As a result, values 2 and 5 are displayed for the
hg_binlog_event_type
field. The differences are not perceived by users.The values BEFORE_UPDATE and AFTER_UPDATE are available for the
hg_binlog_event_type
field only if you perform an UPDATE operation by using SQL statements.
You can consider binary logs of Hologres tables as special row-oriented tables. If you enable the binary logging feature for a business table, a row-oriented table is created for the table. In the row-oriented table, the hg_binlog_lsn
field is the key, and the user-defined fields in the business table and the hg_binlog_event_type
and hg_binlog_timestamp_us
fields are the values. The fields in binary logs are fixed with strong schemas. The order of user-defined fields is the same as that defined in the DDL statement of the business table. Therefore, we recommend that you use the row-oriented storage mode or row-column hybrid storage mode for tables on which you want to enable the binary logging feature. This provides better performance in reading binary logs.
Enable binary logging
By default, binary logging is disabled for Hologres tables. You can enable this feature for a Hologres table by configuring the table properties binlog.level and binlog.ttl. The following sample code provides an example of how to enable binary logging. For more information about the parameters for creating a table, see Overview.
Theoretically, the overhead of binary logging for column-oriented tables is higher than that for row-oriented tables. If your tables are frequently updated and you want to enable binary logging for your tables, we recommend that you use the row-oriented storage mode for your tables.
Syntax supported in Hologres V2.1 and later:
In Hologres V2.1 and later, the
binlog.level
andbinlog.ttl
properties are changed to thebinlog_level
andbinlog_ttl
properties respectively.CREATE TABLE test_message_src ( id int PRIMARY KEY, title text NOT NULL, body text ) WITH ( orientation = 'row', clustering_key = 'id', binlog_level = 'replica', binlog_ttl = '86400' -- Specify the TTL of binary logs. Unit: seconds. );
Syntax supported in all Hologres versions:
begin; create table test_message_src( id int primary key, title text not null, body text); call set_table_property('test_message_src', 'orientation', 'row'); -- Create a row-oriented table named test_message_src. call set_table_property('test_message_src', 'clustering_key', 'id'); -- Configure the id column as the clustering key. call set_table_property('test_message_src', 'binlog.level', 'replica'); -- Configure the binlog.level property to enable binary logging. call set_table_property('test_message_src', 'binlog.ttl', '86400'); -- Specify the TTL of binary logs. Unit: seconds. commit;
The following table describes the properties.
Property | Description |
| Specifies whether to enable binary logging. Valid values:
|
| The TTL of binary logs. Unit: seconds. Default value: 2592000, which indicates 30 days. |
Configure the binary logging feature for existing tables based on your business requirements
In Hologres V1.1 and later, you can enable or disable binary logging for existing tables based on your business requirements. To enable binary logging for an existing table, you do not need to create the table again. You can also specify the TTL of binary logs to retain binary logs for a specific period of time. This meets the requirements for a limited retention period in specific business scenarios.
The operations described in this section apply only to Hologres V1.1 and later. If the version of your Hologres instance is earlier than V1.1, manually upgrade your Hologres instance in the Hologres console or join the Hologres DingTalk group for technical support. For more information about how to manually upgrade your Hologres instance in the Hologres console, see Manual upgrade (beta). For more information about how to obtain technical support, see Obtain online support for Hologres.
Enable binary logging
You can execute the following statements to enable binary logging for an existing table and specify the TTL of binary logs:
-- Configure the binlog.level property to enable binary logging. begin; call set_table_property('<table_name>', 'binlog.level', 'replica'); commit; -- Configure the binlog.ttl property to specify the TTL of binary logs. Unit: seconds. begin; call set_table_property('<table_name>', 'binlog.ttl', '2592000'); commit;
The table_name parameter specifies the name of the table for which you want to enable binary logging.
Disable binary logging
You can execute the following statements to disable binary logging for a table:
-- Configure the binlog.level property to disable binary logging. begin; call set_table_property('<table_name>', 'binlog.level', 'none'); commit;
The table_name parameter specifies the name of the table for which you want to disable binary logging.
Change the TTL of binary logs
You can execute the following statements to change the TTL of binary logs for a table. This way, you can specify an appropriate log retention period based on your business requirements.
NoteIn Hologres V1.3.24 and later, you can configure the TTL of binary logs for child tables based on your business requirements. If you do not specify the TTL for the binary logs of a child table, the TTL for the binary logs of the parent table is used as that for the binary logs of the child table by default.
call set_table_property('<table_name>', 'binlog.ttl', '8640000'); -- Unit: seconds.
The table_name parameter specifies the name of the table for which you want to change the TTL of binary logs.
Query binary logs
Hologres binary logs use a strong schema. To query the binary logs of a table, you can use a combination of the additional fields in the binary logs and the fields in the table to perform the query. In addition, Hologres provides functions for you to query the latest or earliest binary log, or query information about binary logs based on the log sequence number (LSN) or timestamp.
Query binary logs based on special built-in fields
You can use a combination of the additional fields in the binary logs and the fields in the table to perform a query. Sample statement:
SELECT hg_binlog_lsn,hg_binlog_event_type,hg_binlog_timestamp_us,* FROM test_message_src;
The following figure shows the query result.
Query the earliest or latest binary log of a shard in a table
You can use the hg_get_binlog_cursor
function to perform the query. Syntax:
-- Query the earliest binary log of a shard.
SELECT * FROM hg_get_binlog_cursor('<table_name>','OLDEST',<shard_id>);
-- Query the latest binary log of a shard.
SELECT * FROM hg_get_binlog_cursor('<table_name>','LATEST',<shard_id>);
The following statement shows an example.
SELECT * FROM hg_get_binlog_cursor('test_message_src','OLDEST',0);
The following figure shows the query result.
Query the timestamp of a binary log based on the LSN of the binary log
You can use the hg_get_binlog_cursor_by_lsn
function to query the timestamp of a binary log. This function returns the timestamp of the first binary log whose LSN is greater than or equal to the value specified in the query. If the LSN is invalid, this function returns the current timestamp as the value of the hg_binlog_timestamp_us field. Syntax:
SELECT * FROM hg_get_binlog_cursor_by_lsn('<table_name>',<lsn>,<shard_id>);-- Specify an LSN of the BIGINT type.
The following statement shows an example.
SELECT * FROM hg_get_binlog_cursor_by_lsn('test_message_src',152,0);
The following figure shows the query result.
Query the LSN of a binary log based on the timestamp of the binary log
You can use the hg_get_binlog_cursor_by_timestamp
function to query the LSN of a binary log. This function returns the LSN of the first binary log whose timestamp value is greater than or equal to the value specified in the query. If the timestamp is the latest, this function returns the current timestamp as the value of the hg_binlog_timestamp_us
field and the LSN to be assigned to the next data record as the value of the hg_binlog_lsn
field. Syntax:
If the timestamp specified in the query is later than the current time, the error message "get binlog cursor in future time" is returned. You can use the now() function to query the current time.
SELECT * FROM hg_get_binlog_cursor_by_timestamp('<table_name>',<timestamp>,<shard_id>);
Sample statement:
SELECT *,to_timestamp(hg_binlog_timestamp_us/1000000.0) FROM hg_get_binlog_cursor_by_timestamp('test_message_src','2024-05-20 19:34:53.791+08',0);
The following figure shows the query result.
Consume Hologres binary logs in real time
You can use Flink, Blink, JDBC, or Holo Client to consume Hologres binary logs. For more information, see the following topics:
Check the tables for which binary logging is enabled
You can execute the following SQL statement to check the tables for which binary logging is enabled:
SELECT
*
FROM
hologres.hg_table_properties
WHERE
property_key = 'binlog.level'
AND property_value = 'replica';
The following figure shows the returned result.
Query the storage space of binary logs
Hologres provides the
pg_relation_size
function for you to query the storage size of a table for which binary logging is enabled. The size includes the storage size of binary logs. For more information, see Query the storage size of a table in the "Query the storage sizes of tables and databases" topic.Hologres V2.1 and later provide the
hologres.hg_relation_size
function for you to query storage details of a table, including the storage size of data and the storage size of binary logs. For more information, see Query the storage details of a table in the "Query the storage sizes of tables and databases" topic.
Disable binary logging for table-related DML statements
You can disable binary logging for data manipulation language (DML) statements of a table by using a Grand Unified Configuration (GUC) parameter. You must add the following SET command before the DML statement and commit them at the session level.
-- Run the following SET command at the session level:
SET hg_experimental_generate_binlog=off;