All Products
Search
Document Center

Hologres:Subscribe to Hologres binary logs

Last Updated:Sep 23, 2024

Hologres provides the binary logging feature to allow you to consume Hologres binary logs. This helps improve the data replication capability and shorten end-to-end latency of data processing. You can use the binary logging feature if you want to capture Hologres database events for data replication or synchronization, or if you want to forward Hologres database events to consumers as messages. This topic describes how to subscribe to Hologres binary logs and related operations.

Introduction to binary logs

Similar to traditional MySQL databases, Hologres uses binary logs to log all data change events of databases. Hologres binary logs make data replication and synchronization convenient and flexible. However, Hologres binary logs are used only for data synchronization in most cases. Binary logs of traditional databases are also used in high-availability scenarios such as synchronization between primary and secondary instances and data restoration scenarios. The implementation of Hologres binary logs and that of traditional database binary logs differ in the following aspects:

  • Hologres binary logs do not record data definition language (DDL) operations.

  • You can enable and disable the binary logging feature for a Hologres table based on your business requirements. You can also configure different time to live (TTL) settings for different tables.

  • Binary logs also support the distribution configuration of Hologres.

  • Hologres binary logs can be easily queried.

In big data scenarios, you can use Flink to directly consume Hologres binary logs. Compared with traditional data warehouse layering, Flink and Hologres binary logs can implement end-to-end real-time processing jobs from the Operational Data Store (ODS) layer to the Data Warehouse Detail (DWD) layer and from the DWD layer to the Data Warehouse Summary (DWS) layer. This meets the requirement of unified storage with layered governance. This also helps improve data reuse capabilities, shorten the end-to-end latency of data processing, and provide users with a one-stop real-time data warehouse solution.

Limits

Before you subscribe to binary logs in Hologres, take note of the following items:

  • Only Hologres V0.9 and later support binary log subscription. If the version of your Hologres instance is earlier than V0.9, join the Hologres DingTalk group for technical support. For more information about how to join the Hologres DingTalk group, see Obtain online support for Hologres.

  • In Hologres V0.9 and V0.10, you cannot enable binary logging for an existing table by modifying table properties. To enable binary logging for a table, you must enable the feature when you create the table. In Hologres V1.1 and later, you can enable binary logging for existing tables based on your business requirements.

  • For Hologres instances of minor versions earlier than V1.3.14 or Hologres instances of all versions earlier than V1.1.82, only a superuser can consume Hologres binary logs. If you use an account with lower permissions to consume Hologres binary logs, the error message "permission denied for table hg_replication_slot_properties" is returned. In Hologres V1.3.14 and all later versions or V1.1.82 and later minor versions, if Flink is used to consume Hologres binary logs, you can use accounts with query permissions. If you use the Java Database Connectivity (JDBC) driver to consume Hologres binary logs, you must use accounts that have the permissions of the replication role.

  • In Hologres, you can subscribe to binary logs of a single row-oriented or column-oriented table. The following table describes the support for different subscription modes.

    Flink type

    Subscription to binary logs of a Hologres row-oriented table

    Subscription to binary logs of a Hologres column-oriented table

    Subscription to binary logs of a Hologres row-column hybrid table (supported in Hologres V1.1 and later)

    Blink-based Realtime Compute

    Supported

    Supported

    Supported

    Fully managed Flink

    Supported

    Supported

    Supported

    Open source Flink

    Not supported

    Not supported

    Not supported

    JDBC

    Supported from Hologres V1.1

    Supported from Hologres V1.1

    Supported from Hologres V1.1

  • Blink-based Realtime Compute does not support Hologres binary logs that contain data of the TIMESTAMP type. When you create a table in Hologres, specify the TIMESTAMPTZ data type for related fields. In addition, Blink-based Realtime Compute does not support Hologres binary logs that contain data of special types such as SMALLINT.

  • You cannot subscribe to the binary logs of a parent table. You can subscribe to the binary logs of a child table or a non-partitioned table. In Hologres V1.3.24 and later, you can configure the TTL of binary logs for child tables based on your business requirements. If you do not explicitly specify the TTL for the binary logs of a child table, the TTL for the binary logs of the parent table is used as that for the binary logs of the child table. The TTL of binary logs is not accurate. The system does not ensure that binary logs are deleted immediately after they expire, but deletes binary logs at a specific time after they expire.

  • Theoretically, the overhead of binary logging for column-oriented tables is higher than that for row-oriented tables. If your tables are frequently updated, we recommend that you enable binary logging for row-oriented tables.

  • You can enable binary logging only for Hologres internal tables. You cannot enable binary logging for foreign tables.

Binary log format and principle

Fields in binary logs consist of system fields and user-defined table fields. The following table describes the fields.

Field

Data type

Description

hg_binlog_lsn

BIGINT

The ordinal number of the current binary log. This field is a system field. The values of this field in binary logs in the same shard monotonically increase and may be discontinuous. The values of this field in binary logs in different shards may be identical and out of order.

hg_binlog_event_type

BIGINT

The type of operation recorded in the current binary log. This field is a system field.

  • The hg_binlog_event_type field has the following valid values:

    • INSERT=5, which indicates that a record is inserted.

    • DELETE=2, which indicates that a record is deleted.

    • BEFORE_UPDATE=3, which indicates that a record is saved before it is updated.

    • AFTER_UPDATE=7, which indicates that a record is saved after it is updated.

hg_binlog_timestamp_us

BIGINT

The timestamp of the system. Unit: microseconds. This field is a system field.

user_table_column_1

User-defined

A user-defined table field.

...

...

...

user_table_column_n

User-defined

A user-defined table field.

  • An UPDATE operation generates two records in binary logs, which indicate a record before the update and a record after the update. Binary logging ensures that the ordinal numbers of the two records are continuous and in the right order. The record that is generated after the update comes after the record that is generated before the update.

  • The same event of an UPDATE operation is recorded as a different value if you perform the UPDATE operation by using a Hologres connector, such as Holo Client, the Flink connector, or Data Integration of DataWorks. For example, if you use a Hologres connector, the value BEFORE_UPDATE for the hg_binlog_event_type field is changed to DELETE, and the value AFTER_UPDATE for the field is changed to INSERT. As a result, values 2 and 5 are displayed for the hg_binlog_event_type field. The differences are not perceived by users.

  • The values BEFORE_UPDATE and AFTER_UPDATE are available for the hg_binlog_event_type field only if you perform an UPDATE operation by using SQL statements.

You can consider binary logs of Hologres tables as special row-oriented tables. If you enable the binary logging feature for a business table, a row-oriented table is created for the table. In the row-oriented table, the hg_binlog_lsn field is the key, and the user-defined fields in the business table and the hg_binlog_event_type and hg_binlog_timestamp_us fields are the values. The fields in binary logs are fixed with strong schemas. The order of user-defined fields is the same as that defined in the DDL statement of the business table. Therefore, we recommend that you use the row-oriented storage mode or row-column hybrid storage mode for tables on which you want to enable the binary logging feature. This provides better performance in reading binary logs.

Enable binary logging

By default, binary logging is disabled for Hologres tables. You can enable this feature for a Hologres table by configuring the table properties binlog.level and binlog.ttl. The following sample code provides an example of how to enable binary logging. For more information about the parameters for creating a table, see Overview.

Note

Theoretically, the overhead of binary logging for column-oriented tables is higher than that for row-oriented tables. If your tables are frequently updated and you want to enable binary logging for your tables, we recommend that you use the row-oriented storage mode for your tables.

  • Syntax supported in Hologres V2.1 and later:

    In Hologres V2.1 and later, the binlog.level and binlog.ttl properties are changed to the binlog_level and binlog_ttl properties respectively.

    CREATE TABLE test_message_src (
        id int PRIMARY KEY,
        title text NOT NULL,
        body text
    )
    WITH (
        orientation = 'row',
        clustering_key = 'id',
        binlog_level = 'replica',
        binlog_ttl = '86400' -- Specify the TTL of binary logs. Unit: seconds.
    );
  • Syntax supported in all Hologres versions:

    begin;
    create table test_message_src(
      id int primary key, 
      title text not null, 
      body text);
    call set_table_property('test_message_src', 'orientation', 'row'); -- Create a row-oriented table named test_message_src.
    call set_table_property('test_message_src', 'clustering_key', 'id'); -- Configure the id column as the clustering key.
    call set_table_property('test_message_src', 'binlog.level', 'replica'); -- Configure the binlog.level property to enable binary logging.
    call set_table_property('test_message_src', 'binlog.ttl', '86400'); -- Specify the TTL of binary logs. Unit: seconds.
    commit;

The following table describes the properties.

Property

Description

binlog_level or binlog.level

Specifies whether to enable binary logging. Valid values:

  • replica: enables binary logging.

  • none: disables binary logging.

binlog_ttl or binlog.ttl

The TTL of binary logs. Unit: seconds. Default value: 2592000, which indicates 30 days.

Configure the binary logging feature for existing tables based on your business requirements

In Hologres V1.1 and later, you can enable or disable binary logging for existing tables based on your business requirements. To enable binary logging for an existing table, you do not need to create the table again. You can also specify the TTL of binary logs to retain binary logs for a specific period of time. This meets the requirements for a limited retention period in specific business scenarios.

Note

The operations described in this section apply only to Hologres V1.1 and later. If the version of your Hologres instance is earlier than V1.1, manually upgrade your Hologres instance in the Hologres console or join the Hologres DingTalk group for technical support. For more information about how to manually upgrade your Hologres instance in the Hologres console, see Manual upgrade (beta). For more information about how to obtain technical support, see Obtain online support for Hologres.

  • Enable binary logging

    You can execute the following statements to enable binary logging for an existing table and specify the TTL of binary logs:

    -- Configure the binlog.level property to enable binary logging.
    begin;
    call set_table_property('<table_name>', 'binlog.level', 'replica');
    commit;
    
    -- Configure the binlog.ttl property to specify the TTL of binary logs. Unit: seconds.
    begin;
    call set_table_property('<table_name>', 'binlog.ttl', '2592000');
    commit;

    The table_name parameter specifies the name of the table for which you want to enable binary logging.

  • Disable binary logging

    You can execute the following statements to disable binary logging for a table:

    -- Configure the binlog.level property to disable binary logging.
    begin; 
    call set_table_property('<table_name>', 'binlog.level', 'none'); 
    commit; 

    The table_name parameter specifies the name of the table for which you want to disable binary logging.

  • Change the TTL of binary logs

    You can execute the following statements to change the TTL of binary logs for a table. This way, you can specify an appropriate log retention period based on your business requirements.

    Note

    In Hologres V1.3.24 and later, you can configure the TTL of binary logs for child tables based on your business requirements. If you do not specify the TTL for the binary logs of a child table, the TTL for the binary logs of the parent table is used as that for the binary logs of the child table by default.

    call set_table_property('<table_name>', 'binlog.ttl', '8640000'); -- Unit: seconds.

    The table_name parameter specifies the name of the table for which you want to change the TTL of binary logs.

Query binary logs

Hologres binary logs use a strong schema. To query the binary logs of a table, you can use a combination of the additional fields in the binary logs and the fields in the table to perform the query. In addition, Hologres provides functions for you to query the latest or earliest binary log, or query information about binary logs based on the log sequence number (LSN) or timestamp.

Query binary logs based on special built-in fields

You can use a combination of the additional fields in the binary logs and the fields in the table to perform a query. Sample statement:

SELECT hg_binlog_lsn,hg_binlog_event_type,hg_binlog_timestamp_us,* FROM test_message_src;

The following figure shows the query result.

image

Query the earliest or latest binary log of a shard in a table

You can use the hg_get_binlog_cursor function to perform the query. Syntax:

-- Query the earliest binary log of a shard.
SELECT * FROM hg_get_binlog_cursor('<table_name>','OLDEST',<shard_id>);

-- Query the latest binary log of a shard.
SELECT * FROM hg_get_binlog_cursor('<table_name>','LATEST',<shard_id>);

The following statement shows an example.

SELECT * FROM hg_get_binlog_cursor('test_message_src','OLDEST',0);

The following figure shows the query result.

image

Query the timestamp of a binary log based on the LSN of the binary log

You can use the hg_get_binlog_cursor_by_lsn function to query the timestamp of a binary log. This function returns the timestamp of the first binary log whose LSN is greater than or equal to the value specified in the query. If the LSN is invalid, this function returns the current timestamp as the value of the hg_binlog_timestamp_us field. Syntax:

SELECT * FROM hg_get_binlog_cursor_by_lsn('<table_name>',<lsn>,<shard_id>);-- Specify an LSN of the BIGINT type.

The following statement shows an example.

SELECT * FROM hg_get_binlog_cursor_by_lsn('test_message_src',152,0);

The following figure shows the query result.

image

Query the LSN of a binary log based on the timestamp of the binary log

You can use the hg_get_binlog_cursor_by_timestamp function to query the LSN of a binary log. This function returns the LSN of the first binary log whose timestamp value is greater than or equal to the value specified in the query. If the timestamp is the latest, this function returns the current timestamp as the value of the hg_binlog_timestamp_us field and the LSN to be assigned to the next data record as the value of the hg_binlog_lsn field. Syntax:

Note

If the timestamp specified in the query is later than the current time, the error message "get binlog cursor in future time" is returned. You can use the now() function to query the current time.

SELECT * FROM hg_get_binlog_cursor_by_timestamp('<table_name>',<timestamp>,<shard_id>);

Sample statement:

SELECT *,to_timestamp(hg_binlog_timestamp_us/1000000.0) FROM hg_get_binlog_cursor_by_timestamp('test_message_src','2024-05-20 19:34:53.791+08',0);

The following figure shows the query result.

image

Consume Hologres binary logs in real time

You can use Flink, Blink, JDBC, or Holo Client to consume Hologres binary logs. For more information, see the following topics:

Check the tables for which binary logging is enabled

You can execute the following SQL statement to check the tables for which binary logging is enabled:

SELECT
    *
FROM
    hologres.hg_table_properties
WHERE
    property_key = 'binlog.level'
    AND property_value = 'replica';

The following figure shows the returned result.开启了binlog的表

Query the storage space of binary logs

  • Hologres provides the pg_relation_size function for you to query the storage size of a table for which binary logging is enabled. The size includes the storage size of binary logs. For more information, see Query the storage size of a table in the "Query the storage sizes of tables and databases" topic.

  • Hologres V2.1 and later provide the hologres.hg_relation_size function for you to query storage details of a table, including the storage size of data and the storage size of binary logs. For more information, see Query the storage details of a table in the "Query the storage sizes of tables and databases" topic.

Disable binary logging for table-related DML statements

You can disable binary logging for data manipulation language (DML) statements of a table by using a Grand Unified Configuration (GUC) parameter. You must add the following SET command before the DML statement and commit them at the session level.

-- Run the following SET command at the session level:
SET hg_experimental_generate_binlog=off;