Hologres V1.1 and later support the multi-instance high-availability deployment mode in which storage is shared by primary and secondary instances in online production environments that require high availability. The multi-instance high-availability deployment mode supports the isolation of faults and loads for high availability. This topic describes the basic principles of high availability solutions and how to configure primary and secondary instances that share storage.
High-availability deployment with automatic recovery of a single instance
Hologres compute nodes, shown as worker nodes in the following figure, are scheduled in a way similar to containers. The resource manager performs periodic health checks. If a compute node takes more than 1 minute to respond because of an out-of-memory (OOM) error or a hardware or software fault, the resource manager automatically starts a new compute node and migrates the shards from the faulty compute node to it. For example, if Worker Node 3 takes more than 1 minute to respond, the resource manager starts Worker Node 4 to replace it. Data is stored in Apsara Distributed File System, so no data needs to be migrated between compute nodes. Compute nodes are lightweight and stateless, which makes fast recovery possible. By default, single-instance fast recovery is enabled for each Hologres instance. If an exception occurs on a node, the instance recovers automatically without manual O&M. If a query operator attempts to access a node that is in automatic recovery, the query fails immediately. Hologres V1.1 and later use a new recovery mechanism that can recover nodes within about 1 minute, which is 5 to 10 times faster than earlier versions.
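The replacement step described above can be pictured with a small simulation. This is only an illustrative sketch, not the Hologres resource manager: the Worker class, the health_check function, and the shard numbers are hypothetical, and the only value taken from the text is the 1-minute response limit.

```python
from dataclasses import dataclass, field

RESPONSE_TIMEOUT_S = 60  # "more than 1 minute" from the text


@dataclass
class Worker:
    # Hypothetical model of a stateless compute node.
    name: str
    shards: list = field(default_factory=list)
    seconds_since_response: float = 0.0


def health_check(workers, next_worker_id):
    """Replace every worker that has been unresponsive for over a minute.

    Returns the updated worker list and the next unused worker id.
    Only shard ownership moves; the data itself stays in shared storage.
    """
    result = []
    for worker in workers:
        if worker.seconds_since_response > RESPONSE_TIMEOUT_S:
            # Start a fresh node and hand it the faulty node's shards.
            replacement = Worker(
                name=f"worker-{next_worker_id}", shards=list(worker.shards)
            )
            next_worker_id += 1
            result.append(replacement)
        else:
            result.append(worker)
    return result, next_worker_id


workers = [
    Worker("worker-1", shards=[0, 1]),
    Worker("worker-2", shards=[2, 3]),
    Worker("worker-3", shards=[4, 5], seconds_since_response=75.0),
]
workers, next_id = health_check(workers, next_worker_id=4)
print([(w.name, w.shards) for w in workers])
# [('worker-1', [0, 1]), ('worker-2', [2, 3]), ('worker-4', [4, 5])]
```

Because the sketch copies only the shard list, it mirrors the point that compute nodes hold no data of their own and can therefore be replaced quickly.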
Multi-instance high-availability deployment
How it works
In single-instance deployment mode, faults are monitored in real time, and faulty nodes are replaced for recovery. During node recovery, the service may be unavailable. Key business scenarios require a higher-level high-availability solution that isolates both faults and loads. Hologres V1.1 and later provide the multi-instance high-availability deployment mode, in which storage is shared among the instances. In this mode, the primary instance has full capabilities, including data reads and writes and the configuration of permissions and system parameters. Secondary instances are read-only. All other operations are performed on the primary instance. The following figure shows the multi-instance high-availability deployment mode. Primary and secondary instances do not share computing resources, so the loads and faults of the instances are isolated. All instances share the same data, access control configurations, and storage fees.
Memory status is automatically synchronized across instances in real time. The memory status of instances within the same region can be synchronized within milliseconds. If you write data to the primary instance, the system automatically synchronizes data from the primary instance to a secondary instance. Therefore, a small amount of CPU and memory resources of the secondary instance are consumed even if the secondary instance is not used. The consumed resource amount is about 1/8 of the consumed resources of the primary instance. We recommend that the specification configurations of secondary instances do not significantly differ from the specification configurations of the primary instance.
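As a rough illustration of the 1/8 figure, the following sketch estimates how much of a secondary instance is taken up by synchronization alone. The function name and the sample value of 32 consumed cores are assumptions made for the example; the ratio is the approximate one stated above, so the result is an estimate rather than a guarantee.

```python
SYNC_OVERHEAD_RATIO = 1 / 8  # approximate ratio quoted in the text


def estimated_sync_overhead(primary_consumed):
    """Estimate resources a secondary instance spends on synchronization.

    primary_consumed is the amount of a resource (for example, CPU cores)
    that the primary instance is currently consuming.
    """
    return primary_consumed * SYNC_OVERHEAD_RATIO


# Hypothetical primary instance that is consuming 32 CPU cores.
print(estimated_sync_overhead(32))  # 4.0
```

An estimate like this helps explain the sizing recommendation: a secondary instance that is much smaller than its primary may spend a large share of its capacity on synchronization.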
Usage notes
You can configure up to 10 read-only secondary instances for each primary instance. The resource configurations among the instances can be slightly different. The shard count must be the same for all instances.
Each read-only secondary instance has a unique endpoint. Different read-only secondary instances are used in different business scenarios. You can use endpoints to isolate business scenarios.
In Hologres V1.3.27 and later, the latency threshold of data synchronization from the primary instance to a secondary instance is changed from 20 minutes to 60 minutes. If the synchronization latency exceeds 60 minutes and the resource utilization of the secondary instance remains at 100% for an extended period of time, the secondary instance automatically restarts to reduce the synchronization latency. If the resource utilization of the secondary instance remains at 100% for an extended period of time, we recommend that you optimize query statements that are executed on the secondary instance or scale out the secondary instance.
When you associate a read-only secondary instance with a primary instance, you can use the primary instance as expected and the primary instance is not affected.
Associating a read-only secondary instance with a primary instance takes about 3 to 5 minutes. After the association is complete, you can use the read-only secondary instance as expected.
You cannot access read-only secondary instances that are not associated with a primary instance.
If you use MaxCompute to directly read data from the Hologres storage layer that uses the primary-secondary instance architecture, you can use only the URL of the primary Hologres instance to connect to Hologres. You cannot use the URL of a secondary instance. For more information, see Enable the direct read feature for Hologres external tables in the "Hologres external tables" topic.
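Some of the usage notes above are simple rules that can be checked before or during operation. The following is a minimal sketch under stated assumptions: the function names and inputs are hypothetical, and because the text does not define how long "an extended period of time" is, sustained 100% utilization is passed in as a flag rather than computed.

```python
MAX_SECONDARIES = 10             # limit per primary instance, from the text
SYNC_LATENCY_THRESHOLD_MIN = 60  # V1.3.27 and later; 20 in earlier versions


def check_topology(primary_shards, secondary_shards):
    """Return usage-note violations for a primary and its secondaries.

    secondary_shards holds the shard count of each secondary instance.
    """
    problems = []
    if len(secondary_shards) > MAX_SECONDARIES:
        problems.append("more than 10 read-only secondary instances")
    for index, count in enumerate(secondary_shards):
        if count != primary_shards:
            problems.append(
                f"secondary {index} has {count} shards, primary has {primary_shards}"
            )
    return problems


def restart_expected(sync_latency_min, utilization_pinned_at_100):
    """True if the text's two conditions for an automatic restart both hold."""
    return sync_latency_min > SYNC_LATENCY_THRESHOLD_MIN and utilization_pinned_at_100


print(check_topology(40, [40, 20]))
print(restart_expected(75, True))
```

If restart_expected returns True for a secondary instance, the text's recommendation applies: optimize the queries that run on it or scale it out.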
Suggestions in different scenarios
Common scenarios:
We recommend that you use a primary instance to write and process data and use read-only secondary instances to analyze the data. This ensures read/write splitting.
Other scenarios:
For online service queries, we recommend that you use a single read-only secondary instance to ensure high availability of online services. This meets the high requirements for a stable P99 latency of queries.
For online analytical processing (OLAP) queries, we recommend that you specify a secondary instance for data analysis. The specified secondary instance is different from the secondary instance used for the preceding online service queries. This ensures read splitting for OLAP queries and online service queries. If a large amount of data is queried, online service queries are not affected.
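One way to apply these suggestions in an application is to keep a small table that maps each workload type to the instance that should serve it. The sketch below assumes one primary instance and two read-only secondary instances. The host names and workload labels are placeholders invented for the example, not real Hologres endpoints.

```python
# Hypothetical endpoints; replace them with the endpoints of your own instances.
ENDPOINTS = {
    "primary": "primary.example.invalid",
    "secondary_serving": "secondary-serving.example.invalid",
    "secondary_olap": "secondary-olap.example.invalid",
}

# Workload labels are illustrative names, not Hologres settings.
WORKLOAD_TO_INSTANCE = {
    "write": "primary",                     # writes and data processing
    "online_service": "secondary_serving",  # latency-sensitive point queries
    "olap": "secondary_olap",               # large analytical queries
}


def endpoint_for(workload):
    """Return the endpoint that should serve the given workload type."""
    try:
        instance = WORKLOAD_TO_INSTANCE[workload]
    except KeyError:
        raise ValueError(f"unknown workload type: {workload!r}") from None
    return ENDPOINTS[instance]


print(endpoint_for("olap"))
```

Keeping online service queries and OLAP queries on different secondary instances means that a heavy analytical query cannot compete for resources with latency-sensitive traffic.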
Configure primary and secondary instances that share storage
When you configure the multi-instance high-availability deployment mode, take note of the following limits:
You must use only Hologres instances whose versions are V1.1 or later as primary instances. If the version of the Hologres instance is earlier than V1.1, manually upgrade the instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.
You cannot access read-only secondary instances that are not associated with a primary instance.
A primary instance and its read-only secondary instances must be of the same version.
A primary instance and its read-only secondary instances must reside in the same region.
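The version and region limits above can be expressed as a pre-purchase check. This is a sketch only: the dictionary layout, the region strings, and the function names are assumptions, and the check is not a call to any Hologres API.

```python
def parse_version(text):
    """Turn 'V1.3.27' or '1.3.27' into the tuple (1, 3, 27)."""
    return tuple(int(part) for part in text.lstrip("Vv").split("."))


def pairing_problems(primary, secondary):
    """List the limits that a primary/secondary pair would violate.

    Each argument is a dict with 'version' and 'region' keys (hypothetical layout).
    """
    problems = []
    if parse_version(primary["version"]) < (1, 1):
        problems.append("primary instance must be V1.1 or later")
    if parse_version(primary["version"]) != parse_version(secondary["version"]):
        problems.append("primary and secondary must be of the same version")
    if primary["region"] != secondary["region"]:
        problems.append("primary and secondary must reside in the same region")
    return problems


primary = {"version": "V1.3.27", "region": "cn-hangzhou"}
secondary = {"version": "V1.3.27", "region": "cn-shanghai"}
print(pairing_problems(primary, secondary))
```

An empty list means that none of the three checked limits is violated; it does not confirm that the association has succeeded.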
Permissions required for associating or disassociating read-only secondary instances
If you want to associate read-only secondary instances with or disassociate the instances from a primary instance by using a RAM user, you must attach the AliyunHologresFullAccess policy to the RAM user. For more information about the permissions that you can grant to RAM users, see Grant permissions to a RAM user.
Perform the following operations to configure the multi-instance high-availability deployment mode:
Purchase a Hologres instance.
Important: The read-only secondary instance that you want to purchase must be in the same region as the primary instance with which you want to associate the secondary instance.
When you purchase a read-only secondary instance, set the Specifications parameter to Read-only Secondary Instance and the Primary Instance ID of the Read-only Secondary Instance parameter to the ID of the primary instance with which you want to associate the read-only secondary instance in the zone. For more information about other parameters, see Purchase a Hologres instance.
Associate the read-only secondary instance with a primary instance.
After you purchase a read-only secondary instance, the instance is associated with the primary instance that you selected on the buy page. You can use the read-only secondary instance after the instance enters the Running state.
Use the instances.
When you use the multi-instance high-availability deployment mode, take note of the following items:
You can use the endpoint of the read-only secondary instance to provide online services.
You must perform all operations such as creating tables and granting permissions to users on the primary instance. You can use the read-only secondary instance only to read data.
The read-only secondary instance automatically inherits all objects of the primary instance. The objects include users and tables. You cannot separately create users for the read-only secondary instance.
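Because the secondary instance is read-only, a client has to decide which instance receives each statement. The following is a deliberately conservative sketch: the target_instance function is hypothetical, and it sends only statements whose first keyword is SELECT to the secondary instance. Anything else, including statements it cannot classify, goes to the primary instance.

```python
def target_instance(sql):
    """Pick 'secondary' for plain SELECT statements and 'primary' otherwise.

    The check looks only at the first keyword, so it is a simplification
    and not a SQL parser.
    """
    stripped = sql.lstrip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    return "secondary" if first_word == "SELECT" else "primary"


for statement in (
    "SELECT count(*) FROM orders",
    "CREATE TABLE orders (id bigint)",
    "GRANT SELECT ON orders TO analyst",
    "INSERT INTO orders VALUES (1)",
):
    print(target_instance(statement), "<-", statement)
```

Defaulting to the primary instance is the safer choice here, because a misrouted write would fail on a read-only secondary instance, whereas a misrouted read still succeeds on the primary instance.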