ApsaraDB for HBase allows you to select different specifications, quantities, and disk types for master nodes and core nodes. Select specifications and disk types based on your workload characteristics, such as queries per second (QPS), required storage space, the number of read and write requests, response latency, and stability requirements.
Usually, you need to select:
Specifications of master nodes.
Specifications and number of core nodes.
Size and type of disks.
ApsaraDB for HBase edition. For more information about how to select an ApsaraDB for HBase edition, see ApsaraDB for HBase editions.
Type of your Elastic Compute Service (ECS) instance: ApsaraDB for HBase runs on dedicated ECS instances. Dedicated resources are allocated to each instance to ensure stability. If your workloads require low response latency, use a dedicated ECS instance together with SSDs.
Selection of master node specifications
Master nodes are not used for storage. By default, two master nodes are deployed in primary/secondary mode so that the cluster can recover from a single point of failure. Master nodes are critical: the ApsaraDB for HBase masters, Hadoop Distributed File System (HDFS) NameNodes, and ZooKeeper are deployed on them. If the master nodes do not have sufficient CPU or memory resources, the performance of your cluster is severely degraded.
| Number of core nodes | Recommended master node specifications |
| --- | --- |
| Fewer than 4 | 4-core CPU, 8 GB memory |
| 4 to 7 | 8-core CPU, 16 GB memory (recommended for small clusters) |
| 8 to 15 | 8-core CPU, 32 GB memory |
| 16 or more | 16-core CPU, 64 GB memory or higher |
Select master node specifications based on the number of core nodes and on the number of tables and regions managed by the cluster. If the cluster manages a large number of tables or regions, select higher master node specifications.
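The sizing rule in the preceding table can be codified as a quick helper. This is a minimal sketch of the guidance above, not an official sizing tool; adjust the result upward for clusters that manage many tables or regions:

```python
def recommended_master_spec(core_nodes: int) -> str:
    """Map the number of core nodes to the recommended master node
    specifications from the table above. Adjust upward if the cluster
    manages a large number of tables or regions."""
    if core_nodes < 4:
        return "4-core CPU, 8 GB memory"
    if core_nodes < 8:
        return "8-core CPU, 16 GB memory"
    if core_nodes < 16:
        return "8-core CPU, 32 GB memory"
    return "16-core CPU, 64 GB memory or higher"

print(recommended_master_spec(6))  # -> 8-core CPU, 16 GB memory
```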
Selection of core node specifications
Core nodes are the RegionServers in ApsaraDB for HBase. Select core node specifications based on the number of requests. The minimum specifications of a core node are 4-core CPU and 8 GB memory, and the maximum specifications are 32-core CPU and 128 GB memory.
Caching all metadata in memory is critical to achieving high performance. The following core node recommendations for different ApsaraDB for HBase clusters are provided for reference:
Small clusters: We recommend that you use 4-core CPU and 16 GB memory or 8-core CPU and 32 GB memory.
Medium or large clusters: Select core nodes based on the amount of data to be cached in memory (see the sketch after this list).
Large amount of data to be stored: We recommend that you use 16-core CPU and 64 GB memory or 32-core CPU and 128 GB memory.
Small amount of data to be stored: We recommend that you use 16-core CPU and 32 GB memory or 32-core CPU and 64 GB memory.
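To sanity-check whether your hot data fits in memory, you can roughly estimate the aggregate block cache across core nodes. The heap and cache fractions below are illustrative assumptions (actual values depend on your cluster configuration, for example an hfile.block.cache.size-style setting), not ApsaraDB for HBase defaults:

```python
def cluster_block_cache_gb(num_core_nodes: int,
                           ram_per_node_gb: float,
                           heap_fraction: float = 0.5,    # assumed share of RAM given to the RegionServer heap
                           cache_fraction: float = 0.4    # assumed fraction of heap used as block cache
                           ) -> float:
    """Rough estimate of the total block cache available for hot data."""
    return num_core_nodes * ram_per_node_gb * heap_fraction * cache_fraction

# Example: three 16-core/64 GB core nodes -> ~38 GB of block cache.
print(cluster_block_cache_gb(3, 64))
```

If the estimate is far below the size of your hot data, prefer the large-memory specifications listed above.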
The number of requests is not the only criterion for selecting core node specifications; we recommend that you take all relevant factors into account. For example, a workload that processes hundreds of requests per second can usually be handled by core nodes with 4-core CPU and 8 GB memory. However, this rule does not apply to the following scenarios, in which core nodes with 4-core CPU and 8 GB memory may affect the stability of your workload and increase the response latency:
A row that stores kilobytes or megabytes of data is queried.
A scan request contains complex filters.
The cache hit rate of requests is low, and most reads must be served from disk. You can check the block cache hit rate as shown in the sketch after this list.
The cluster manages a large number of tables and regions.
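A low cache hit rate is visible in RegionServer metrics. The following sketch assumes the RegionServer metrics endpoint is reachable on its default web UI port (16030 in recent open source HBase versions) and that the build exposes blockCacheHitCount and blockCacheMissCount under the sub=Server bean; ports and metric names can vary across versions and managed deployments:

```python
import json
from urllib.request import urlopen

def block_cache_hit_ratio(host: str, port: int = 16030) -> float:
    """Fetch the RegionServer JMX JSON and compute the block cache hit ratio.

    Assumes the metrics port is reachable and that hit/miss counters are
    exposed under the 'sub=Server' bean (names vary across HBase versions).
    """
    beans = json.load(urlopen(f"http://{host}:{port}/jmx"))["beans"]
    server = next(b for b in beans
                  if b.get("name") == "Hadoop:service=HBase,name=RegionServer,sub=Server")
    hits, misses = server["blockCacheHitCount"], server["blockCacheMissCount"]
    return hits / (hits + misses) if hits + misses else 0.0

# print(block_cache_hit_ratio("core-node-1"))  # hypothetical hostname
```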
If you need assistance in calculating the required storage space, join the ApsaraDB for HBase Q&A DingTalk group (ID: s0s3eg3) or submit a ticket.
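For a first-order estimate before you contact support, you can approximate the required disk capacity from the raw data size. The replication factor, compression ratio, and headroom below are assumptions to replace with values measured on your own data:

```python
def estimated_storage_gb(raw_data_gb: float,
                         replication: int = 3,           # assumed HDFS-style replication; depends on disk type
                         compression_ratio: float = 3.0, # assumed ~3x compression with common codecs
                         headroom: float = 1.3           # spare capacity for compactions and growth
                         ) -> float:
    """Back-of-the-envelope disk capacity estimate; verify with the support team."""
    return raw_data_gb * replication / compression_ratio * headroom

print(estimated_storage_gb(1000))  # ~1300 GB for 1 TB of raw data under these assumptions
```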
The following table lists recommended core node specifications for different workloads. Take all relevant factors into account when you select core node specifications; a rough node-count sketch follows the table.
| TPS + QPS | Recommended number and specifications of core nodes | Suggestion |
| --- | --- | --- |
| Less than 1,000 | Two core nodes with 4-core CPU and 16 GB memory | The minimum specifications recommended for light loads. We recommend that you deploy no more than 600 regions on each core node. The minimum specifications available in ApsaraDB for HBase are 4-core CPU and 8 GB memory, but we recommend against them because 8 GB of memory may cause out-of-memory errors when the load or key-value sizes surge. |
| 1,000 to 20,000 | Two or three core nodes with 8-core CPU and 32 GB memory | Compared with 8-core CPU and 16 GB memory, 8-core CPU and 32 GB memory is more cost-effective: the additional 16 GB of memory helps guarantee the stability of your workloads. We recommend 8-core CPU and 32 GB memory for light and medium loads. |
| More than 20,000 | 8-core CPU and 32 GB memory; 16-core CPU and 32 GB memory; 16-core CPU and 64 GB memory; 32-core CPU and 64 GB memory; 32-core CPU and 128 GB memory; or higher | Select the number of core nodes based on the actual number of requests. For online workloads, we recommend specifications with large memory to increase the cache hit rate. For heavy offline MapReduce or Spark jobs, or when the transactions per second (TPS) or QPS is high, select specifications with more CPU resources. |
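To translate an aggregate TPS + QPS figure into a node count, you can start from an assumed per-node throughput and round up. The per-node capacity below is a placeholder to replace with a number you have benchmarked for your workload:

```python
import math

def core_nodes_needed(total_tps_qps: int,
                      per_node_capacity: int = 10000,  # assumed requests/s per node; benchmark your workload
                      min_nodes: int = 2               # the table above starts from two core nodes
                      ) -> int:
    """Rough node-count estimate from aggregate TPS + QPS."""
    return max(min_nodes, math.ceil(total_tps_qps / per_node_capacity))

print(core_nodes_needed(25000))  # -> 3 nodes under these assumptions
```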
High-specification core nodes versus more core nodes
You can scale out your ApsaraDB for HBase cluster by adding core nodes when the load spikes, the response latency increases, or the cluster becomes unstable. However, hotspotting may occur if the tables or workloads in the cluster are not properly designed, and the specifications of a single core node determine how well it withstands a hotspot. If you scale out only with low-specification core nodes, service stability can still suffer when the load spikes: for example, if large requests are directed to a node or traffic spikes in a single region, a low-specification node may be overloaded or run out of memory, which affects the stability of the entire cluster. Therefore, we recommend that you select high-specification core nodes.
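One common schema-level technique to reduce hotspotting, independent of node specifications, is to salt row keys so that monotonically increasing keys (such as timestamps) spread across regions. A minimal sketch; the bucket count is an assumption that should match your region pre-splits:

```python
import hashlib

NUM_SALT_BUCKETS = 16  # illustrative; align with the number of pre-split regions

def salted_row_key(original_key: str) -> str:
    """Prefix the key with a stable hash bucket so that sequential keys
    (e.g. timestamps) spread across regions instead of hammering one."""
    bucket = int(hashlib.md5(original_key.encode()).hexdigest(), 16) % NUM_SALT_BUCKETS
    return f"{bucket:02d}|{original_key}"

print(salted_row_key("20240101120000-device42"))  # e.g. '07|20240101120000-device42'
```

Note that salting trades read simplicity for write distribution: a scan over a logical key range must fan out across all buckets.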
We recommend that you select specifications for your core nodes based on the requirements of your workloads.
If the specifications of your master or core nodes fail to meet your requirements, you can upgrade the nodes. For more information, join the ApsaraDB for HBase Q&A DingTalk group (ID: s0s3eg3) or submit a ticket.
Storage types
ApsaraDB for HBase supports three storage types: cloud disks, local disks, and cold storage.
Cloud disks: Cloud disks are scalable and reliable, and we recommend that you use them. They are replicated to ensure redundancy and can be expanded based on your needs. Unlike physical disks, cloud disks are independent of the underlying hardware, which prevents data loss caused by physical damage. Cloud disks include SSDs and ultra disks.
Local disks: Local disks are physical disks. They cost less than cloud disks but are not scalable: the specifications of the core nodes that you select determine the size of the local disks, and you can increase storage capacity only by adding core nodes; you cannot upgrade ECS instances that use local disks to increase storage capacity. The failure of a single disk does not cause data loss, and the ApsaraDB for HBase support team replaces a damaged disk at the earliest opportunity; however, if two or more physical disks are damaged, your workloads are affected. Because the startup costs are high, local disks are suitable for storing large amounts of data.
Cold storage: Cold storage is dedicated to ApsaraDB for HBase and is built on Object Storage Service (OSS). You can use cold storage in conjunction with cloud disks to store infrequently accessed data, or use the hot and cold data separation feature to automatically archive cold data. This reduces data archiving costs.
After you create an ApsaraDB for HBase cluster, you can no longer change its storage type. If you select cloud disks, you can increase storage capacity by expanding the cloud disks or by adding core nodes. If you select local disks, you can increase storage capacity only by adding core nodes. Cold storage is an exception: you do not need to enable it when you create the cluster, and you can enable it and expand its capacity at any time after the cluster is created.
| Feature | Storage type | Workload type |
| --- | --- | --- |
| High performance | SSDs, local SSDs, or ESSDs | Online workloads that require low response latency, such as advertising, recommendation, feeds, and user profiling. In these scenarios, SSDs or ESSDs ensure a low response latency of 1 to 2 milliseconds and reduce performance jitter. SSDs are suitable for users that require a low P99 latency, the latency that 99% of requests complete within (see the sketch after this table). |
| High efficiency | Ultra disks or local HDDs | Online workloads that are less sensitive to latency. In most cases, HDDs deliver a response latency of around 10 milliseconds but cause more performance jitter than SSDs. |
| Cold data storage | OSS-based cold storage | Near-line storage and data archiving. Cold storage achieves almost the same write throughput as cloud disks or local disks. However, the QPS of cold data reads is limited, and the read latency is around tens of milliseconds. |
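To check a storage type against a latency target, compute the P99 from sampled request latencies. A minimal nearest-rank sketch over latencies that you have collected yourself:

```python
import math

def p99_latency_ms(samples: list[float]) -> float:
    """Nearest-rank P99: 99% of requests complete at or below this latency."""
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * 0.99)  # 1-indexed nearest rank
    return ordered[rank - 1]

# Synthetic example: 980 requests at ~2 ms plus a slow tail of 20 at ~15 ms.
samples = [2.0] * 980 + [15.0] * 20
print(p99_latency_ms(samples))  # -> 15.0; the slow tail dominates the P99
```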