This topic describes how Shanghai Zhenhui Information Technology Co., Ltd. (referred to as "HELIOS") used general Enterprise SSDs (ESSDs) to eliminate the performance bottleneck that slowed database queries during peak hours. HELIOS purchased an ApsaraDB RDS for MySQL instance, selected the general ESSD storage type, and enabled the I/O burst feature. As a result, the software as a service (SaaS) systems of HELIOS run in a more stable and efficient manner. This topic gives you an in-depth understanding of the solution so that you can develop effective countermeasures in similar scenarios based on the successful use case of HELIOS.
About HELIOS
HELIOS was founded in August 2016. HELIOS provides enterprises with SaaS systems and solutions for financial expense control, electronic archives, and corporate spending. Based on deep insights into customer requirements, HELIOS continuously upgrades its products and services and has gradually built a product line and service network that covers China, Japan, and the global market. HELIOS has received investments from institutions such as Blue Lake Capital, China Renaissance, SB China Capital (SBCVC), Z Capital, and Unicorn Capital and has become a highly regarded emerging enterprise. HELIOS is a global innovative enterprise that provides various products, such as HELIOS, HELIOS Selected, e-FILING, and Spendia.
Since its establishment, HELIOS has consistently ranked among the top in the industry in terms of the number of key accounts and partners, and has been committed to developing a powerful SaaS and end-to-end solution that covers the entire lifecycle of user expense management. The solution covers the process of application, consumption, reimbursement, accounting, posting, and filing. HELIOS has obtained certificates and professional qualifications, such as the national high-tech enterprise certificate, System and Organization Controls (SOC) 1, SOC 2, Multi-Level Protection Scheme (MLPS) level 3, ISO 27001, and ISO 27017. HELIOS has also deployed multiple security products that comply with international standards to fully ensure the security of data transmission and storage.
HELIOS has deepened its cooperation with the Alibaba Cloud ApsaraDB team and jointly promoted the continuous optimization and update of key SaaS systems, such as the financial expense control systems and employee business travel systems of enterprises. In cooperation with Alibaba Cloud, HELIOS adopts innovative digital and intelligent methods to drive rapid iteration of its SaaS products. This provides support for the digital transformation of enterprises.
Business challenges: Database queries surge during peak hours.
HELIOS focuses on delivering superior SaaS systems to enterprises. The systems include financial expense management, corporate spending management, and business management systems. HELIOS follows the business model of large-scale replication and standardization that is commonly adopted by the SaaS industry and has high requirements for efficient service expansion, consistency maintenance, and scalability to meet the needs of enterprises.
Typical SaaS architecture
As the business volumes significantly increase, HELIOS faces the following challenges caused by large-scale replication, multi-tenant management, and traffic peaks:
Performance bottlenecks: Customers experience fluctuations in data access. The R&D team observed that system queries slow down during peak hours. HELIOS requires a solution that keeps database performance from degrading when data access increases. For example, in the core SaaS service, the read and write load of an ApsaraDB RDS for MySQL instance is less than 350 MB/s during off-peak hours. During peak hours, which last approximately 3 to 4 hours, the load increases significantly and exceeds 350 MB/s.
Scalability requirements: Customers have different requirements for database architectures. HELIOS requires a database architecture that can meet different business requirements, maintain the scalability of storage and computing resources, and help customers handle traffic surges during peak hours while balancing cost and service availability.
Complex O&M management: HELIOS serves a large number of customers and therefore manages a large number of database instances, which increases management costs. As the business continues to upgrade and develop, more customers require data migration, isolation, and splitting, and specific key accounts require custom solutions. HELIOS urgently needs a solution that simplifies management and O&M by enabling efficient management, reducing management costs, meeting tailored development requirements, and ensuring zero-downtime migration.
Solution: Use the general ESSD storage type of ApsaraDB RDS
Original storage type
HELIOS originally used the PL1 ESSD storage type in its core SaaS services. For more information, see Storage types. An ESSD is an ultra-high performance disk that is designed by Alibaba Cloud based on the next-generation distributed block storage architecture. ESSDs use 25 Gigabit Ethernet and remote direct memory access (RDMA) technology to deliver high random read/write IOPS per disk and shorter one-way latency than standard SSDs. ESSDs are provided at four performance levels (PLs): PL3, PL2, PL1, and PL0, listed in descending order of performance. The following table describes the performance of PL1 ESSDs.
PL | Description | Capacity range (GiB) | Maximum IOPS per ESSD | Maximum throughput per ESSD (MB/s) |
PL1 | Moderate maximum concurrent I/O performance and low I/O latency | 20 to 65,536 | 50,000 | 350 |
The R&D team of HELIOS monitored the database I/O throughput and found that the traffic of core SaaS services shows clear peaks and valleys. During off-peak hours, the business traffic is relatively stable, and the read/write throughput of an RDS instance is less than 350 MB/s, the upper limit of a PL1 ESSD. During peak hours (approximately 3 to 4 hours), the traffic increases significantly. As a result, the read/write throughput of the RDS instance reaches or exceeds the maximum throughput of the PL1 ESSD, and system queries slow down. The following figures show the throughput and I/O traffic of the RDS instance during peak hours. The I/O performance of the RDS instance is capped by the upper limit of 350 MB/s.
Monitoring of I/O throughput during peak hours (PL1 ESSD)
Upgrade to the general ESSD storage type
The Alibaba Cloud team and the R&D team of HELIOS analyzed elements that affect the SaaS services by scenario and traffic and reached the following conclusions:
The traffic of the core SaaS services shows clear peaks and valleys instead of continuous growth. The services encounter I/O bursts.
The I/O performance of a PL1 ESSD is closely tied to its storage capacity, so the upper limits of IOPS and bandwidth depend on the storage capacity. With the original storage type, HELIOS could expand the storage capacity to respond to queries during peak hours without changing the storage type. However, the extra capacity would be wasted during off-peak hours.
To respond to the challenges, Alibaba Cloud and HELIOS developed a cost-effective solution together. The solution can be used to smoothly upgrade the storage type from PL1 ESSD to general ESSD without the need to change the business architecture. The solution can also be used to enable the I/O burst feature to accelerate queries during peak hours. This ensures data integrity and business continuity.
General ESSD and I/O bursts
The general ESSD storage type is a new storage type developed by ApsaraDB RDS to improve performance and scalability and reduce costs. It adopts a three-layer storage architecture based on the deep integration of PaaS and IaaS: a cache layer (high-performance disks), a data layer (ESSDs), and a cold storage layer (Object Storage Service (OSS)). The architecture stores hot data in the cache layer to increase I/O rates, warm data in ESSDs, and cold data in OSS buckets. ESSDs and OSS buckets are cost-effective.
General ESSDs use high performance disks as scalable resources to increase I/O rates. General ESSDs use AliSQL to respond to I/O bursts during database reads and writes. Data that is infrequently accessed is archived to the OSS buckets to reduce costs. General ESSDs decouple I/O performance from storage capacity to provide users with flexible I/O performance and storage capacity.
The I/O burst feature of general ESSDs suits the SaaS services of HELIOS, whose business traffic increases significantly and shows clear peaks and valleys. After the feature is enabled, the IOPS of an ESSD can exceed the upper limit that applies when the feature is disabled. This provides I/O expansion capabilities during peak hours to meet unexpected business requirements. This way, the SaaS services of HELIOS can achieve more stable and efficient performance.
I/O burst enabled | Maximum IOPS | Maximum throughput (MB/s) |
No | min{50000, Maximum IOPS for the instance type, IOPS that corresponds to the maximum I/O bandwidth for the instance type, 1800 + 50 × Storage capacity} | min{350, Maximum I/O bandwidth for the instance type, 120 + 0.5 × Storage capacity} |
Yes | min{1000000, Maximum IOPS for the instance type, IOPS that corresponds to the maximum I/O bandwidth for the instance type} | min{4000, Maximum I/O bandwidth for the instance type} |
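The two rows of the table can be worked through with a small calculation. The following Python sketch only transcribes the min{...} formulas from the table. The function name, parameter names, and the instance-type figures in the example are hypothetical and chosen for illustration; the actual limits of an instance type come from the ApsaraDB RDS instance type documentation.

```python
def general_essd_limits(storage_gib, instance_max_iops, instance_bw_iops,
                        instance_max_bw_mbps, burst_enabled):
    """Return (max IOPS, max throughput in MB/s) per the table above.

    storage_gib          -- storage capacity in GiB
    instance_max_iops    -- maximum IOPS for the instance type
    instance_bw_iops     -- IOPS that corresponds to the maximum I/O
                            bandwidth for the instance type
    instance_max_bw_mbps -- maximum I/O bandwidth for the instance type
    """
    if burst_enabled:
        # With I/O burst, the capacity-based terms no longer apply.
        max_iops = min(1_000_000, instance_max_iops, instance_bw_iops)
        max_throughput = min(4000, instance_max_bw_mbps)
    else:
        # Without I/O burst, both limits also depend on storage capacity.
        max_iops = min(50_000, instance_max_iops, instance_bw_iops,
                       1800 + 50 * storage_gib)
        max_throughput = min(350, instance_max_bw_mbps,
                             120 + 0.5 * storage_gib)
    return max_iops, max_throughput


# Hypothetical instance type: 150,000 IOPS, 2,000 MB/s bandwidth
# (equivalent to 128,000 IOPS), with 500 GiB of storage.
without_burst = general_essd_limits(500, 150_000, 128_000, 2000, False)
with_burst = general_essd_limits(500, 150_000, 128_000, 2000, True)
print(without_burst)  # (26800, 350)
print(with_burst)     # (128000, 2000)
```

In this hypothetical case, the limits without I/O burst are set by the storage capacity and the 350 MB/s cap, whereas the limits with I/O burst are set by the instance type.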
In daily use, HELIOS can adjust the maximum I/O performance of cloud disks based on business requirements. If high I/O occurs, the system automatically enables the I/O burst mode and raises the upper limit of I/O performance to handle the heavy load. After the load drops, the system lowers the I/O performance back to the normal level. This mechanism scales I/O performance to meet business requirements, prevents resource waste, and reduces costs.
The following figure shows the effects of the I/O burst feature. After the feature is enabled, the IOPS usage of an RDS instance can exceed 100% during the high I/O period.
Effects of the storage type upgrade
After HELIOS upgraded the cloud disks of the databases of its SaaS services to general ESSDs and enabled the I/O burst feature, performance improved significantly. General ESSDs can respond to I/O bursts and increase throughput up to 4,000 MB/s based on business requirements. The I/O throughput of the RDS instance is therefore no longer capped by the 350 MB/s limit of the PL1 ESSD, which prevents slow queries caused by insufficient I/O resources. HELIOS can handle traffic surges during peak hours in a more stable manner.
Monitoring of I/O throughput during peak hours (general ESSDs)
Benefits and plans
General ESSDs help HELIOS significantly improve scalability and performance and reduce the costs of database usage.
Performance improvement: The I/O burst feature helps HELIOS improve database performance during peak hours. The feature ensures that databases run in an efficient manner without performance degradation and greatly improves business continuity and user experience.
Cost-effectiveness: The pricing of general ESSDs is the same as the pricing of PL1 ESSDs. The I/O burst feature uses the pay-as-you-go billing method. However, the free quota on burstable I/O operations provided by Alibaba Cloud is 500,000 per hour and is significantly larger than the maximum number of I/O operations of HELIOS in an hour. Therefore, HELIOS can enjoy higher I/O performance without additional costs.
Scalability: The I/O burst feature of general ESSDs can be used to handle traffic surges. The feature allows the system to automatically adjust the upper limit of I/O performance, which improves resource utilization.
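The cost argument above can be illustrated with a rough estimate of how many burst I/O operations fall outside the free quota in an hour. The following Python sketch rests on an assumption that is not stated in this topic: burst I/O operations are counted as the I/O operations above the baseline IOPS limit, summed over per-second samples. The function name, the baseline value, and the sample traffic are hypothetical. The actual metering rules are defined in the Alibaba Cloud billing documentation.

```python
FREE_BURST_IOS_PER_HOUR = 500_000  # free quota stated in this topic


def burst_io_usage(iops_samples, baseline_iops,
                   free_quota=FREE_BURST_IOS_PER_HOUR):
    """Return (burst I/O operations, billable burst I/O operations).

    iops_samples  -- per-second IOPS readings covering one hour
    baseline_iops -- IOPS limit that applies without I/O burst

    Assumption: only I/O operations above the baseline count as burst.
    """
    burst = sum(max(0, sample - baseline_iops) for sample in iops_samples)
    billable = max(0, burst - free_quota)
    return burst, billable


# Hypothetical hour: baseline of 26,800 IOPS, 100 seconds of bursting
# at 30,000 IOPS, and 3,500 seconds below the baseline.
samples = [30_000] * 100 + [20_000] * 3500
print(burst_io_usage(samples, 26_800))  # (320000, 0)
```

In this hypothetical hour, the burst I/O operations stay within the free quota, so no additional fee applies. This matches the situation described for HELIOS.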
In the future, HELIOS will continue to use the new features of Alibaba Cloud ApsaraDB to optimize data management systems. With the advancement of technologies, HELIOS will continue to improve performance, reduce costs, and enhance system extensibility. The emergence of general ESSDs enables users to meet the requirements for low costs, low latency, and high durability at the same time. The Alibaba Cloud ApsaraDB team will continuously improve technical capabilities to share technical benefits and provide users with more cost-effective, stable, and reliable services.
Customer remarks
"Alibaba Cloud has not only won our trust with its stable environment, first-grade services, and excellent technical capabilities, but also provided effective support when we faced challenges. Alibaba Cloud continuously focused on handling database bottlenecks and disk switching, which ensured the smooth running of databases. I hope that Alibaba Cloud can continue to provide support for us in the future."
Ma Yunfei
Technical Director of Shanghai Zhenhui Information Technology Co., Ltd.