Performance testing does not end when the test finishes running. The real value comes from analyzing results, pinpointing bottlenecks, and tuning the system to meet your performance goals. The performance of a system is determined by many factors. This topic does not describe all factors, but provides a guide for analyzing system performance.
The following sections walk through performance analysis and tuning for systems running on Alibaba Cloud. The intended audience includes developers, test administrators, test operators, technical support engineers, project quality administrators, project administrators, and O&M engineers responsible for system performance.
Performance analysis
Prerequisites
Before you start, make sure you have:
Monitoring configured: Client-side monitoring in Performance Testing (PTS), infrastructure monitoring through Cloud Monitor, and application-level tracing through Application Real-Time Monitoring Service (ARMS).
Technical background: Working knowledge of operating systems, middleware, databases, and application development.
Analysis workflow
Work through the following layers top-down. Most performance issues originate at the network or server layer, so start there before looking at the client side.
Step 1: Check the network access layer
In cloud-based architectures, stress testing traffic may not fully reach the backend. Protection policies on Server Load Balancer (SLB), Web Application Firewall (WAF), Anti-DDoS IP addresses, Content Delivery Network (CDN) points of presence (POPs), or Edge Security Acceleration (ESA) POPs can intercept traffic when:
Bandwidth, maximum connection, or new connection limits are exceeded.
Traffic patterns resemble Challenge Collapsar (CC) or DDoS attacks.
If you see unexpected errors or timeouts despite low backend load, check these services first. For details, see Why do errors or timeouts occur even under small backend load?
Step 2: Validate metrics
Check whether response time, throughput, and error rate meet your targets. If any metric falls outside the acceptable range, the issue almost always originates on the server side rather than the client side.
Step 3: Inspect hardware metrics
On the server, check CPU utilization, memory usage, disk I/O, and network I/O. If any of these are abnormal, perform a deeper investigation on the affected resource (see Analysis methods below).
Step 4: Inspect middleware metrics
If hardware metrics are normal, check middleware-level indicators: thread pool utilization, connection pool usage, and garbage collection (GC) frequency and duration.
Step 5: Inspect database metrics
If middleware metrics are normal, investigate database performance: slow SQL queries, cache hit rates, lock contention, and database parameter settings.
Step 6: Inspect application logic
If all infrastructure metrics are normal, the bottleneck is likely in application code. Investigate algorithms, buffering and caching strategies, and synchronous vs. asynchronous I/O patterns.
Common bottleneck categories
Hardware and specifications
CPU, memory, and disk I/O are the most common hardware bottlenecks. Undersized instances or specification limits can cap throughput before any software-level issue becomes visible.
Middleware
Database systems and application servers (including web servers) often become bottlenecks due to misconfiguration. For example, improper Java Database Connectivity (JDBC) connection pool settings on a WebLogic platform can throttle concurrent request handling.
Application code
Common application-level bottlenecks include:
Suboptimal Java Virtual Machine (JVM) parameters or container settings
Slow SQL queries (identifiable through APM services such as ARMS)
Poorly designed database schemas or application architecture
Serial processing where parallel processing is possible
Missing buffer or cache layers
Insufficient request processing threads
Uncoordinated producer-consumer patterns
Operating system
On Windows, UNIX, or Linux systems, OS-level misconfigurations can degrade performance. For example, when physical memory is insufficient and virtual memory settings are not properly configured, excessive swap activity significantly increases response times.
Network devices
Firewalls, load balancers, switches, and cloud network services (SLB, WAF, Anti-DDoS IP addresses, CDN POPs, and ESA POPs) can introduce bottlenecks. For example, if a load balancer fails to distribute traffic across servers when one server reaches capacity, the load balancer configuration is the bottleneck.
Analysis methods
CPU
CPU utilization breaks down into three categories, each pointing to a different root cause:
High CPU User
An application-level process is consuming excessive CPU.
1. Run `top` to identify the process with the highest CPU usage.
2. Run `top -H -p <pid>` to narrow down to the specific thread.
3. For Java applications, use `jstack` to capture the thread stack trace and identify the CPU-intensive method. For C++ applications, use `gprof` to profile execution.
4. Review the source code at the identified location.
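The steps above can be sketched as a short shell session. The process and thread IDs are placeholders; on a real host you would substitute the values reported by `top`, and the final `jstack` step assumes a JDK is installed.

```shell
# 1. Find the busiest processes (batch equivalent of `top`).
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 5

# 2. List a suspect process's threads by CPU usage (batch
#    equivalent of `top -H -p <pid>`); here our own shell's
#    PID ($$) stands in for the suspect process.
ps -L -o tid,pcpu,comm -p $$

# 3. `jstack` prints thread IDs in hex ("nid=0x..."), so convert
#    the decimal TID from step 2 before searching the dump:
printf 'nid=0x%x\n' 12345

# Then: jstack <pid> | grep -A 20 'nid=0x3039'   # requires a JDK
```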
High CPU Sys
The kernel is consuming excessive CPU, typically due to expensive system calls.
Use `strace` to trace system calls and identify which calls consume the most time. For example, `strace -c -p <pid>` attaches to a running process and prints a per-syscall time summary.
High CPU Wait
The CPU is idle while waiting for I/O to complete. This is typically caused by heavy disk read/write activity.
Reduce log output volume.
Switch to asynchronous I/O.
Upgrade to disks with higher IOPS performance.
Memory
Operating systems use spare memory for disk caching, so memory utilization near 99% is normal. Instead, watch for:
A single process consuming a disproportionately large amount of memory.
High swap activity, which indicates the system is running out of physical memory.
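On Linux, both checks can be applied with standard tools; a sketch (output formats vary slightly by distribution):

```shell
# Overall picture: "available" matters more than "free", since the
# OS deliberately uses spare memory for disk caching.
free -m

# A single process with an outsized resident set (RSS) is more
# telling than high overall utilization.
ps -eo pid,rss,comm --sort=-rss | head -n 5

# Swap pressure: SwapFree shrinking toward zero indicates the
# system is running out of physical memory.
grep -E 'MemAvailable|SwapTotal|SwapFree' /proc/meminfo
```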
Disk I/O
The most important disk I/O metric is the busy percentage. To reduce it:
Decrease log write volume.
Use asynchronous I/O.
Upgrade to disks with higher IOPS.
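On Linux, the busy percentage corresponds to the %util column of `iostat -x` (from the sysstat package). When iostat is unavailable, the raw counter behind it can be read directly; a sketch:

```shell
# Field 13 of /proc/diskstats counts milliseconds each device has
# spent doing I/O; sampling it twice over a known interval and
# dividing by elapsed wall time yields the busy percentage.
awk '{print $3, $13}' /proc/diskstats | head -n 5

# With sysstat installed, iostat reports it directly:
#   iostat -x 1 3    # watch the %util column
```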
Network I/O
Network throughput depends on payload size. Keep utilization below 70% of the hardware's maximum capacity. To improve network I/O:
Compress response payloads.
Enable caching on compute nodes.
Batch smaller transmissions into fewer, larger transfers.
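Compression only pays off when the payload actually shrinks; a quick local sanity check of the trade-off (the payload here is synthetic, standing in for a typical repetitive text response):

```shell
# Build a ~1.2 KB payload with repetitive content, then compare
# its raw size against its gzip-compressed size.
BODY=$(printf 'hello world %.0s' $(seq 1 100))
RAW=$(printf '%s' "$BODY" | wc -c)
GZ=$(printf '%s' "$BODY" | gzip -c | wc -c)
echo "raw=${RAW} bytes, gzipped=${GZ} bytes"
```

Text-like payloads (HTML, JSON) usually compress well; already-compressed media (JPEG, video) does not, and recompressing it only burns CPU.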
Kernel parameters
Default kernel parameter values work for most workloads, but stress testing can exceed these defaults. Use sysctl to view and modify kernel parameters as needed.
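Stress tests commonly exhaust the defaults for connection backlogs and ephemeral ports. The keys below are standard Linux parameters; the value in the write example is illustrative, not a recommendation:

```shell
# Kernel parameters are exposed as files under /proc/sys; e.g. the
# sysctl key net.core.somaxconn (listen backlog limit) maps to:
cat /proc/sys/net/core/somaxconn

# Ephemeral port range, often exhausted by high-rate load tests:
cat /proc/sys/net/ipv4/ip_local_port_range

# Equivalent sysctl usage (writing requires root; persist changes
# in /etc/sysctl.conf):
#   sysctl net.core.somaxconn
#   sysctl -w net.core.somaxconn=4096
```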
JVM
Monitor GC and full GC frequency and duration:
1. Run `jstat` to check GC statistics.
2. If GCs are too frequent, use `jmap` to dump heap memory.
3. Analyze the dump with HeapAnalyzer to identify high memory consumption and potential memory leaks.
Alternatively, use an APM tool such as ARMS for a visual, real-time view of JVM metrics.
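The jstat/jmap sequence, sketched as shell commands. The `pgrep` pattern is a placeholder for however you locate your Java process, and the commands are guarded so the sketch degrades gracefully on hosts without a JDK:

```shell
# Locate a Java process (placeholder logic; adapt the pattern).
PID=$(pgrep -f java | head -n 1)

if [ -n "$PID" ] && command -v jstat >/dev/null 2>&1; then
  # GC utilization and event counts, sampled every second, 5 times.
  jstat -gcutil "$PID" 1000 5
  # If full GCs are frequent, dump the live heap for offline
  # analysis with HeapAnalyzer (or a similar heap analyzer):
  jmap -dump:live,format=b,file=/tmp/heap.hprof "$PID"
else
  echo "no JVM or JDK tools found; skipping GC inspection"
fi
```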
Thread pools
If thread pools are saturated, increase the pool size. If a larger pool still does not help, investigate deeper:
Threads blocked waiting for locks.
Methods with long execution times.
Database queries with long wait times.
JDBC connection pools
If connection pools are exhausted, increase the pool size. However, if the underlying database is slow, more connections will not help. Check for:
Slow queries that hold connections longer than necessary.
Code paths that fail to release connections back to the pool.
SQL
Inefficient SQL is one of the most common causes of poor performance. Check the execution plan to understand why a query is slow. The following tables list common SQL performance issues with illustrative examples.
Index issues
Problem | Example | Impact |
--- | --- | --- |
No index | N/A | Full table scan |
Function on indexed column | `WHERE SUBSTR(name, 1, 3) = 'abc'` | Full table scan |
Expression on indexed column | `WHERE salary * 12 > 25000` | Full table scan |
Type conversion on indexed column | `WHERE emp_no = 123` when emp_no is a VARCHAR column | Full table scan |
Inequality operator | `WHERE status <> 0` | Full table scan |
Leading wildcard | `WHERE name LIKE '%abc'` | Full table scan |
Concatenation in WHERE clause | `WHERE first_name || last_name = 'JohnDoe'` | Full table scan |
IN with small value list | `WHERE id IN (1, 2, 3)` | Full table scan |
Parameterized query | `WHERE id = ?` when the optimizer cannot see the bound value | Full table scan even with parameters |
Nonclustered index with ORDER BY | N/A | Poor index performance |
String vs. integer index | VARCHAR key used where an INT key would do | String indexes are slower than integer indexes |
Nullable columns | Columns with NULL values | Poor index performance |
IS NULL / IS NOT NULL | N/A | Poor index performance |
Data volume issues
Problem | Example | Impact |
--- | --- | --- |
SELECT * | `SELECT * FROM orders` | Retrieves all columns unnecessarily |
Large table without filtering | `SELECT ... FROM big_table` with no WHERE clause | Large data volume |
Nested query without early filtering | Filter after full data load | Processes unnecessary data |
Multi-table join without selective predicates | Join then filter | Excessive join operations |
Bulk insert | Insert all data at once | Generates excessive logs, high resource usage |
Lock and concurrency issues
Problem | Example | Impact |
--- | --- | --- |
Row-level lock escalation | Updating a large number of rows in a single transaction | May lock the entire table |
Deadlocks | Session A locks resource 1 and waits for resource 2, while session B holds resource 2 and waits for resource 1 | Mutual blocking |
Cursors | `DECLARE c CURSOR FOR SELECT ...` | Low performance |
Temporary tables (CREATE) | `CREATE TABLE #tmp (...)` | Generates excessive logs |
Temporary tables (DROP) | `DROP TABLE #tmp` | Must confirm deletion to prevent prolonged locking |
Query optimization tips
Instead of | Use | Why |
--- | --- | --- |
`IN (subquery)` | `EXISTS (subquery)` | EXISTS can stop at the first matching row |
`NOT IN` | `NOT EXISTS` | NOT IN typically forces a full table scan |
`OR` across different columns | `UNION ALL` of single-predicate queries | Each branch can use its own index |
Filtering in `HAVING` | Filtering in `WHERE` | Rows are eliminated before grouping, not after |
`DISTINCT` to deduplicate a join | `EXISTS` | Avoids sorting the full result set |
Hardcoded SQL | Parameterized (bound) SQL | Compile once, reuse the execution plan |
Tuning
Performance tuning is iterative. As applications evolve and user loads grow, regular testing and tuning keeps the system within acceptable performance boundaries.
Tuning workflow
1. Identify the issue
Narrow down the problem area:
Application code: Code-level issues are the most common source of performance problems. Check here first.
Database settings: Misconfigured databases can slow the entire system. For large databases, have a database administrator (DBA) review parameter settings before going to production.
Operating system settings: Misconfigured OS parameters can introduce system-level bottlenecks.
Hardware: Disk I/O and memory are the most common hardware constraints.
Network: Overloaded networks cause packet loss and latency spikes.
2. Analyze the issue
Once you identify the problem area, determine its scope:
Does the issue affect response time, throughput, or both?
Are all users affected, or only a subset? What distinguishes the affected users?
Are system resource metrics (CPU, memory, I/O) at or near their limits?
Is the issue concentrated in specific modules or endpoints?
Is the issue on the client side or the server side?
Does the actual load exceed the system's designed capacity?
3. Define goals and solutions
Define concrete goals, such as increasing throughput or shortening response time, to better support your workloads. Translate these goals into PTS stress testing scenarios with specific load levels, then select the appropriate mode: concurrency-based, transactions per second (TPS)-based, or a combination of automatic increment and manual regulation for traffic throttling.
4. Test the solution
Run benchmark tests after each change. Benchmark testing provides quantitative, comparable measurements of specific performance metrics, giving you an objective way to evaluate whether the change helped.
5. Evaluate results
After each tuning iteration, evaluate:
Did the change meet or exceed the performance goal?
Did it improve overall system performance, or only a specific component?
Are further tuning iterations needed?
If the goals are met, the tuning cycle is complete.
Best practices
Design for performance from the start. Tuning compensates for design gaps but cannot replace good architecture. Factor performance requirements into design and development early.
Define clear performance goals. Translate goals into PTS test scenarios with specific load levels, then select the appropriate mode: concurrency-based, TPS-based, or a combination of automatic increment and manual regulation for traffic throttling.
Validate after every change. Run regression tests after each tuning iteration to confirm the change works as expected and does not introduce regressions.
Integrate performance testing into your workflow. Run intranet performance tests regularly during development. Conduct business performance tests periodically in the production environment.
Keep tuning iterative. Feed results from each cycle back into later development. Performance work is never one-and-done.
Protect code quality. Do not sacrifice readability or maintainability for performance. Optimizations that make code harder to maintain create long-term costs that outweigh short-term gains.
Other test analysis
Success rate
Success rate is determined by server return values and any assertions you configure. Without assertions, a request is marked as failed when the backend returns an error code, the server throws an exception, or the request times out.
Logs
PTS logs record details for each sampled request. At a 10% sampling rate, PTS records 10 out of every 100 requests. At 100%, every request is recorded.
Trade-off: Higher sampling rates give you more diagnostic detail but increase the load on load generators, reducing their performance and raising costs. The sampling rate does not affect the server under test.
Connection establishment
Connection establishment is the process of setting up an HTTP connection between the load generator and the server. The request timeout covers the entire span from DNS resolution to response completion. If this duration exceeds the configured timeout threshold, the request is marked as timed out.
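curl can break this span down per phase, which helps determine whether DNS resolution, TCP connect, or server processing dominates a timeout. The URL below is a placeholder for the endpoint under test, and the fallback message covers hosts without network access:

```shell
# Per-phase timing: name lookup, TCP connect, time to first byte,
# and total elapsed time. Replace the URL with your endpoint.
curl -s -o /dev/null \
  -w 'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
  --max-time 10 https://example.com/ \
  || echo 'request failed (no network access?)'
```

If `connect` is close to `total`, time is being lost before the server ever sees the request, which points back at the network access layer rather than the application.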