Performance testing does not end when the test finishes running. The real value comes from analyzing results, pinpointing bottlenecks, and tuning the system to meet your performance goals. The performance of a system is determined by many factors. This topic does not describe all factors, but provides a guide for analyzing system performance.
The following sections walk through performance analysis and tuning for systems running on Alibaba Cloud. The intended audience includes developers, test administrators, test operators, technical support engineers, project quality administrators, project administrators, and O&M engineers responsible for system performance.
Performance analysis
Prerequisites
Before you start, make sure you have:
Monitoring configured: Client-side monitoring in Performance Testing (PTS), infrastructure monitoring through Cloud Monitor, and application-level tracing through Application Real-Time Monitoring Service (ARMS).
Technical background: Working knowledge of operating systems, middleware, databases, and application development.
Analysis workflow
Work through the following layers top-down. Most performance issues originate at the network or server layer, so start there before looking at the client side.
Step 1: Check the network access layer
In cloud-based architectures, stress testing traffic may not fully reach the backend. Protection policies on Server Load Balancer (SLB), Web Application Firewall (WAF), Anti-DDoS IP addresses, Content Delivery Network (CDN) points of presence (POPs), or Edge Security Acceleration (ESA) POPs can intercept traffic when:
Bandwidth, maximum connection, or new connection limits are exceeded.
Traffic patterns resemble Challenge Collapsar (CC) or DDoS attacks.
If you see unexpected errors or timeouts despite low backend load, check these services first. For details, see Why do errors or timeouts occur even under small backend load?
Step 2: Validate metrics
Check whether response time, throughput, and error rate meet your targets. If any metric falls outside the acceptable range, the issue almost always originates on the server side rather than the client side.
Step 3: Inspect hardware metrics
On the server, check CPU utilization, memory usage, disk I/O, and network I/O. If any of these are abnormal, perform a deeper investigation on the affected resource (see Analysis methods below).
Step 4: Inspect middleware metrics
If hardware metrics are normal, check middleware-level indicators: thread pool utilization, connection pool usage, and garbage collection (GC) frequency and duration.
Step 5: Inspect database metrics
If middleware metrics are normal, investigate database performance: slow SQL queries, cache hit rates, lock contention, and database parameter settings.
Step 6: Inspect application logic
If all infrastructure metrics are normal, the bottleneck is likely in application code. Investigate algorithms, buffering and caching strategies, and synchronous vs. asynchronous I/O patterns.
Common bottleneck categories
Hardware and specifications
CPU, memory, and disk I/O are the most common hardware bottlenecks. Undersized instances or specification limits can cap throughput before any software-level issue becomes visible.
Middleware
Database systems and application servers (including web servers) often become bottlenecks due to misconfiguration. For example, improper Java Database Connectivity (JDBC) connection pool settings on a WebLogic platform can throttle concurrent request handling.
Application code
Common application-level bottlenecks include:
Suboptimal Java Virtual Machine (JVM) parameters or container settings
Slow SQL queries (identifiable through APM services such as ARMS)
Poorly designed database schemas or application architecture
Serial processing where parallel processing is possible
Missing buffer or cache layers
Insufficient request processing threads
Uncoordinated producer-consumer patterns
Operating system
On Windows, UNIX, or Linux systems, OS-level misconfigurations can degrade performance. For example, when physical memory is insufficient and virtual memory settings are not properly configured, excessive swap activity significantly increases response times.
Network devices
Firewalls, load balancers, switches, and cloud network services (SLB, WAF, Anti-DDoS IP addresses, CDN POPs, and ESA POPs) can introduce bottlenecks. For example, if a load balancer fails to distribute traffic across servers when one server reaches capacity, the load balancer configuration is the bottleneck.
Analysis methods
CPU
CPU utilization breaks down into three categories, each pointing to a different root cause:
High CPU User
An application-level process is consuming excessive CPU.
1. Run `top` to identify the process with the highest CPU usage.
2. Run `top -H -p <pid>` to narrow down to the specific thread.
3. For Java applications, use `jstack` to capture the thread stack trace and identify the CPU-intensive method. For C++ applications, use `gprof` to profile execution.
4. Review the source code at the identified location.
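The steps above can be sketched as a short shell session. The process and thread IDs are placeholders; on a real host you would substitute the values reported by `top`, and the final `jstack` step assumes a JDK is installed.

```shell
# 1. Find the busiest processes (batch equivalent of `top`).
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 5

# 2. List a suspect process's threads by CPU usage (batch
#    equivalent of `top -H -p <pid>`); here our own shell's
#    PID ($$) stands in for the suspect process.
ps -L -o tid,pcpu,comm -p $$

# 3. `jstack` prints thread IDs in hex ("nid=0x..."), so convert
#    the decimal TID from step 2 before searching the dump:
printf 'nid=0x%x\n' 12345

# Then: jstack <pid> | grep -A 20 'nid=0x3039'   # requires a JDK
```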
High CPU Sys
The kernel is consuming excessive CPU, typically due to expensive system calls.
Use `strace` to trace system calls and identify which calls consume the most time. For example, `strace -c -p <pid>` attaches to a running process and prints a per-syscall time summary.
High CPU Wait
The CPU is idle while waiting for I/O to complete. This is typically caused by heavy disk read/write activity.
Reduce log output volume.
Switch to asynchronous I/O.
Upgrade to disks with higher IOPS performance.
Memory
Operating systems use spare memory for disk caching, so memory utilization near 99% is normal. Instead, watch for:
A single process consuming a disproportionately large amount of memory.
High swap activity, which indicates the system is running out of physical memory.
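On Linux, both checks can be applied with standard tools; a sketch (output formats vary slightly by distribution):

```shell
# Overall picture: "available" matters more than "free", since the
# OS deliberately uses spare memory for disk caching.
free -m

# A single process with an outsized resident set (RSS) is more
# telling than high overall utilization.
ps -eo pid,rss,comm --sort=-rss | head -n 5

# Swap pressure: SwapFree shrinking toward zero indicates the
# system is running out of physical memory.
grep -E 'MemAvailable|SwapTotal|SwapFree' /proc/meminfo
```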
Disk I/O
The most important disk I/O metric is the busy percentage. To reduce it:
Decrease log write volume.
Use asynchronous I/O.
Upgrade to disks with higher IOPS.
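On Linux, the busy percentage corresponds to the %util column of `iostat -x` (from the sysstat package). When iostat is unavailable, the raw counter behind it can be read directly; a sketch:

```shell
# Field 13 of /proc/diskstats counts milliseconds each device has
# spent doing I/O; sampling it twice over a known interval and
# dividing by elapsed wall time yields the busy percentage.
awk '{print $3, $13}' /proc/diskstats | head -n 5

# With sysstat installed, iostat reports it directly:
#   iostat -x 1 3    # watch the %util column
```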
Network I/O
Network throughput depends on payload size. Keep utilization below 70% of the hardware's maximum capacity. To improve network I/O:
Compress response payloads.
Enable caching on compute nodes.
Batch smaller transmissions into fewer, larger transfers.
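Compression only pays off when the payload actually shrinks; a quick local sanity check of the trade-off (the payload here is synthetic, standing in for a typical repetitive text response):

```shell
# Build a ~1.2 KB payload with repetitive content, then compare
# its raw size against its gzip-compressed size.
BODY=$(printf 'hello world %.0s' $(seq 1 100))
RAW=$(printf '%s' "$BODY" | wc -c)
GZ=$(printf '%s' "$BODY" | gzip -c | wc -c)
echo "raw=${RAW} bytes, gzipped=${GZ} bytes"
```

Text-like payloads (HTML, JSON) usually compress well; already-compressed media (JPEG, video) does not, and recompressing it only burns CPU.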
Kernel parameters
Default kernel parameter values work for most workloads, but stress testing can exceed these defaults. Use sysctl to view and modify kernel parameters as needed.
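Stress tests commonly exhaust the defaults for connection backlogs and ephemeral ports. The keys below are standard Linux parameters; the value in the write example is illustrative, not a recommendation:

```shell
# Kernel parameters are exposed as files under /proc/sys; e.g. the
# sysctl key net.core.somaxconn (listen backlog limit) maps to:
cat /proc/sys/net/core/somaxconn

# Ephemeral port range, often exhausted by high-rate load tests:
cat /proc/sys/net/ipv4/ip_local_port_range

# Equivalent sysctl usage (writing requires root; persist changes
# in /etc/sysctl.conf):
#   sysctl net.core.somaxconn
#   sysctl -w net.core.somaxconn=4096
```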
JVM
Monitor GC and full GC frequency and duration:
1. Run `jstat` to check GC statistics.
2. If GCs are too frequent, use `jmap` to dump heap memory.
3. Analyze the dump with HeapAnalyzer to identify high memory consumption and potential memory leaks.
Alternatively, use an APM tool such as ARMS for a visual, real-time view of JVM metrics.
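The jstat/jmap sequence, sketched as shell commands. The `pgrep` pattern is a placeholder for however you locate your Java process, and the commands are guarded so the sketch degrades gracefully on hosts without a JDK:

```shell
# Locate a Java process (placeholder logic; adapt the pattern).
PID=$(pgrep -f java | head -n 1)

if [ -n "$PID" ] && command -v jstat >/dev/null 2>&1; then
  # GC utilization and event counts, sampled every second, 5 times.
  jstat -gcutil "$PID" 1000 5
  # If full GCs are frequent, dump the live heap for offline
  # analysis with HeapAnalyzer (or a similar heap analyzer):
  jmap -dump:live,format=b,file=/tmp/heap.hprof "$PID"
else
  echo "no JVM or JDK tools found; skipping GC inspection"
fi
```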
Thread pools
If thread pools are saturated, increase the pool size. If a larger pool still does not help, investigate deeper:
Threads blocked waiting for locks.
Methods with long execution times.
Database queries with long wait times.
JDBC connection pools
If connection pools are exhausted, increase the pool size. However, if the underlying database is slow, more connections will not help. Check for:
Slow queries that hold connections longer than necessary.
Code paths that fail to release connections back to the pool.
SQL
Inefficient SQL is one of the most common causes of poor performance. Check the execution plan to understand why a query is slow. The following tables list common SQL performance issues with illustrative examples.
Index issues
Problem | Example | Impact |
--- | --- | --- |
No index | N/A | Full table scan |
Function on indexed column | `WHERE SUBSTR(name, 1, 3) = 'abc'` | Full table scan |
Expression on indexed column | `WHERE salary * 12 > 25000` | Full table scan |
Type conversion on indexed column | `WHERE emp_no = 123` when emp_no is a VARCHAR column | Full table scan |
Inequality operator | `WHERE status <> 0` | Full table scan |
Leading wildcard | `WHERE name LIKE '%abc'` | Full table scan |
Concatenation in WHERE clause | `WHERE first_name || last_name = 'JohnDoe'` | Full table scan |
IN with small value list | `WHERE id IN (1, 2, 3)` | Full table scan |
Parameterized query | `WHERE id = ?` when the optimizer cannot see the bound value | Full table scan even with parameters |
Nonclustered index with ORDER BY | N/A | Poor index performance |
String vs. integer index | VARCHAR key used where an INT key would do | String indexes are slower than integer indexes |
Nullable columns | Columns with NULL values | Poor index performance |
IS NULL / IS NOT NULL | N/A | Poor index performance |
Data volume issues
Problem | Example | Impact |
--- | --- | --- |
SELECT * | `SELECT * FROM orders` | Retrieves all columns unnecessarily |
Large table without filtering | `SELECT ... FROM big_table` with no WHERE clause | Large data volume |
Nested query without early filtering | Filter after full data load | Processes unnecessary data |
Multi-table join without selective predicates | Join then filter | Excessive join operations |
Bulk insert | Insert all data at once | Generates excessive logs, high resource usage |
Lock and concurrency issues
Problem | Example | Impact |
--- | --- | --- |
Row-level lock escalation | Updating a large number of rows in a single transaction | May lock the entire table |
Deadlocks | Session A locks resource 1 and waits for resource 2, while session B holds resource 2 and waits for resource 1 | Mutual blocking |
Cursors | `DECLARE c CURSOR FOR SELECT ...` | Low performance |
Temporary tables (CREATE) | `CREATE TABLE #tmp (...)` | Generates excessive logs |
Temporary tables (DROP) | `DROP TABLE #tmp` | Must confirm deletion to prevent prolonged locking |
Query optimization tips
Instead of | Use | Why |
--- | --- | --- |
`IN (subquery)` | `EXISTS (subquery)` | EXISTS can stop at the first matching row |
`NOT IN` | `NOT EXISTS` | NOT IN typically forces a full table scan |
`OR` across different columns | `UNION ALL` of single-predicate queries | Each branch can use its own index |
Filtering in `HAVING` | Filtering in `WHERE` | Rows are eliminated before grouping, not after |
`DISTINCT` to deduplicate a join | `EXISTS` | Avoids sorting the full result set |
Hardcoded SQL | Parameterized (bound) SQL | Compile once, reuse the execution plan |
Tuning
Performance tuning is iterative. As applications evolve and user loads grow, regular testing and tuning keeps the system within acceptable performance boundaries.
Tuning workflow
1. Identify the issue
Narrow down the problem area:
Application code: Code-level issues are the most common source of performance problems. Check here first.
Database settings: Misconfigured databases can slow the entire system. For large databases, have a database administrator (DBA) review parameter settings before going to production.
Operating system settings: Misconfigured OS parameters can introduce system-level bottlenecks.
Hardware: Disk I/O and memory are the most common hardware constraints.
Network: Overloaded networks cause packet loss and latency spikes.
2. Analyze the issue
Once you identify the problem area, determine its scope:
Does the issue affect response time, throughput, or both?
Are all users affected, or only a subset? What distinguishes the affected users?
Are system resource metrics (CPU, memory, I/O) at or near their limits?
Is the issue concentrated in specific modules or endpoints?
Is the issue on the client side or the server side?
Does the actual load exceed the system's designed capacity?
3. Define goals and solutions
Define concrete goals, such as increasing throughput or shortening response time, to better support your workloads. Translate these goals into PTS stress testing scenarios with specific load levels, then select the appropriate mode: concurrency-based, transactions per second (TPS)-based, or a combination of automatic increment and manual regulation for traffic throttling.
4. Test the solution
Run benchmark tests after each change. Benchmark testing provides quantitative, comparable measurements of specific performance metrics, giving you an objective way to evaluate whether the change helped.
5. Evaluate results
After each tuning iteration, evaluate:
Did the change meet or exceed the performance goal?
Did it improve overall system performance, or only a specific component?
Are further tuning iterations needed?
If the goals are met, the tuning cycle is complete.
Best practices
Design for performance from the start. Tuning compensates for design gaps but cannot replace good architecture. Factor performance requirements into design and development early.
Define clear performance goals. Translate goals into PTS test scenarios with specific load levels, then select the appropriate mode: concurrency-based, TPS-based, or a combination of automatic increment and manual regulation for traffic throttling.
Validate after every change. Run regression tests after each tuning iteration to confirm the change works as expected and does not introduce regressions.
Integrate performance testing into your workflow. Run intranet performance tests regularly during development. Conduct business performance tests periodically in the production environment.
Keep tuning iterative. Feed results from each cycle back into later development. Performance work is never one-and-done.
Protect code quality. Do not sacrifice readability or maintainability for performance. Optimizations that make code harder to maintain create long-term costs that outweigh short-term gains.
Other test analysis
Success rate
Success rate is determined by server return values and any assertions you configure. Without assertions, a request is marked as failed when the backend returns an error code, the server throws an exception, or the request times out.
Logs
PTS logs record details for each sampled request. At a 10% sampling rate, PTS records 10 out of every 100 requests. At 100%, every request is recorded.
Trade-off: Higher sampling rates give you more diagnostic detail but increase the load on load generators, reducing their performance and raising costs. The sampling rate does not affect the server under test.
Connection establishment
Connection establishment is the process of setting up an HTTP connection between the load generator and the server. The request timeout covers the entire span from DNS resolution to response completion. If this duration exceeds the configured timeout threshold, the request is marked as timed out.
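curl can break this span down per phase, which helps determine whether DNS resolution, TCP connect, or server processing dominates a timeout. The URL below is a placeholder for the endpoint under test, and the fallback message covers hosts without network access:

```shell
# Per-phase timing: name lookup, TCP connect, time to first byte,
# and total elapsed time. Replace the URL with your endpoint.
curl -s -o /dev/null \
  -w 'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
  --max-time 10 https://example.com/ \
  || echo 'request failed (no network access?)'
```

If `connect` is close to `total`, time is being lost before the server ever sees the request, which points back at the network access layer rather than the application.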