This topic describes metrics for performance tests.
Purposes and intended readers
The metrics in this topic can be used as criteria for the technical quality evaluation of performance test projects, to standardize the evaluation of test results and unify the measurement of test quality. This topic describes only some major metrics. More metrics may be used in actual technical quality measurement. Intended readers include test administrators, test operators, technical support personnel, project administrators, and technical quality personnel.
System performance metrics
Response time
Definitions
The response time refers to the period from when the client initiates a request to when the client receives a response from the server. In performance tests, the period from when the load pressure is applied to when the tested server returns a result is considered the response time. It is generally measured in seconds or milliseconds. The average response time refers to the average response time of the same transaction while the system is running stably. In general, the average response time is used for transactions. Depending on the transaction, the average response time can be divided into the complex transaction response time, simple transaction response time, and special transaction response time. When you set a special transaction response time, you must clarify what makes the transaction special in terms of response time.
Abbreviation
Response Time (RT)
Standards
The acceptable response time varies in different industries. In general, the following values are applicable to online real-time transactions:
Internet companies: less than 500 milliseconds. For example, Taobao has a response time of 10 milliseconds.
Financial companies: less than 1 second is preferred, or less than 3 seconds for complex transactions.
Insurance companies: less than 3 seconds.
Manufacturing companies: less than 5 seconds.
For batch processing transactions:
Time window: the overall duration of a test process. The time window depends on the amount of data. For example, the time window values vary between the Double 11 and 99 Promotion events. Even if a large amount of data is involved, the test can be completed within 2 hours.
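As a minimal sketch of how the response time definitions above are applied, the following Python snippet summarizes a set of response-time samples. The sample values, and the use of the median alongside the average, are illustrative assumptions, not measurements from this topic.

```python
# Sketch: summarizing response times (RT) collected in a test run.
# The sample values below are illustrative, not real measurements.
samples_ms = [120, 95, 310, 150, 88, 2400, 130, 160, 105, 140]

average_rt = sum(samples_ms) / len(samples_ms)

# Percentiles matter because a single slow outlier (here 2400 ms)
# inflates the average; the median is more robust.
p50 = sorted(samples_ms)[len(samples_ms) // 2]

print(f"average RT: {average_rt:.0f} ms, median RT: {p50} ms")
```

The outlier pushes the average well above the threshold for online real-time transactions even though most requests are fast, which is why both values are worth reporting.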
System processing capacity
Definitions
System processing capacity refers to the ability of the system to process information by using its hardware and software. It is measured by the number of transactions that the system can process per second. A transaction can be a business process from the operator's perspective or a business application request and response process from the system's perspective. The former is called a business transaction process and the latter is called a transaction. Both metrics can be used to evaluate the processing capacity of the system. We recommend that you use a metric that is consistent with the system transaction logs to facilitate transaction statistics. System processing capacity is an important metric in technical test activities.
Abbreviation
In general, the following metrics can be used to measure system processing capacity:
Hits per Second (HPS): the number of clicks per second.
Transactions per Second (TPS): the number of transactions processed by the system per second.
Queries per Second (QPS): the number of queries processed by the system per second. For Internet services, if each operation establishes only one request connection, then TPS, QPS, and HPS are equal. In general, TPS measures the entire business process, QPS measures the number of API queries, and HPS measures the click requests sent to the server.
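The relationship among TPS and QPS described above can be sketched as follows. The transaction count, test window length, and queries-per-transaction value are illustrative assumptions.

```python
# Sketch: deriving TPS from a test window. Numbers are illustrative.
transactions_completed = 180_000
window_seconds = 60

tps = transactions_completed / window_seconds  # 3000 TPS

# If each business transaction issues several API queries, QPS is a
# multiple of TPS; here we assume 4 queries per transaction.
queries_per_transaction = 4
qps = tps * queries_per_transaction

print(f"TPS: {tps:.0f}, QPS: {qps:.0f}")
```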
Standards
Regardless of TPS, QPS, or HPS, a large value indicates a high system processing capacity. In general, the following values are acceptable:
Financial companies: 1000 TPS to 50,000 TPS, excluding Internet-based activities.
Insurance companies: 100 TPS to 100,000 TPS, excluding Internet-based activities.
Manufacturing companies: 10 TPS to 5,000 TPS.
E-commerce companies: 10,000 TPS to 1,000,000 TPS.
Medium-sized Internet websites: 1000 TPS to 50,000 TPS.
Small Internet websites: 500 TPS to 10,000 TPS.
Concurrent users
Definitions
Concurrent users refer to the users who log on to the system and perform operations at a specified point in time. For a system that uses persistent connections, the maximum number of concurrent users represents the concurrency capability of the system. For a system that uses short-lived connections, the maximum number of concurrent users is not equal to the concurrency capability of the system, which also depends on the system architecture and system processing capacity. For example, if the system provides a high throughput and short-lived connections can be reused, the number of concurrent users is often greater than the number of concurrent connections of the system. The TPS mode (or RPS mode) is suitable for most systems that use short-lived connections. PTS supports tests in RPS mode to facilitate throughput test setup and measurement. In tests, virtual users are used to simulate real-world users to perform operations.
Abbreviation
Virtual User (VU)
Standards
In general, performance tests measure the system processing capacity rather than the number of concurrent users. Unless server-side persistent connections limit the number of concurrent users, the system processing capacity is independent of the number of concurrent users. When you test the system processing capacity, you can use either a small or a large number of concurrent users.
Failure ratio
Definitions
Failure ratio refers to the probability of failed transactions when load pressure is applied to the system. Failure ratio = (Number of failed transactions/Total number of transactions) × 100%. For a stable system, request failures are caused by timeouts, so the failure ratio is equal to the timeout rate.
Abbreviation
Failure Ratio (FR)
Standards
Different systems have different requirements for failure ratio. In general, less than 6‰ is acceptable. The success rate must be 99.4% or higher.
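A minimal sketch of the failure-ratio formula and the 6‰ acceptance threshold above, with illustrative request counts:

```python
# Sketch: failure ratio as defined above. Counts are illustrative.
failed = 18
total = 30_000

failure_ratio = failed / total * 100  # percent

# 6 per mille (0.6%) is the general acceptance threshold from this topic.
acceptable = failure_ratio <= 0.6
print(f"failure ratio: {failure_ratio:.3f}%, acceptable: {acceptable}")
```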
Resource metrics
CPU
Definitions
The central processing unit is a very-large-scale integrated (VLSI) circuit and is the computing core and control unit of a computer. It is mainly used to interpret computer instructions and process data for computer software. The CPU load measures the length of the request queue in the system and is reported as the system load average.
Abbreviation
Central Processing Unit (CPU)
Standards
CPU utilization is measured across four modes: user, sys, wait, and idle. In general, the overall CPU utilization threshold is 75%, the CPU sys% threshold is 30%, and the CPU wait% threshold is 5%. Even single-core CPUs must comply with these thresholds. The CPU load must be less than the number of CPU cores.
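As a sketch of the "CPU load must be less than the number of CPU cores" rule, the following snippet reads the load average on the local host. It assumes a Unix-like platform where `os.getloadavg()` is available.

```python
import os

# Sketch: checking CPU load against the core count on a Unix-like host.
# os.getloadavg() returns the 1-, 5-, and 15-minute load averages,
# which measure the request queue length described above.
cores = os.cpu_count()
load_1m, load_5m, load_15m = os.getloadavg()

# The rule from this topic: load must stay below the number of cores.
within_threshold = load_1m < cores
print(f"cores: {cores}, 1-minute load: {load_1m:.2f}, ok: {within_threshold}")
```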
Memory
Definitions
Memory is one of the most important components of a computer and serves as the bridge between the CPU and other devices. All programs in a computer run in memory, so memory has a great impact on computer performance.
Abbreviation
N/A
Standards
To maximize memory usage, modern operating systems use part of memory as a cache. Therefore, 100% memory utilization does not necessarily indicate a memory bottleneck. Swap usage is often used to measure memory bottlenecks instead. In general, swap usage must be lower than 70%. Otherwise, system performance may be affected.
Disk throughput
Definitions
Disk throughput is the amount of data that passes through a disk per unit of time without a disk failure.
Abbreviation
N/A
Standards
Disk metrics include IOPS, disk busy percentage, number of disk queues, average service time, average wait time, and disk usage. The disk busy percentage directly reflects whether the disk has a bottleneck. In general, the disk busy percentage must be lower than 70%.
Network throughput
Definitions
Network throughput refers to the amount of data that passes over the network per unit time without a network failure. Unit: bytes/s. Network throughput measures the system requirements for the transmission capacity of network devices or links. When the maximum transmission capacity of network devices or links is approached, you must upgrade network devices.
Abbreviation
N/A
Standards
Mbps is a major metric for network throughput. It generally cannot exceed 70% of the maximum transmission capacity of network devices or links.
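A sketch of the 70% rule above, assuming an illustrative 1 Gbit/s link and an illustrative measured throughput:

```python
# Sketch: checking the 70% utilization rule for a network link.
# Link capacity and measured throughput below are illustrative.
link_capacity_mbps = 1000      # a 1 Gbit/s link
measured_mbps = 620

utilization = measured_mbps / link_capacity_mbps * 100
needs_upgrade = utilization > 70  # the 70% threshold from this topic

print(f"utilization: {utilization:.0f}%, upgrade needed: {needs_upgrade}")
```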
Kernel parameters
Operating system kernel parameters mainly include semaphores, processes, and file handles. The following list describes these kernel parameters. All values are counts unless a unit is noted.
Kernel parameters:
maxuprc: The maximum number of processes per user.
max_thread_proc: The maximum number of threads available for each process.
filecache_max (bytes): The maximum physical memory available for caching file I/O.
ninode: The maximum number of in-memory inodes available for the HFS.
nkthread: The maximum number of threads that can run simultaneously.
nproc: The maximum number of processes that can run simultaneously.
nstrpty: The maximum number of STREAMS-based pseudo terminal slaves (PTSs).
maxdsiz (bytes): The maximum data segment size per process.
maxdsiz_64bit (bytes): The maximum data segment size per 64-bit process.
maxfiles_lim: The maximum number of file descriptors per process.
maxssiz_64bit (bytes): The maximum stack size per 64-bit process.
maxtsiz (bytes): The maximum text segment size per process.
nflocks: The maximum number of file locks.
maxtsiz_64bit (bytes): The maximum text segment size per 64-bit process.
msgmni: The maximum number of System V IPC message queue IDs.
msgtql: The maximum number of System V IPC messages.
npty: The maximum number of BSD pseudo TTYs (PTYs).
nstrtel: The number of telnet device files supported by the kernel.
nswapdev: The maximum number of swap devices.
nswapfs: The maximum number of swap file systems.
semmni: The number of System V IPC semaphore IDs.
semmns: The number of System V IPC semaphores.
shmmax (bytes): The maximum size of a System V shared memory segment.
shmmni: The number of System V shared memory IDs.
shmseg: The maximum number of System V shared memory segments per process.
Middleware metrics
Definitions
Common metrics for middleware services (such as Tomcat and WebLogic) include JVM, thread pool, and JDBC metrics. The following list describes these metrics.
GC:
GC frequency: The partial (minor) garbage collection frequency of the Java virtual machine.
Full GC frequency: The full garbage collection frequency of the Java virtual machine.
Average full GC duration (seconds): The average duration of a full garbage collection.
Maximum full GC duration (seconds): The maximum duration of a full garbage collection.
Heap usage (%): The usage of the JVM heap.
Thread pool:
Active thread count: The number of active threads.
Pending user requests: The number of user requests that are queued.
JDBC:
JDBC active connections: The number of active JDBC connections.
Standards
The number of active threads cannot exceed the specified maximum value. In general, for a system with good performance, set the minimum value to 50 and the maximum value to 200.
The number of active JDBC connections cannot exceed the specified maximum value. In general, for a system with good performance, set the minimum value to 50 and the maximum value to 200.
The GC frequency, especially the full GC frequency, cannot be very high. In general, for a system with good performance, set both the minimum and maximum JVM heap sizes to 1,024 MB.
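For example, on a HotSpot JVM the fixed 1,024 MB heap described above can be set with the standard -Xms and -Xmx options; app.jar is a placeholder name:

```
java -Xms1024m -Xmx1024m -jar app.jar
```

Setting the minimum and maximum heap sizes to the same value avoids heap resizing during the test run.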
Database metrics
Definitions
Common metrics for databases (such as MySQL) include SQL, throughput, cache hit ratio, and lock metrics. The following list describes these metrics.
SQL:
Duration (microseconds): The time to execute a SQL statement.
Throughput:
QPS: The number of queries per second.
TPS: The number of transactions per second.
Hit ratio:
Key buffer hit ratio (%): The hit ratio of the index buffer.
InnoDB buffer hit ratio (%): The hit ratio of the InnoDB buffer pool.
Query cache hit ratio (%): The hit ratio of the query cache.
Table cache hit ratio (%): The hit ratio of the table cache.
Thread cache hit ratio (%): The hit ratio of the thread cache.
Lock:
Waits: The number of lock waits.
Waiting time (microseconds): The lock waiting time.
Standards
A short SQL execution duration is preferred. In general, it is measured in microseconds.
A high hit ratio is preferred. Generally, it cannot be lower than 95%.
Small values are preferred for the number of lock waits and the lock waiting time.
Frontend metrics
Definitions
Common frontend metrics include page display times and network times. The following list describes these metrics.
Page display:
First contentful paint (milliseconds): The time it takes for the first piece of content to appear on a webpage after you enter a URL in the address bar.
onLoad event time (milliseconds): The time for the browser to trigger the onLoad event. This event is triggered when the original document and all referenced content are completely downloaded.
Time to fully loaded (milliseconds): The time to complete all onLoad JavaScript programs and load all dynamic or lazy-loaded content triggered by those programs.
Pages:
Page size (KB): The size of the entire page.
Requests: The total number of network requests made to download resources from a website. A small value is preferred.
Network:
DNS time (milliseconds): The DNS lookup time.
Connection time (milliseconds): The time to establish a TCP/IP connection between the browser and the web server.
Server time (milliseconds): The server processing time.
Transmission time (milliseconds): The time to transmit the content.
Waiting time (milliseconds): The time spent waiting for a resource to be released.
Standards
A small page size is preferred and compression can be used.
Small page display time values are preferred.
Stability metrics
Definitions
Minimum stable time: the minimum time that the system can run stably under 80% of its maximum capacity or under the standard load pressure (the expected daily pressure). In general, a system that runs on working days (8 hours a day) must run stably for at least 8 hours, and a system that runs 24/7 must run stably for at least 24 hours. If the system cannot run stably, performance degradation or even crashes may occur as the business workload and running time increase.
Standards
The TPS curve remains flat without large-scale fluctuations.
For the resource metrics, no leaks or exceptions occur.
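One way to check that a TPS curve "remains flat" is to compute its coefficient of variation over the stability run. The sketch below uses illustrative samples, and the 10% flatness threshold is an assumption for illustration, not a value from this topic.

```python
# Sketch: a simple flatness check for a TPS curve sampled during a
# stability run. Samples and the 10% threshold are illustrative.
tps_samples = [2980, 3010, 2995, 3005, 2990, 3000]

mean = sum(tps_samples) / len(tps_samples)
variance = sum((x - mean) ** 2 for x in tps_samples) / len(tps_samples)
cv_percent = (variance ** 0.5) / mean * 100  # coefficient of variation

is_flat = cv_percent < 10
print(f"mean TPS: {mean:.0f}, CV: {cv_percent:.2f}%, flat: {is_flat}")
```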
Batch processing metrics
Definitions
Batch processing efficiency refers to the amount of data processed per unit of time by the batch processing program. It is generally measured by the amount of data processed per second. Processing efficiency is the most important metric for estimating batch processing time windows. The start time and end time of the batch processing time windows of different systems may overlap. A system may have multiple batch processes that run simultaneously, and their time windows can overlap. Long-running batch processing tasks affect the performance of online real-time trading.
Standards
If a large amount of data is involved, a short batch processing time window is preferred.
The performance of online real-time trading cannot be affected.
Scalability metrics
Definitions
Scalability refers to the ratio between the performance increase and the hardware resource increase when application programs or operating systems are deployed in cluster mode. Formula: Scalability = (Performance increase/Original performance)/(Resource increase/Original resources) × 100%. The trends of scalability metrics can be obtained after multiple tests. If an application system has high scalability, its scalability metric values must be linear or near-linear. Large-scale distributed systems often have high scalability.
Standards
In ideal cases, resource increase and performance increase are in a linear relationship.
The performance increase must be 70% or higher.
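A sketch of the scalability formula and the 70% standard above, with illustrative node counts and TPS values:

```python
# Sketch: the scalability ratio from the formula above. Values are
# illustrative: resources grow from 4 to 8 nodes, TPS from 3000 to 5400.
orig_resources, new_resources = 4, 8
orig_tps, new_tps = 3000, 5400

performance_increase = (new_tps - orig_tps) / orig_tps                 # 0.8
resource_increase = (new_resources - orig_resources) / orig_resources  # 1.0

scalability = performance_increase / resource_increase * 100
meets_standard = scalability >= 70  # the 70% standard from this topic

print(f"scalability: {scalability:.0f}%, meets standard: {meets_standard}")
```

Here doubling the resources yields an 80% performance increase, so the deployment meets the 70% standard but is not perfectly linear.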
Reliability metrics
Hot standby deployment
The hot standby deployment is used to ensure high reliability. The following metrics can be used to measure reliability:
Whether the node switchover is successful and the actual time consumed.
Whether the business is interrupted during the node switchover.
Whether the node switchback is successful and the actual time consumed.
Whether the business is interrupted during the node switchback.
How much data is lost during the node switchback.
In the hot standby test, you can use a pressure generation tool to apply performance pressure to the system, ensuring that the test results are consistent with actual production conditions.
Cluster architecture
The following metrics can be used to measure the cluster reliability for a system that uses the cluster architecture:
Whether the business is interrupted when a node in the cluster is faulty.
Whether the system needs to be restarted when a new node is added to the cluster.
Whether the system needs to be restarted when the faulty node is recovered and then added to the cluster.
Whether the business is interrupted when the faulty node is recovered and then added to the cluster.
How long the node switchover takes.
In the cluster reliability test, you can use a pressure generation tool to apply performance pressure to the system, ensuring that the test results are consistent with actual production conditions.
Backup and restoration
This item verifies whether the system backup and recovery mechanism is effective and reliable, including system backup and recovery, database backup and recovery, and application backup and recovery. The following metrics can be used to measure backup and recovery reliability:
Whether the backup is successful and the actual time consumed.
Whether the backup is automated by using a script.
Whether the restoration is successful and the actual time consumed.
Whether the restoration is automated by using a script.
The selection and verification of metrics depends on the test purpose and test requirements for the system. Different metrics can be used for different systems, different test purposes, or different test requirements.
If a system requires additional frontend user access capabilities, you must add user access concurrency metrics.
To verify the batch processing performance, use the batch processing efficiency and the batch processing time windows.
To verify the system performance capacity, you must specify metric requirements in the test requirements based on metric definitions.
After the test metrics are obtained, you must specify the relevant prerequisites (such as the workloads and system resources).