Metrics for performance tests

This topic describes metrics for performance tests.

Purposes and intended readers

The metrics in this topic can be used as criteria for evaluating the technical quality of performance test projects, so that test results are evaluated and test quality is measured in a consistent way. This topic describes only some major metrics. More metrics may be used in actual technical quality measurement. Intended readers include test administrators, test operators, technical support personnel, project administrators, and personnel responsible for technical quality.

System performance metrics

Response time
1. Definitions
  The response time refers to the time period from when the client initiates a request to when the client receives a response from the server. In performance tests, the time period from when the load pressure is applied to when the tested server returns a result is considered as the response time. It is generally measured in seconds or milliseconds. The average response time refers to the average response time of the same transaction when the system is running stably. In general, the average response time is used for transactions. Depending on the transaction type, the average response time can be divided into complex transaction response time, simple transaction response time, and special transaction response time. When you set a special transaction response time, you must clarify what makes the transaction special in terms of response time.
2. Abbreviation
  Response Time (RT)
3. Standards
  The acceptable response time varies in different industries. In general, the following values are applicable to online real-time transactions:
  - Internet companies: less than 500 milliseconds. For example, Taobao has a response time of 10 milliseconds.
  - Financial companies: less than 1 second is preferred, or less than 3 seconds for complex transactions.
  - Insurance companies: less than 3 seconds.
  - Manufacturing companies: less than 5 seconds.
  For batch processing transactions:
  - Time window: the overall duration of a test process. The time window depends on the amount of data. For example, the time window values vary in Double 11 and 99 Promotion. If a large amount of data is involved, the test can be completed within 2 hours.
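The definition of average response time above can be sketched in Python as follows. This is only an illustration and not part of PTS: the function name `average_response_time_ms` and the sample values are hypothetical, and the samples are assumed to have been collected for one transaction type while the system was running stably.

```python
from statistics import mean


def average_response_time_ms(samples_ms):
    """Return the average response time (ms) of one transaction type."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    return mean(samples_ms)


# Hypothetical response times (ms) of the same transaction under stable load.
samples = [120, 135, 128, 142, 125]
avg = average_response_time_ms(samples)

print(f"average RT: {avg:.1f} ms")
# Compare against the 500 ms guideline listed for Internet companies.
print("within 500 ms:", avg < 500)
```

The same comparison can be made against the other industry values by replacing the 500 ms limit.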
System processing capacity
1. Definitions
  System processing capacity refers to the ability to process information using the system hardware and software. It is measured by the number of transactions that the system can process per second. A transaction can be a business process from the operator perspective or a business application and response process from the system perspective. The former is called a business transaction process and the latter is called a transaction. Both metrics can evaluate the processing power of the system. We recommend that you use a metric that is consistent with the system transaction log to facilitate transaction statistics. System processing capacity is an important metric in technical test activities.
2. Abbreviation
  In general, the following metrics can be used to measure system processing capacity:
  - Hits per Second (HPS): the number of clicks per second.
  - Transactions per Second (TPS): the number of transactions processed by the system per second.
  - Queries per Second (QPS): the number of queries processed by the system per second.
  For Internet business, if only one request connection is established in an application, then TPS, QPS, and HPS are equal. In general, TPS measures the entire business process, QPS measures the number of API queries, and HPS indicates the click requests sent to the server.
3. Standards
  For TPS, QPS, and HPS alike, a larger value indicates a higher system processing capacity. In general, the following values are acceptable:
  - Financial companies: 1,000 TPS to 50,000 TPS, excluding Internet-based activities.
  - Insurance companies: 100 TPS to 100,000 TPS, excluding Internet-based activities.
  - Manufacturing companies: 10 TPS to 5,000 TPS.
  - E-commerce companies: 10,000 TPS to 1,000,000 TPS.
  - Medium-sized Internet websites: 1,000 TPS to 50,000 TPS.
  - Small Internet websites: 500 TPS to 10,000 TPS.
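As a rough illustration of how a TPS value is derived from a test run, the following Python sketch divides the number of completed transactions by the length of the measurement interval. The function name and the sample numbers are hypothetical and are not taken from any PTS report.

```python
def transactions_per_second(completed_transactions, duration_seconds):
    """Return the average TPS over a measurement interval."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return completed_transactions / duration_seconds


# Hypothetical run: 90,000 transactions completed in a 60-second interval.
tps = transactions_per_second(90_000, 60)
print(f"{tps:.0f} TPS")
```

QPS and HPS can be derived in the same way by counting queries or hits instead of transactions.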
Concurrent users
1. Definitions
  Concurrent users refer to the users who log on to the system and perform operations at a specified point in time. For a system that uses persistent connections, the maximum number of concurrent users represents the concurrency capability of the system. For a system that uses short-lived connections, the maximum number of concurrent users is not equal to the concurrency capability of the system, which also depends on the system architecture and system processing capacity. For example, if the system provides a high throughput and short-lived connections can be reused, the number of concurrent users is often greater than the number of concurrent connections of the system. The TPS mode (or RPS mode) is suitable for most systems that use short-lived connections. PTS supports tests in RPS mode to facilitate throughput test setup and measurement. In tests, virtual users are used to simulate real-world users to perform operations.
2. Abbreviation
  Virtual User (VU)
3. Standards
  In general, performance tests are performed to measure the system processing capacity instead of the number of concurrent users. Apart from the case in which persistent connections on the server affect the number of concurrent users, the system processing capacity is independent of the number of concurrent users. When you test the system processing capacity, you can use either a small or a large number of concurrent users.
Failure ratio
1. Definitions
  Failure ratio refers to the proportion of failed transactions when load pressure is applied to the system. Failure ratio = (Number of failed transactions/Total number of transactions) × 100%. For a stable system, request failures are caused by timeouts and therefore the failure ratio is equal to the timeout rate.
2. Abbreviation
  Failure Ratio (FR)
3. Standards
  Different systems have different requirements for failure ratio. In general, less than 6‰ is acceptable. The success rate must be 99.4% or higher.
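The failure ratio formula and the 6‰ guideline above can be checked with a few lines of Python. This is a minimal sketch: the function name and the transaction counts are hypothetical, and 6‰ is expressed as 0.6%.

```python
def failure_ratio_percent(failed_transactions, total_transactions):
    """Failure ratio = failed / total x 100%."""
    if total_transactions <= 0:
        raise ValueError("total_transactions must be positive")
    return failed_transactions * 100 / total_transactions


# Hypothetical run: 45 failed transactions out of 10,000.
fr = failure_ratio_percent(45, 10_000)
success_rate = 100 - fr

print(f"failure ratio: {fr:.2f}%")
print(f"success rate: {success_rate:.2f}%")
# 6 per mille equals 0.6%.
print("acceptable:", fr < 0.6)
```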

Resource metrics

CPU
1. Definitions
  The central processing unit is a very-large-scale integrated circuit and is the computing core and control unit of a computer. It is mainly used to interpret computer instructions and process data in computer software. CPU load measures the request queue length in the system and is the average system load.
2. Abbreviation
  Central Processing Unit (CPU)
3. Standards
  CPU utilization is mainly measured in four modes: user, sys, wait, and idle. The CPU utilization threshold is 75%. The CPU sys% threshold is 30% and the CPU wait% threshold is 5%. Even single-core CPUs must comply with the preceding requirements. The CPU load must be less than the number of CPU cores.
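The thresholds above can be turned into a simple check. The following Python sketch is illustrative only: the function name, the parameter names, and the readings are hypothetical, and it assumes that a value strictly above a threshold counts as a violation.

```python
def cpu_threshold_violations(utilization_pct, sys_pct, wait_pct,
                             load_average, cores):
    """Return the thresholds from this section that are not met."""
    violations = []
    if utilization_pct > 75:
        violations.append("CPU utilization above 75%")
    if sys_pct > 30:
        violations.append("CPU sys% above 30%")
    if wait_pct > 5:
        violations.append("CPU wait% above 5%")
    # The CPU load must be less than the number of CPU cores.
    if load_average >= cores:
        violations.append("CPU load not below the number of cores")
    return violations


# Hypothetical readings from a 4-core server during a test run.
print(cpu_threshold_violations(68.0, 12.0, 7.5, 3.2, 4))
```

An empty list means that all four conditions are met for the given readings.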
Memory
1. Definitions
  Memory is one of the important components in the computer and is a bridge to communicate with the CPU. All programs in the computer run in the memory, so the memory has a great impact on the performance of the computer.
2. Abbreviation
  N/A
3. Standards
  To maximize memory usage, modern operating systems use part of the memory as cache. Therefore, 100% memory utilization does not necessarily indicate a memory bottleneck. Swap usage is often used to measure memory bottlenecks. In general, the swap usage must be lower than 70%. Otherwise, the system performance may be affected.
Disk throughput
1. Definitions
  Disk throughput is the amount of data that passes through a disk per unit of time without a disk failure.
2. Abbreviation
  N/A
3. Standards
  Disk metrics include IOPS, disk busy percentage, number of disk queues, average service time, average wait time, and disk usage. The disk busy percentage directly reflects whether the disk has a bottleneck. In general, the disk busy percentage must be lower than 70%.
Network throughput
1. Definitions
  Network throughput refers to the amount of data that passes over the network per unit time without a network failure. Unit: bytes/s. Network throughput measures the system requirements for the transmission capacity of network devices or links. When the maximum transmission capacity of network devices or links is approached, you must upgrade network devices.
2. Abbreviation
  N/A
3. Standards
  Mbps is a major metric for network throughput. It generally cannot exceed 70% of the maximum transmission capacity of network devices or links.
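Because throughput is defined above in bytes/s while link capacity is usually stated in Mbps, a unit conversion is needed before the 70% guideline can be applied. The following Python sketch shows one way to do this. The function name and the sample values are hypothetical, and 1 Mbps is taken as 1,000,000 bits per second.

```python
def link_utilization(measured_bytes_per_second, capacity_mbps):
    """Return measured throughput as a fraction of link capacity."""
    if capacity_mbps <= 0:
        raise ValueError("capacity_mbps must be positive")
    measured_bits_per_second = measured_bytes_per_second * 8
    capacity_bits_per_second = capacity_mbps * 1_000_000
    return measured_bits_per_second / capacity_bits_per_second


# Hypothetical reading: 70,000,000 bytes/s on a 1,000 Mbps link.
utilization = link_utilization(70_000_000, 1_000)
print(f"utilization: {utilization:.0%}")
print("within 70% guideline:", utilization <= 0.70)
```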

Kernel parameters

Operating system kernel parameters mainly include semaphores, processes, and file handles. The following table describes these kernel parameters.

Level-1 metric	Level-2 metric	Unit	Description
Kernel parameters	Maxuprc	N/A	The maximum number of processes per user.
	Max_thread_proc	N/A	The maximum number of threads available for each process.
	Filecache_max	Bytes	The maximum physical memory available for cache file I/O.
	Ninode	N/A	The maximum number of inodes in memory available for the HFS.
	Nkthread	N/A	The maximum number of threads that are running simultaneously.
	Nproc	N/A	The maximum number of processes that are running simultaneously.
	Nstrpty	N/A	The maximum number of pseudo terminal slaves (PTSs) based on streams.
	Maxdsiz	Bytes	The maximum data size (in bytes) per process.
	maxdsiz_64bit	Bytes	The maximum data size (in bytes) per process.
	maxfiles_lim	N/A	The maximum number of file descriptors per process.
	maxssiz_64bit	Bytes	The maximum stack size per process.
	Maxtsiz	Bytes	The maximum text size per process.
	nflocks	N/A	The maximum number of file locks.
	maxtsiz_64bit	Bytes	The maximum text size per process.
	msgmni	N/A	The maximum number of System V IPC message queues (IDs).
	msgtql	N/A	The maximum number of System V IPC messages.
	npty	N/A	The maximum number of BSD pseudo TTYs (PTYs).
	nstrtel	N/A	The number of telnet device files supported by the kernel.
	nswapdev	N/A	The maximum number of swap devices.
	nswapfs	N/A	The maximum number of swap file systems.
	semmni	N/A	The number of System V IPC semaphore IDs.
	semmns	N/A	The number of System V IPC semaphores.
	shmmax	Bytes	The maximum size of a System V shared memory segment.
	shmmni	N/A	The number of System V shared memory IDs.
	shmseg	N/A	The maximum number of System V shared memory segments per process.

Middleware metrics

Definitions

Common metrics for middleware services (such as Tomcat and WebLogic) include JVM, thread pool, and JDBC metrics. The following table describes these metrics.

Level-1 metric	Level-2 metric	Unit	Description
GC	GC frequency	N/A	The partial garbage collection frequency of a Java virtual machine.
	Full GC frequency	N/A	The full garbage collection frequency of a Java virtual machine.
	Average full GC duration	Seconds	The average duration for full garbage collection.
	Maximum full GC duration	Seconds	The maximum duration for full garbage collection.
	Heap usage	%	The usage of heap.
Thread pool	Active thread count	N/A	The number of active threads.
Thread pool	Pending user request	N/A	The number of user requests that are enqueued.
JDBC	JDBC active connection	N/A	The number of JDBC active connections.

Standards
- The number of active threads cannot exceed the specified maximum value. In general, when the system performance is good, set the minimum value to 50 and the maximum value to 200.
- The number of JDBC active connections cannot exceed the specified maximum value. In general, when the system performance is good, set the minimum value to 50 and the maximum value to 200.
- The GC frequency, especially the full GC frequency, cannot be very high. In general, when the system performance is good, set the minimum JVM heap size and maximum JVM heap size to 1024 MB.

Database metrics

Definitions

Common metrics for databases (such as MySQL) include SQL, throughput, cache hit ratio, and connections. The following table describes these metrics.

Level-1 metric	Level-2 metric	Unit	Description
SQL	Duration	Microseconds	The duration to execute a SQL statement.
Throughput	QPS	N/A	The number of queries per second.
Throughput	TPS	N/A	The number of transactions per second.
Hit ratio	Key buffer hit ratio	%	The hit ratio of the index buffer.
	InnoDB buffer hit ratio	%	The hit ratio of the InnoDB buffer.
	Query cache hit ratio	%	The hit ratio of the query cache.
	Table cache hit ratio	%	The hit ratio of the table cache.
	Thread cache hit ratio	%	The hit ratio of the thread cache.
Lock	Waits	N/A	The number of lock waits.
Lock	Waiting time	Microseconds	The lock waiting time.

Standards
- A short SQL duration is preferred. Generally, it is measured in microseconds.
- A high hit ratio is preferred. Generally, it cannot be lower than 95%.
- Small values are preferred for the number of lock waits and the lock waiting time.

Frontend metrics

Definitions

Common frontend metrics include page display time, page size and request count, and network time. The following table describes these metrics.

Level-1 metric	Level-2 metric	Unit	Description
Page display	First contentful paint	Milliseconds	The time it takes for the first piece of content to appear on a webpage after you enter a URL in the address bar.
	OnLoad event time	Milliseconds	The time for the browser to trigger an onLoad event. This event is triggered when the original document and all referenced content are completely downloaded.
	Time to fully loaded	Milliseconds	The time to complete all onLoad JavaScript programs and trigger all dynamic or lazy-loaded content by those programs.
Pages	Page size	KB	The size of the entire page.
Pages	Requests	N/A	The total number of all network requests when you download resources from a website. A small value is preferred.
Network	DNS time	Milliseconds	The DNS lookup time.
	Connection time	Milliseconds	The time to establish a TCP/IP connection between the browser and Web server.
	Server time	Milliseconds	The processing time by the server.
	Transmission time	Milliseconds	The time to transmit the content.
	Waiting time	Milliseconds	The time to wait for a resource to be released.

Standards
- A small page size is preferred and compression can be used.
- Small page display time values are preferred.

Stability metrics

Definitions
Minimum stable time: The minimum time that the system can run stably under 80% of the maximum capacity or under standard load pressure (expected daily pressure). Generally, a system that runs only during working hours (8 hours a day) must run stably for at least 8 hours. A system that operates 24/7 must run stably for at least 24 hours. If the system cannot run stably, performance degradation or even a crash may occur as business workloads and running time increase.
Standards
- The TPS curve remains flat without large-scale fluctuations.
- For the resource metrics, no leaks or exceptions occur.

Batch processing metrics

Definitions
Batch processing efficiency refers to the amount of data processed per unit of time by the batch processing program. It is generally measured by the amount of data processed per second. Processing efficiency is the most important metric for estimating batch processing time windows. The start time and end time of batch processing time windows of different systems may overlap. A system may have multiple batch processes that are performed simultaneously, and their time windows can overlap. Long-running batch processing tasks affect the performance of online real-time transactions.
Standards
- If a large amount of data is involved, a short batch processing time window is preferred.
- The performance of online real-time transactions cannot be affected.
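Since processing efficiency is used to estimate the batch processing time window, the estimate can be sketched as the data volume divided by the processing rate. The following Python example is illustrative only: the function name, the record count, and the processing rate are hypothetical.

```python
def estimated_batch_seconds(record_count, records_per_second):
    """Estimate how long a batch run takes at a given processing rate."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return record_count / records_per_second


# Hypothetical batch: 36,000,000 records at 5,000 records per second.
seconds = estimated_batch_seconds(36_000_000, 5_000)
print(f"estimated time window: {seconds / 3600:.1f} hours")
```

If several batch processes run at the same time, each estimate only holds as long as the processes do not compete for the same resources.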

Scalability metrics

Definitions
Scalability refers to the ratio of performance increase to hardware resource increase when application programs or operating systems are deployed in cluster mode. The formula is: (Performance increase/Original performance)/(Resource increase/Original resources) × 100%. The trends of scalability metrics can be obtained after multiple tests. If an application system has high scalability, its scalability metric values must be linear or near-linear. Large-scale distributed systems often have high scalability.
Standards
- In ideal cases, resource increase and performance increase are in a linear relationship.
- The performance increase must be 70% or higher.
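The scalability formula above can be worked through with a small Python sketch. The function name and the cluster figures are hypothetical. The final comparison assumes that the 70% standard applies to the result of this formula, which is an interpretation rather than something the text states explicitly.

```python
def scalability_percent(original_perf, new_perf,
                        original_resources, new_resources):
    """(Performance increase/Original performance) /
    (Resource increase/Original resources) x 100%."""
    perf_increase = new_perf - original_perf
    resource_increase = new_resources - original_resources
    if original_perf <= 0 or original_resources <= 0:
        raise ValueError("original values must be positive")
    if resource_increase <= 0:
        raise ValueError("resources must increase")
    relative_perf = perf_increase / original_perf
    relative_resources = resource_increase / original_resources
    return relative_perf / relative_resources * 100


# Hypothetical cluster: 2 nodes at 1,000 TPS scaled out to 4 nodes at 1,800 TPS.
ratio = scalability_percent(1_000, 1_800, 2, 4)
print(f"scalability: {ratio:.0f}%")
# Assumed reading of the 70% standard.
print("meets 70% standard:", ratio >= 70)
```

A result of 100% corresponds to the ideal linear case, in which doubling the resources doubles the performance.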

Reliability metrics

Hot standby deployment
The hot standby deployment is used to ensure high reliability. The following metrics can be used to measure reliability:
- Whether the node switchover is successful and the actual time consumed.
- Whether the business is interrupted during the node switchover.
- Whether the node switchback is successful and the actual time consumed.
- Whether the business is interrupted during the node switchback.
- How much data is lost during the node switchback.
In the hot standby test, you can use the pressure generation tool to apply performance pressure on the system, ensuring that the test results are consistent with actual production conditions.
Cluster architecture
The following metrics can be used to measure the cluster reliability for a system that uses the cluster architecture:
- Whether the business is interrupted when a node in the cluster is faulty.
- Whether the system needs to be restarted when a new node is added to the cluster.
- Whether the system needs to be restarted when the faulty node is recovered and then added to the cluster.
- Whether the business is interrupted when the faulty node is recovered and then added to the cluster.
- How long the node switchover takes.
In the cluster reliability test, you can use the pressure generation tool to apply performance pressure on the system, ensuring that the test results are consistent with actual production conditions.
Backup and restoration
This item verifies whether the system backup and recovery mechanism is effective and reliable, including system backup and recovery, database backup and recovery, and application backup and recovery. The following metrics can be used to measure backup and recovery reliability:
- Whether the backup is successful and the actual time consumed.
- Whether the backup is automated by using a script.
- Whether the restoration is successful and the actual time consumed.
- Whether the restoration is automated by using a script.
- The selection and verification of metrics depend on the test purpose and test requirements for the system. Different metrics can be used for different systems, different test purposes, or different test requirements.
- If some systems require additional front-end user access capabilities, you must add the user access concurrency metrics.
- To verify the batch processing performance, use the batch processing efficiency and the batch processing time windows.
- To verify the system performance capacity, you must specify metric requirements in the test requirements based on metric definitions.
- After the test metrics are obtained, you must specify the relevant prerequisites (such as the workloads and system resources).