Troubleshoot the issue that WAL logs are accumulated in PostgreSQL

0.0.201

During peak hours, an excessively large number of WAL logs are generated in PostgreSQL. WAL is short for write-ahead logging. The checkpointer process deletes expired WAL logs at regular intervals. However, in production environments, WAL logs may fail to be deleted due to inappropriate operations. In this case, a large amount of storage is occupied. This topic describes how to troubleshoot the issue that WAL logs are accumulated in the primary database or read-only databases.

Background information

WAL is a key component of PostgreSQL to ensure data security and improve system reliability and performance. WAL helps prevent data loss and ensure that data can be restored in a reliable manner even if multiple faults occur.

Log accumulation in the primary database

If WAL logs are accumulated in the primary database, you can identify and troubleshoot the issue from the following aspects.

Inactive replication slots or unreported LSN from the consumer

A replication slot is a key tool in PostgreSQL to implement high availability and disaster recovery. You can use replication slots to prevent WAL logs from being deleted and prevent replication interruptions. However, if a replication slot is inactive, WAL logs associated with the slot are not deleted and keep accumulating.
If LSN is not reported by the consumer in a timely manner, the size of WAL logs keeps increasing.

You can use the pg_replication_slots system view to view the information about replication slots, such as LSN. For example, you can use the following SQL statements to query the size of WAL logs that are accumulated for a replication slot. The accumulation causes replication latency between the primary and secondary databases. In this case, the accumulated WAL logs must be deleted.

SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn)) AS delay_size
FROM pg_replication_slots;

If the size displayed in the query result is large or equal to the number of accumulated WAL logs, evaluate and delete the replication slot based on your business requirements.

Incorrect parameter settings

If the values of the wal_keep_segments, wal_keep_size, and max_wal_size parameters are excessively large, a large number of WAL logs are accumulated. Check and modify the values of these parameters based on your business requirements.

VACUUM storm and a large number of writes

In most cases, a VACUUM storm refers to a large number of automatic or manual VACUUM operations that are performed on a database at the same time. When these operations are performed, a large number of WAL logs may be generated. This causes a significant increase in I/O loads and affects the database performance. In addition, WAL logs may not be deleted in a timely manner. We recommend that you configure parameters in the VACUUM statement and schedule the execution of the VACUUM statements based on your business requirements.

The scenario in which a large number of writes occur is similar to a VACUUM storm. You must schedule data writes based on your business requirements.

Log accumulation in read-only databases

If WAL logs are accumulated in a read-only database, you can identify and troubleshoot the issue from the following aspects.

Replication latency

The following list describes the possible causes of the common issue that latency occurs when WAL logs are replayed on a read-only instance.

A long-running transaction exists on the read-only database and conflicts with the replay of the WAL logs. In this case, you must evaluate and change the values of the hot_standby_feedback and max_standby_streaming_delay parameters based on your business requirements.
For example, if the hot_standby_feedback parameter is set to off and the max_standby_streaming_delay parameter is set to a large value, the execution of a long-running transaction on the read-only database may cause replay latency.
The specifications of the primary and read-only databases are different. If the computing and storage specifications of the read-only database are lower than those of the primary database, replication latency may occur. As a result, the WAL logs that are not replayed cannot be deleted. In this case, you must evaluate and select appropriate specifications for the read-only database based on your business requirements.

What to do next

For ApsaraDB RDS for PostgreSQL instances, if the issue that WAL logs are accumulated persists after you perform the preceding troubleshooting operations, contact the technical support of ApsaraDB RDS for PostgreSQL.

References

You can manually delete inactive replication slots to allow AliPG to automatically delete WAL logs. For more information, see Use the WAL log management feature.

Feedback

Previous: Locate SQL statements with the highest resource consumptionNext: Insert, update, and delete multiple data records at a time

On this page （1）

Background information

Log accumulation in the primary database

Inactive replication slots or unreported LSN from the consumer

Incorrect parameter settings

VACUUM storm and a large number of writes

Log accumulation in read-only databases

Replication latency

What to do next

References

Chat now with Alibaba Cloud Customer Service to assist you in finding the right products and services to meet your needs.

Background information

Log accumulation in the primary database

Inactive replication slots or unreported LSN from the consumer

Incorrect parameter settings

VACUUM storm and a large number of writes

Log accumulation in read-only databases

Replication latency

What to do next

References

Sales Support

Technical Support

Connect & Report Abuse

About Alibaba Cloud

Our Global Network

Quick Start

Global Offices

Olympic Games Paris 2024 New

Stade Roland Garros – Glitz from the Past New

Place de la Concorde – “Breaking” the Barriers New

Vaires-sur-Marne Nautical Stadium – Sports with Sustainability New

International Broadcast Center – Images, Sounds, and Data that Captivate Billions New

Customer Success Stories New

Trust Center

Security & Compliance Center

Cloud Compliance Resources

Security Compliance FAQs

Product & Feature Update New

Cloud Forward

Press Room

Alibaba Cloud e-Magazine New

Alibaba Cloud in Analyst Research

Notice

Go Global Service New

Go Global Alliance with Alibaba Cloud

China Gateway Hot

Information Compliance

China Gateway - MLPS 2.0 Compliance New

China Gateway - Networking

China Gateway - Global Application Acceleration New

China Gateway - Security

China Gateway - Data Security New

ICP Support Hot

China Gateway - Omnichannel Data Mid-End New

China Gateway - Organizational Data Mid-End New

China Gateway - Business Mid-End New

China Gateway - AI Service for Conversational Chatbots New

China Gateway - Online Education

China Gateway - Domain Registration

Work at Alibaba Cloud

Experienced Professionals

Students and Graduates

Free Trial

Pricing

Promo Center

Price Reduction

Pay Less and Deploy More

FinOps

Elastic Compute Service (ECS)

Simple Application Server (SAS)

Elastic GPU Service

Elastic Desktop Service (EDS)

Object Storage Service (OSS)

Cloud Enterprise Network (CEN)

Web Application Firewall (WAF)

Domain Names

Container Compute Service (ACS)

Secure Access Service Edge (SASE)

Intelligent Media Services(IMS)

Edge Security Acceleration (ESA)(Original DCDN)

Intelligent Media Management

DingTalk Enterprise

YiDA

Alibaba Cloud Model Studio

Apsara Prime - For Easy Cloud Product Selection

Alibaba Cloud ECS - Cater All Your Cloud Hosting Needs

1TB CDN—Get Free 1 TB Outbound Traffic Plan Now

Security—Under Attack? Get Free Security Support

Short Message Service - Free Testing is Available

Elastic Compute Service (ECS) Hot

CloudBox

Compute Nest

Dedicated Host Hot

ECS Bare Metal Instance

Elastic GPU Service Featured

Simple Application Server (SAS) Hot

Auto Scaling

Cloud Phone Beta

Elastic Desktop Service (EDS) Featured

Batch Compute

Elastic High Performance Computing (E-HPC)

Super Computing Cluster (SCC)

Function Compute (FC)