
ApsaraDB for MongoDB:Troubleshoot high CPU utilization issues on an ApsaraDB for MongoDB instance

Last Updated:May 14, 2024

CPU utilization is a key metric that is used to monitor an ApsaraDB for MongoDB instance. If an ApsaraDB for MongoDB instance experiences high CPU utilization, requests slow down and the instance may even become unavailable. This topic describes how to view the CPU utilization of an ApsaraDB for MongoDB instance and troubleshoot high CPU utilization issues.

View CPU utilization on an instance

For a sharded cluster instance, you can troubleshoot the CPU utilization on each shard node in the same way as on a replica set instance. ConfigServer nodes rarely encounter CPU bottlenecks because they store only configuration metadata. The CPU utilization on mongos nodes is affected by the sizes of aggregated result sets and the number of concurrent requests.

For a replica set instance, you can use the following methods to view the CPU utilization.

  • View CPU utilization in monitoring charts

    A replica set instance consists of multiple node roles. Each node role can correspond to one or more physical nodes. ApsaraDB for MongoDB allows you to use primary, secondary, and read-only nodes.

    On the Monitoring Data page of a replica set instance in the ApsaraDB for MongoDB console, select a node role and view the CPU utilization on the corresponding node in monitoring charts.

    Note

    CPU utilization is affected by instance specifications. For example, if an instance is equipped with 8 CPU cores and 16 GB of memory and has a CPU utilization of 100%, the 8 CPU cores are exhausted. In this example, the CPU utilization is displayed as 100% instead of 800%.

  • View and terminate active sessions

A surge in the number of sessions on an instance in the Running state can drive CPU utilization to 100%. Possible causes include surges or changes in business traffic, scans that involve large numbers of documents, and data sorting and aggregation. You can use one of the following methods to view the active sessions:

    • In the ApsaraDB for MongoDB console, click the ID of an instance. In the left-side navigation pane, choose CloudDBA > Sessions. On the page that appears, view current active sessions and analyze the query operations that are not completed within the expected execution period.

    • To view and analyze the details of active sessions, run the db.currentOp() command provided by MongoDB. If necessary, run the db.killOp() command to actively terminate slow queries that are not completed within the expected execution period. For more information, see db.currentOp() and db.killOp().
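For example, the following mongosh commands (a sketch; the threshold and the opid are placeholders) list operations that have been running for longer than five seconds and terminate one of them:

```javascript
// List active operations that have been running for more than 5 seconds.
db.currentOp({
  active: true,
  secs_running: { $gt: 5 }   // adjust the threshold to your expected execution period
});

// Terminate a specific slow operation by the opid taken from the output above.
// Use with care: killing a write operation can leave business logic half-done.
db.killOp(12345);            // 12345 is a placeholder opid
```

These commands require the corresponding privileges on the instance; run them against the node on which the slow operation was observed.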

  • Record slow query logs and audit logs

    ApsaraDB for MongoDB allows you to use the following profiling levels:

    • off: Profiling is disabled and no data is collected.

    • all: Profiling is enabled for all requests. The execution data of all requests is recorded in the system.profile collection.

    • slowOp: Profiling is enabled only for slow queries. Queries that take longer than the specified threshold are recorded in the system.profile collection.
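On a self-managed MongoDB instance, the same levels can be set from the mongosh shell; on ApsaraDB for MongoDB, use the console parameters described below instead. A sketch:

```javascript
// Check the current profiling status of the database.
db.getProfilingStatus();                  // e.g. { was: 0, slowms: 100, ... }

// Profile only operations slower than 100 ms (level 1, equivalent to slowOp).
db.setProfilingLevel(1, { slowms: 100 });

// Inspect the most recent entries recorded in the system.profile collection.
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty();
```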

    In the ApsaraDB for MongoDB console, click the ID of an instance. In the left-side navigation pane, choose Parameters > Parameter List. On the tab that appears, set the operationProfiling.mode and operationProfiling.slowOpThresholdMs parameters. The operationProfiling.mode parameter specifies the profiling level. The operationProfiling.slowOpThresholdMs parameter specifies the threshold of slow queries in milliseconds.

    Choose Logs > Slow Query Logs. On the page that appears, you can view slow query logs after you enable profiling.

    Important
    • The retention period for slow query logs is 72 hours.

    • If the instance was purchased after June 6, 2021 and you want to view the slow query logs of the instance, you must enable the audit log feature and select the admin and slow operation types that you want to audit.

      • You can view only slow query logs that are generated after the audit log feature is enabled.

      • For more information about how to enable the audit log feature, see Enable the log audit feature.

      • For more information about how to configure audit operation types, see Modify audit log settings.

    For more information about profiling, visit https://docs.mongodb.com/manual/tutorial/manage-the-database-profiler/.

    If you require more detailed auditing to troubleshoot problematic requests, you can enable the audit log feature in the ApsaraDB for MongoDB console. In the left-side navigation pane, choose Data Security > Audit Logs. On the page that appears, click Enable Audit Logs.

    For more information, see Enable the log audit feature.

Possible causes of high CPU utilization and optimization policies

This section describes the common causes of high CPU utilization on an instance and the corresponding optimization policies.

  • Queries that need to scan large numbers of documents

    ApsaraDB for MongoDB processes requests in multiple threads. If a single query needs to scan a large number of documents, the thread that executes the query occupies CPU resources for a long period of time. If such queries, or the requests queued behind them, run at high concurrency, CPU utilization on the instance rises. Overall, the CPU utilization on an instance is positively correlated with the total number of documents that its queries scan.

    Index optimization is the best way to reduce the number of documents that a single query needs to scan. MongoDB uses an index design similar to that of MySQL and provides even richer index types and features. Therefore, most index optimization policies that apply to MySQL also apply to MongoDB.

    Queries that often need to scan large numbers of documents are common in the following scenarios:

    • Full collection scans (full table scans)

      If the COLLSCAN keyword appears in the system.profile collection or in runtime logs, the query performed a full collection scan. You can optimize such a query by adding indexes. If this is not possible, you must control the data volume of the collection and the execution frequency of the query.

      For more information about how to query execution plans, see Explain Results and Cursor Methods.
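As a sketch (the collection and field names are hypothetical), explain() reveals whether a query falls back to a COLLSCAN, and adding an index lets the planner use an IXSCAN instead:

```javascript
// Inspect the winning plan of a query; a "COLLSCAN" stage in the output
// indicates a full collection scan.
db.orders.find({ customerId: 42 }).explain("executionStats");

// Create an index on the filtered field so the planner can use an IXSCAN.
db.orders.createIndex({ customerId: 1 });
```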

    • Index design and optimization

      In addition to queries that trigger full collection scans, you must also focus on queries that are frequently executed and have a docsExamined value greater than 1,000. The docsExamined metric specifies the number of documents that MongoDB examined to serve the query. Apart from full collection scans, a large number of examined documents may be caused by the following reasons:

      • When multiple filter conditions are used, a compound index is not used or the principle of leftmost prefix matching is not satisfied.

      • Indexes are not used for sorting.

      • The query is complex or involves a large number of aggregate operations, which may prevent the optimizer from selecting an appropriate index.

      • The indexed field has low selectivity, so the index filters out only a small proportion of the documents.

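The leftmost-prefix rule mentioned above can be illustrated with a hypothetical compound index:

```javascript
// A compound index on { status, city, createTime }.
db.orders.createIndex({ status: 1, city: 1, createTime: -1 });

// Can use the index: the filters follow the leftmost prefixes
// { status } and { status, city }, and the sort field follows the
// equality-matched prefix.
db.orders.find({ status: "paid" });
db.orders.find({ status: "paid", city: "Hangzhou" }).sort({ createTime: -1 });

// Cannot use the index efficiently: the leftmost field "status" is missing,
// so the query degrades to scanning the whole index or the collection.
db.orders.find({ city: "Hangzhou" });
```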

  • Excessive concurrency

    In addition to query-related causes, high CPU utilization may result from excessive business concurrency. If the number of business requests is large and concurrency is high, you can use the following methods to scale out CPU resources:

    • Scale up a single instance for more read and write workloads.

    • Configure read/write splitting for a replica set instance or add read-only nodes to the replica set instance.

    • Upgrade the problematic instance to a sharded cluster instance for linear scale-out.

    • If the CPU resources of the mongos node are exhausted, add mongos nodes and configure load balancing for the nodes. For more information, see Introduction to ApsaraDB for MongoDB sharded cluster instances.

    For more information, see Overview, Change the configurations of a standalone instance, and Change the configurations of a replica set instance.

Other possible causes

  • Frequent short-lived connections of MongoDB

    In MongoDB 3.0 and later, the default authentication mechanism is SCRAM-SHA-1, which requires CPU-intensive operations such as hash calculations. If short-lived connections are established at high concurrency, the hash calculations can consume a large amount of CPU resources and even exhaust them. In this case, runtime logs contain a large number of saslStart messages.

    We recommend that you use persistent connections. To optimize PHP short-lived connections in high-concurrency scenarios, ApsaraDB for MongoDB rewrites built-in random functions at the kernel level, which helps reduce the CPU utilization of authentication on an instance.

    On the Database Connections page of an instance in the ApsaraDB for MongoDB console, you can also enable password-free access to remove the authentication overhead of short-lived connections.
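As a sketch of the persistent-connection recommendation, a driver connection string can keep a long-lived connection pool instead of opening a new connection (and re-running SCRAM authentication) per request. The host name, replica set name, and pool sizes below are placeholders:

```javascript
// MongoDB driver connection string: reuse a pooled, long-lived connection
// instead of reconnecting and re-authenticating for every request.
const uri = "mongodb://user:pass@dds-example.mongodb.rds.aliyuncs.com:3717/admin" +
            "?replicaSet=mgset-example&maxPoolSize=50&minPoolSize=5";
```

Keeping minPoolSize above zero ensures that a few authenticated connections stay warm even when traffic is idle.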

  • Time-to-live (TTL) indexes cause higher CPU utilization on a secondary node than on the primary node.

    MongoDB 3.2 and later support multi-threaded replication. The oplog replay concurrency on a secondary node is determined by the replWriterThreadCount parameter, whose default value is 16. Even if a secondary node does not handle business read workloads, its CPU utilization may be higher than that of the primary node. For example, if you use a TTL index to delete expired data from a collection, the primary node can efficiently delete large amounts of data at a time based on the index on the time field. However, the deletion is written to the oplog as an individual delete operation for each document. When a secondary node replays these oplog entries, the replay is less efficient, and the multi-threaded replay may increase CPU utilization on the node. In this case, you can ignore the high CPU utilization on the secondary node.
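A TTL index such as the following hypothetical one triggers this pattern:

```javascript
// Documents expire 7 days (604800 seconds) after the value in "createTime".
// The primary deletes expired documents in batches, but each deletion is
// replayed on secondaries as an individual oplog entry, which can raise
// CPU utilization on the secondary nodes.
db.events.createIndex({ createTime: 1 }, { expireAfterSeconds: 604800 });
```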