Exception handling

Updated at: 2025-03-20 10:09

The exception handling feature displays abnormal events that occurred or are still ongoing in the database cluster over the last 3 days. You can use this feature to quickly check the health status of the cluster and, when abnormal events occur, perform root cause analysis to pinpoint the cause of the problem.

View the performance monitoring data

  1. Log in to the ApsaraDB for OceanBase console.

  2. In the left-side navigation pane, choose Autonomy Service > Diagnostics Center.

  3. In the Instance Details section, click the name of the target instance.

    The system automatically redirects to the diagnostics center.

  4. In the left-side navigation pane, click Exception Handling.

  5. In the Performance Monitoring section, view performance monitoring metrics such as CPU Percent, CPU Utilization, and Queue Time.

    By default, the system displays the data within the last 3 days.

  6. To change the time range, click 3d in the time selector in the upper-right corner of the page and choose Last Hour, Last 6 Hours, Last Day, Last 3 Days, or Custom Time from the drop-down list.

  7. Hover over the question mark icon of a performance monitoring metric to view its description.

  8. Click the icon next to a tenant name to view the performance monitoring data of that tenant, for example, the queue time of the forMySQLTenant tenant.

  9. Click the corresponding icon of a performance monitoring metric to view its breakdown.

  10. Hover over a time point on the chart to view the data at that time point.
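The time range options above are look-back windows that end at the current time, with Last 3 Days as the default. As a rough illustration only, the following Python sketch computes such a window. `PRESETS` and `time_window` are hypothetical names and are not part of the console or any OceanBase API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical mapping of time selector presets to look-back durations.
PRESETS = {
    "Last Hour": timedelta(hours=1),
    "Last 6 Hours": timedelta(hours=6),
    "Last Day": timedelta(days=1),
    "Last 3 Days": timedelta(days=3),  # default range
}


def time_window(
    preset: str = "Last 3 Days",
    now: Optional[datetime] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) window for a preset or a custom range."""
    if preset == "Custom Time":
        # A custom range must be supplied explicitly and must not be empty.
        if start is None or end is None or start >= end:
            raise ValueError("Custom Time requires start < end")
        return start, end
    now = now or datetime.now(timezone.utc)
    return now - PRESETS[preset], now
```

For example, `time_window("Last 6 Hours")` returns a window that starts six hours before the current UTC time and ends now.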

View the abnormal events

  1. Log in to the ApsaraDB for OceanBase console.

  2. In the left-side navigation pane, choose Autonomy Service > Diagnostics Center.

  3. In the Instance Details section, click the name of the target instance.

    The system automatically redirects to the diagnostics center.

  4. In the left-side navigation pane, click Exception Handling.

  5. In the Abnormal Events section, view the abnormal events of the target object. The list includes the following columns: Object, Exception Type, Abnormal Performance, Current Status, Occurrence Time, Recovery Time, Duration, and Operation.

  6. In the Operation column of an abnormal event, click Root Cause Analysis to view the root cause analysis and optimization suggestions for that event.

    Note

    You can click View Smart Interpretation to view the diagnostic results and suggestions provided by AI. The content of the AI smart interpretation is for informational purposes only.

    • If the cause of the abnormal event is covered by the analysis graph, the system highlights the cause in red and provides optimization suggestions.

      Note

      In the analysis graph, each node represents an analysis rule. During root cause analysis, the system traverses the graph to find the root cause node. The root cause node is highlighted in red, and green nodes indicate rules that did not match the root cause.

      The following is an example:

      When the system detects a Tenant Queue Waiting Becomes Longer event within the specified time period, it reports that CPU usage is too high. In the Analysis Path section, click the box highlighted in red to view the corresponding root cause analysis.

      By default, the SQL Summary Information section displays SQL Summary Time Period, Total Executions, Total Number of Error Executions, Maximum Elapsed Time (ms), CPU Time (ms), and Plan Generation Time (ms). To display more columns, click Manage Columns.

      The Possible Root Cause SQL section lists the SQL statements that may have caused the problem. To view the details of a statement, click View SQL Details in the Actions column.

    • If the cause of the abnormal event is not covered by the analysis graph, the system provides optimization suggestions in the Solution section.

      The following is an example:

      When the system detects a Tenant CPU Exception event, it still displays the analysis graph and provides optimization suggestions in the Solution section.
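The analysis graph described above can be pictured as a tree of rules that is walked from the top-level symptom down to a cause. The following Python sketch is a hypothetical model of that traversal, not the product's implementation. It assumes that each rule is a predicate over collected metrics, that a rule which hits but has no hitting child rules is reported as the root cause (red), and that rules which do not hit are marked green. The `RuleNode` class, the `find_root_causes` function, the metric names, and the thresholds are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class RuleNode:
    """Hypothetical analysis rule: one node in the analysis graph."""
    name: str
    check: Callable[[Metrics], bool]  # returns True if the rule hits
    children: List["RuleNode"] = field(default_factory=list)


def find_root_causes(node: RuleNode, metrics: Metrics, colors: Dict[str, str]) -> List[str]:
    """Depth-first traversal that returns the names of root cause nodes.

    Assumption: a rule that hits while all of its child rules miss is a root cause.
    """
    if not node.check(metrics):
        colors[node.name] = "green"  # rule did not hit the root cause
        return []
    causes: List[str] = []
    for child in node.children:
        causes.extend(find_root_causes(child, metrics, colors))
    if causes:
        colors[node.name] = "path"  # hit, but a deeper rule explains it
        return causes
    colors[node.name] = "red"  # deepest hitting rule: reported as the root cause
    return [node.name]


# Example graph loosely modeled on the queue-time example above; thresholds are made up.
graph = RuleNode(
    "Tenant queue waiting becomes longer",
    lambda m: m["queue_time_ms"] > 50,
    [
        RuleNode("CPU usage too high", lambda m: m["cpu_percent"] > 90),
        RuleNode("Slow disk I/O", lambda m: m["io_wait_ms"] > 20),
    ],
)
```

With a queue time of 120 ms, CPU at 97%, and I/O wait at 3 ms, `find_root_causes(graph, metrics, colors)` returns `["CPU usage too high"]`, marks that node red, and marks the I/O rule green.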

Enable system autonomy

OAS currently provides two background tasks:

  • Abnormal Event Analysis: When the system detects an abnormal event, it automatically analyzes the SQL statements related to that event.

  • Regular SQL Inspection: Periodically inspects SQL statements in the cluster to identify suspicious ones.

When the system analyzes SQLs associated with an abnormal event, or detects deterioration in SQL execution plans during daily inspections, OAS can automatically perform the following actions:

  • Automatic Plan Cache Refresh: Clears the SQL execution plan cache, allowing the optimizer to regenerate an execution plan.

  • Automatic Outline Binding: Based on the performance statistics of historical execution plans, OAS automatically binds a historical plan with lower CPU time to a SQL statement whose execution plan has deteriorated.
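Choosing which historical plan to bind amounts to comparing CPU time across the plans recorded for one SQL statement. The following Python sketch illustrates that selection under stated assumptions: it compares average CPU time per execution, which is an assumption because the exact OAS criteria are not documented here. `PlanStats` and `pick_binding_candidate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanStats:
    """Hypothetical per-plan statistics collected from execution history."""
    plan_id: str
    executions: int
    total_cpu_time_ms: float

    @property
    def avg_cpu_time_ms(self) -> float:
        return self.total_cpu_time_ms / self.executions


def pick_binding_candidate(history: List[PlanStats], current_plan_id: str) -> Optional[PlanStats]:
    """Return the historical plan with the lowest average CPU time per execution.

    Returns None if the current plan is already the cheapest or has no statistics.
    """
    executed = [p for p in history if p.executions > 0]
    current = next((p for p in executed if p.plan_id == current_plan_id), None)
    best = min(executed, key=lambda p: p.avg_cpu_time_ms, default=None)
    if current is None or best is None:
        return None
    return best if best.avg_cpu_time_ms < current.avg_cpu_time_ms else None
```

For example, if plan `p1` averaged 5 ms of CPU time per execution and the current plan `p2` averages 100 ms, the sketch proposes `p1` as the plan to bind.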

Procedure

  1. Log in to the ApsaraDB for OceanBase console.

  2. In the left-side navigation pane, choose Autonomy Service > Diagnostics Center.

  3. In the Instance Details section, click the name of the target instance.

    The system automatically redirects to the diagnostics center.

  4. In the left-side navigation pane, click Exception Handling.

  5. In the upper right corner of the page, click Autonomy Settings.

  6. In the dialog box that appears, turn on the feature switch and configure the following settings:

    • Automatic Plan Cache Refresh: Enable this feature and set the execution timing.

    • Automatic Outline Binding: Enable this feature and set the execution timing.

    • Objects with Ineffective Settings (optional): Specify a blocklist of databases, tenants, or SQL statements. The system ignores these objects during self-healing.

    • Notification Settings: Configure the Webhook address for DingTalk group notifications as follows:

      1. Create a bot in the DingTalk group and copy its Webhook URL.

      2. Enter the copied link in the text box and click Verify. OAS will send a verification code to the DingTalk group.

      3. Enter the verification code to complete the verification.


    Note
    • After you enable system autonomy, you can view the records of automatic plan cache refreshes and automatic outline bindings on the Autonomy Service > Diagnostics Center > Optimization Management > Optimization Records page. The source of these records is System Autonomy.

    • Currently, the self-healing feature supports only SELECT statements.
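Taken together, the blocklist and the SELECT-only restriction decide whether a statement is eligible for self-healing. The following Python sketch models that check. `AutonomySettings` and `is_self_healing_candidate` are hypothetical names that do not correspond to a real OAS API, and the SELECT detection is deliberately simplified.

```python
import re
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AutonomySettings:
    """Hypothetical model of the Autonomy Settings dialog."""
    enabled: bool = False
    blocked_databases: Set[str] = field(default_factory=set)
    blocked_tenants: Set[str] = field(default_factory=set)
    blocked_sql_ids: Set[str] = field(default_factory=set)


# Matches statements whose first keyword is SELECT (case-insensitive).
# Simplification: leading comments and WITH ... SELECT are not recognized.
_SELECT_RE = re.compile(r"\s*select\b", re.IGNORECASE)


def is_self_healing_candidate(
    settings: AutonomySettings, tenant: str, database: str, sql_id: str, sql_text: str
) -> bool:
    if not settings.enabled:
        return False
    # Objects with Ineffective Settings are ignored during self-healing.
    if (
        tenant in settings.blocked_tenants
        or database in settings.blocked_databases
        or sql_id in settings.blocked_sql_ids
    ):
        return False
    # Self-healing currently supports only SELECT statements.
    return _SELECT_RE.match(sql_text) is not None
```

In this sketch, a blocklist match takes precedence over the statement type, so a SELECT statement from a blocked tenant is still skipped.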

Automatic Plan Cache Refresh

  • During SQL inspection, the Plan Cache is automatically refreshed only for SQLs that exhibit execution plan deterioration.

  • During root cause analysis, the Plan Cache is automatically refreshed for all SQLs associated with the abnormal event.
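The scope of an automatic refresh depends on what triggered it. The following Python sketch restates the two rules above. The trigger strings, the `SqlCandidate` class, and its `plan_deteriorated` flag are hypothetical. The sketch only selects which statements would be refreshed and does not issue any refresh against a cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SqlCandidate:
    sql_id: str
    plan_deteriorated: bool  # hypothetical flag set by SQL inspection


def sqls_to_refresh(trigger: str, candidates: List[SqlCandidate]) -> List[str]:
    """Return the SQL IDs whose plan cache entries would be refreshed."""
    if trigger == "inspection":
        # SQL inspection: refresh only SQLs whose execution plan deteriorated.
        return [c.sql_id for c in candidates if c.plan_deteriorated]
    if trigger == "root_cause_analysis":
        # Root cause analysis: refresh every SQL associated with the abnormal event.
        return [c.sql_id for c in candidates]
    raise ValueError(f"unknown trigger: {trigger!r}")
```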

Automatic Outline Binding

  1. Pre-Binding Check:

    • The system first refreshes the execution plan of the SQL statement and observes the newly generated plan.

    • If the CPU time of the new plan is at least 20% lower than that of the historical execution plan, self-healing is considered successful and no outline is bound.

    • If the refresh does not produce a better plan, the system binds the outline and sends an alert notification to the specified DingTalk group.

  2. Post-Binding Observation:

    • After binding, the system continues to monitor the execution plan of the SQL statement.

    • If the binding does not produce a better execution plan, the system rolls back the binding and sends an alert notification to the specified DingTalk group.

  3. Notes:

    • Based on the performance statistics of historical execution plans, the automatic outline binding feature binds a historically lower CPU time execution plan to an SQL with deteriorated execution plans.

    • In scenarios with a mix of large and small accounts, where the same SQL statement processes very different data volumes depending on its parameters, binding does not guarantee better SQL performance. Closely monitor self-healing alert notifications and promptly confirm the effect of each binding.

    • If SQL Plan Management (SPM) is enabled in the cluster kernel, we recommend that you do not enable automatic outline binding, to avoid conflicts with SPM.
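The pre-binding check and the post-binding observation can be expressed as two small decisions. The following Python sketch makes three assumptions. It reads the 20% rule as "at least 20% lower CPU time", it measures "better" by CPU time alone, and it compares the bound plan against the deteriorated plan. `pre_binding_check`, `keep_binding`, and `PreBindingOutcome` are hypothetical names and do not describe the actual OAS implementation.

```python
from enum import Enum


class PreBindingOutcome(Enum):
    SELF_HEALED = "refresh produced a better plan; no outline bound"
    BIND_AND_ALERT = "bind outline and alert the DingTalk group"


MIN_CPU_REDUCTION = 0.20  # the documented 20% threshold


def pre_binding_check(new_plan_cpu_ms: float, historical_plan_cpu_ms: float) -> PreBindingOutcome:
    """Decide whether refreshing the plan alone counts as self-healing.

    Assumption: the new plan must use at least 20% less CPU time.
    """
    if historical_plan_cpu_ms <= 0:
        raise ValueError("historical CPU time must be positive")
    reduction = (historical_plan_cpu_ms - new_plan_cpu_ms) / historical_plan_cpu_ms
    if reduction >= MIN_CPU_REDUCTION:
        return PreBindingOutcome.SELF_HEALED
    return PreBindingOutcome.BIND_AND_ALERT


def keep_binding(bound_plan_cpu_ms: float, deteriorated_plan_cpu_ms: float) -> bool:
    """Post-binding observation: keep the outline only if the bound plan is cheaper.

    False means the binding is rolled back and an alert is sent.
    Assumption: "better" is measured by CPU time alone.
    """
    return bound_plan_cpu_ms < deteriorated_plan_cpu_ms
```

For example, a refreshed plan that uses 70 ms of CPU time against a 100 ms historical plan is treated as self-healed, while 90 ms leads to binding and an alert.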
