Container Service for Kubernetes: Enable monitoring for the Fluid control plane components

Last updated: Aug 03, 2024

Fluid is a Kubernetes-native distributed dataset orchestration and acceleration engine that serves data-intensive applications, such as big data applications and AI applications, in cloud-native scenarios. Fluid provides application-oriented dataset abstraction, a scalable data engine plug-in, automated data operations, data acceleration, and runtime platform independence. You can install the Fluid monitoring component on Prometheus instances of Prometheus Service with a few clicks and use the out-of-the-box dashboards provided by Prometheus Service to monitor Fluid. This topic describes how to enable Prometheus Service monitoring for the Fluid control plane components.

Prerequisites

  • Prometheus Service is enabled for your Container Service for Kubernetes (ACK) cluster or ACK Serverless cluster. For more information, see Managed Service for Prometheus.

  • The cloud-native AI suite is deployed and Fluid data acceleration is enabled. The version of the ack-fluid component is 0.9.7 or later. For more information, see Deploy the cloud-native AI suite.

Limits

  • You can install the Fluid monitoring component only on Prometheus instances whose type is Prometheus for Container Service.

  • You can monitor only Fluid control plane components, such as the Fluid controllers and Fluid webhook.

Step 1: Install the Fluid monitoring component

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Center. Then, click the Fluid card in the AI section.

    If the Fluid component is already installed, skip the preceding step.

  3. In the Select a Kubernetes cluster section of the Fluid panel, select the Kubernetes cluster.

  4. In the Configuration Information section, configure the parameters and click OK.

    | Parameter | Description |
    | --- | --- |
    | Exporter Name | The unique name of the Fluid exporter. |
    | Metrics scrape interval (seconds) | The interval at which monitoring data is collected. |

    After the Fluid monitoring component is installed, the Fluid card displays Installed 1 Exporter. Click the Fluid card. In the panel that appears, you can view Targets, Metrics, Dashboards, Alerts, Service Discovery Configurations, and Exporter.
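    As a quick sanity check after installation, you may want to confirm that the metrics you care about are actually being exposed. The following is a minimal sketch, not part of ARMS or Fluid: it parses text in the Prometheus exposition format and reports which expected metric names are absent. The helper names, the `EXPECTED` set, and the `SAMPLE` text (including its label values and numbers) are made up for illustration; in practice you would paste in output copied from a real target.

```python
# Hypothetical helper for illustration only; not an ARMS or Fluid API.
# The metric names come from the Metrics table in this topic.
EXPECTED = {
    "controller_runtime_reconcile_total",
    "workqueue_depth",
    "controller_runtime_webhook_requests_total",
}


def metric_names(exposition_text):
    """Collect metric names from Prometheus text-format sample lines."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition_text, expected=EXPECTED):
    """Return the expected metric names that do not appear in the text."""
    return sorted(set(expected) - metric_names(exposition_text))


# Made-up sample text; the values and labels are not real measurements.
SAMPLE = """\
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{name="dataset"} 0
controller_runtime_reconcile_total{controller="dataset",result="success"} 12
"""

print(missing_metrics(SAMPLE))
# prints ['controller_runtime_webhook_requests_total']
```

    An empty list would mean that every expected metric name appears at least once in the pasted text.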

Step 2: View the Fluid dashboard

View the Fluid dashboard from the ACK console (recommended)

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the ACK cluster or ACK Serverless cluster in which the Fluid monitoring component is installed. In the left-side navigation pane, choose Operations > Prometheus Monitoring.

  3. On the Prometheus Monitoring page, choose Others > Fluid Control Plane to view the monitoring data.

    In the Fluid dashboard, you can view detailed information about the Fluid control plane components, such as the status of the components, the Fluid controller processing time, the QPS of the Fluid webhook, the request processing latency, and the resource usage of each component. For more information, see Panels.

    • In the Component running status section, you can view the number of Fluid control plane pods that are in the Running state, the number of restarts, and the time of each restart.

    • In the Fluid Controller Detailed Indicators section, you can check whether the Fluid controllers are busy and view information about processing failures and Kubernetes API requests.

    • In the Fluid Webhook Detailed Indicators section, you can view the resource usage of the Fluid webhook, the number of processed requests, and the request processing latency.

    • In the Resource usage section, you can view the resource usage of each Fluid control plane component, the network transmit rate, and the network receive rate.

View the Fluid dashboard from ARMS

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Center. Then, on the Integration Management page, click the Fluid card, click the Dashboards tab, and click Fluid Control Plane at the bottom of the panel to view the monitoring data.

In the Fluid dashboard, you can view detailed information about the Fluid control plane components, such as the status of the components, the Fluid controller processing time, the QPS of the Fluid webhook, the request processing latency, and the resource usage of each component. For more information, see Panels.

  • In the Component running status section, you can view the number of Fluid control plane pods that are in the Running state, the number of restarts, and the time of each restart.

  • In the Fluid Controller Detailed Indicators section, you can check whether the Fluid controllers are busy and view information about processing failures and Kubernetes API requests.

  • In the Fluid Webhook Detailed Indicators section, you can view the resource usage of the Fluid webhook, the number of processed requests, and the request processing latency.

  • In the Resource usage section, you can view the resource usage of each Fluid control plane component, the network transmit rate, and the network receive rate.
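Figures such as the webhook QPS are typically derived from counters, for example controller_runtime_webhook_requests_total in the Metrics section. The sketch below shows the basic arithmetic under that assumption: it turns a series of counter samples into an average per-second rate and treats a drop in value as a counter reset (as happens when a pod restarts). It is a simplification and does not reproduce the dashboard's actual queries or the window extrapolation that the PromQL `rate()` function performs. The function name and the sample values are hypothetical.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter over the sampled window.

    samples: list of (unix_timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset, so counting resumes
    from the new value instead of producing a negative increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


# Made-up samples taken 15 seconds apart; the counter resets before t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 30), (60, 60)]
print(per_second_rate(samples))  # prints 2.0
```

In this example the counter grows by 30 in each interval, including the one after the reset, so the total increase is 120 over 60 seconds.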

Metrics

The following table describes the monitoring metrics for the Fluid control plane components.

| Metric | Type | Description |
| --- | --- | --- |
| dataset_ufs_total_size | Gauge | The total size of the datasets that are mounted to the existing Dataset objects in the current cluster. |
| dataset_ufs_file_num | Gauge | The number of files in the datasets that are mounted to the existing Dataset objects in the current cluster. |
| runtime_setup_error_total | Counter | The number of runtime setup failures that occur when the controller reconciles. |
| runtime_sync_healthcheck_error_total | Counter | The number of runtime health check failures that occur when the controller reconciles. |
| controller_runtime_reconcile_time_seconds_bucket | Histogram | The duration of each reconciliation. |
| controller_runtime_reconcile_errors_total | Counter | The number of reconciliation failures. |
| controller_runtime_reconcile_total | Counter | The total number of reconciliations performed by the controller. |
| controller_runtime_max_concurrent_reconciles | Gauge | The maximum number of concurrent reconciliations supported by the controller. |
| controller_runtime_active_workers | Gauge | The number of workers that the controller is currently using for reconciliation. |
| workqueue_adds_total | Counter | The number of add events handled by the controller workqueue. |
| workqueue_depth | Gauge | The current length of the controller workqueue. |
| workqueue_queue_duration_seconds_bucket | Histogram | The amount of time that an item waits in the controller workqueue before it is processed. |
| workqueue_work_duration_seconds_bucket | Histogram | The amount of time that the controller takes to process an item from the workqueue. |
| workqueue_unfinished_work_seconds | Gauge | The total duration of all tasks that are being processed in the controller workqueue. |
| workqueue_longest_running_processor_seconds | Gauge | The running time of the task that the controller has been processing for the longest time. |
| rest_client_requests_total | Counter | The number of HTTP requests, partitioned by status code, method, and host. |
| rest_client_request_duration_seconds_bucket | Histogram | The HTTP request latency, partitioned by verb and URL. |
| controller_runtime_webhook_requests_in_flight | Gauge | The number of requests that are being processed by the webhook. |
| controller_runtime_webhook_requests_total | Counter | The total number of requests that are processed by the webhook. |
| controller_runtime_webhook_latency_seconds_bucket | Histogram | The request processing latency of the webhook. |
| process_cpu_seconds_total | Counter | The total CPU time consumed by the process, in seconds. |
| process_resident_memory_bytes | Gauge | The resident memory size of the process, in bytes. |
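
The Histogram metrics above are exposed as cumulative `_bucket` series, so a latency percentile has to be estimated from bucket counts rather than read directly. The following is a minimal sketch of that estimation, assuming linear interpolation inside the bucket that contains the target rank, which is the same general approach that the PromQL `histogram_quantile()` function takes. The function name, the bucket boundaries, and the counts are made up for illustration and are not taken from a real Fluid deployment.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound
    and ending with (math.inf, total_count), mirroring the `le` label of
    *_bucket series. Values are assumed to be spread evenly within a bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into the +Inf bucket; fall back to
                # the highest finite boundary.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Made-up cumulative counts for latency buckets of 0.25 s, 0.5 s, and 1 s.
buckets = [(0.25, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(estimate_quantile(0.75, buckets))  # prints 0.40625
```

Because the estimate interpolates within a bucket, its accuracy depends on how finely the bucket boundaries are spaced around the percentile of interest.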
