By Che Yang, Maintainer of the Fluid Community, and Xie Yuandong, Committer of the Fluid Community
Today, more data-intensive applications, such as big data and AI workloads, are being deployed and run on Kubernetes. However, the divergence between the design concepts of data-intensive computing frameworks and the flexible, cloud-native approach to application orchestration has led to data access and computation bottlenecks.
Fluid, a cloud-native data orchestration engine, accelerates data access for applications by abstracting datasets and combining a distributed cache with the Kubernetes scheduler.
Auto scaling, one of the core capabilities of Kubernetes, has traditionally focused on stateless workloads. Fluid extends it to the distributed cache, allowing the data cache to expand and shrink elastically. Building on the runtime, Fluid exposes performance metrics such as cache capacity and the proportion of data already cached, and combines them with the ability to scale runtime resources to provide on-demand scaling of the data cache.
This capability is very important for big data applications in Internet scenarios because most big data applications are implemented as end-to-end pipelines that chain together several processing stages.
There are different types of computing tasks in such an end-to-end pipeline. In practice, each computing task is handled by a specialized system, such as TensorFlow, PyTorch, Spark, or Presto. However, these systems are independent of each other, so an external file system is often used to transfer data from one stage to the next. The frequent use of file systems for data exchange results in significant input/output (I/O) overhead, which often becomes the workflow bottleneck.
Fluid is well suited to this scenario. Users can create a dataset that distributes data across Kubernetes compute nodes as a medium for data exchange, avoiding remote writes and reads and improving data usage efficiency. The remaining problem is resource estimation and reservation for the temporary data cache: before data is produced and consumed, it is difficult to estimate its size accurately. Estimating too high wastes reserved resources, whereas estimating too low increases the risk of data write failures. Scaling on demand is therefore more user-friendly. Ideally, the cache behaves like the page cache: transparent to end users, yet delivering real acceleration.
Fluid introduces cache auto scaling by customizing the Kubernetes horizontal pod autoscaler (HPA) mechanism. Scaling is triggered when the amount of cached data reaches a specified proportion of the total cache space. For example, if the trigger condition is set to 75% cache utilization and the total cache space is 10 GB, the expansion mechanism is triggered once more than 7.5 GB of data has been cached, for instance at 8 GB.
The following example shows the auto scaling of Fluid.
Kubernetes 1.18 or later is recommended. Before version 1.18, HPA scaling policies were hard-coded and could not be customized. Since version 1.18, users can customize the scaling behavior, for example, by defining a cooldown period after a scale-up.
1. Install jq to parse JSON output. This example uses the CentOS operating system, where jq can be installed with yum:
yum install -y jq
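On Debian- or Ubuntu-based nodes, jq can be installed with apt instead:
apt-get install -y jq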
2. Download and install the latest version of Fluid
git clone https://github.com/fluid-cloudnative/fluid.git
cd fluid/charts
kubectl create ns fluid-system
helm install fluid fluid
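To confirm that the installation succeeded, check that the Fluid controllers are running; this assumes the chart deploys them into the fluid-system namespace created above:
kubectl get pods -n fluid-system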
3. Deploy or configure Prometheus
The metrics exposed by the AlluxioRuntime cache engine are collected via Prometheus. If Prometheus is not yet deployed in the cluster, install it:
$ cd fluid
$ kubectl apply -f integration/prometheus/prometheus.yaml
If Prometheus already exists in the cluster, add the following job to the Prometheus configuration file:
scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_monitor]
        regex: alluxio_runtime_metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: web
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_label_release]
        target_label: fluid_runtime
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        target_label: pod
        replacement: $1
        action: replace
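After editing the configuration, Prometheus needs to reload it. A minimal sketch, assuming your Prometheus was started with --web.enable-lifecycle (otherwise, restart the Prometheus pod instead); replace the namespace and service name with those of your own deployment:
kubectl port-forward -n kube-system svc/prometheus-svc 9090:9090 &
curl -X POST http://127.0.0.1:9090/-/reload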
4. Verify whether Prometheus is installed
$ kubectl get ep -n kube-system prometheus-svc
NAME ENDPOINTS AGE
prometheus-svc 10.76.0.2:9090 6m49s
$ kubectl get svc -n kube-system prometheus-svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-svc NodePort 172.16.135.24 <none> 9090:32114/TCP 2m7s
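To double-check that Prometheus is serving its query API, you can also hit it through the NodePort shown above; replace <node-ip> with the address of any cluster node. The Alluxio Cluster_* metrics used later only appear after an AlluxioRuntime is created in step 7, so an empty result at this stage is expected:
curl -s 'http://<node-ip>:32114/api/v1/query?query=Cluster_CapacityTotal' | jq .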
Install Grafana to visualize the monitoring metrics and verify the monitoring data; for more information, see the Fluid documentation (available in Chinese).
5. Deploy the metrics server
Check whether the cluster includes a metrics server. If kubectl top node returns correct CPU and memory usage, the metrics server is configured correctly.
kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
192.168.1.204 93m 2% 1455Mi 10%
192.168.1.205 125m 3% 1925Mi 13%
192.168.1.206 96m 2% 1689Mi 11%
Otherwise, manually run the following command:
kubectl create -f integration/metrics-server
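A quick way to confirm that the metrics server is registered and serving data; the APIService name below is the standard metrics-server registration, so adjust it if your deployment differs:
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top node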
6. Deploy custom-metrics-api components
Two components are needed to scale based on custom metrics. The first collects metrics from the application and stores them in the Prometheus time-series database; it was already deployed in step 3. The second, the k8s-prometheus-adapter, exposes the collected metrics through the Kubernetes custom metrics API and is deployed as follows.
If custom-metrics-api is already configured in the cluster, add the dataset-related rules to the adapter's ConfigMap configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"Cluster_(CapacityTotal|CapacityUsed)",fluid_runtime!="",instance!="",job="alluxio runtime",namespace!="",pod!=""}'
      seriesFilters:
        - is: ^Cluster_(CapacityTotal|CapacityUsed)$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pods
          fluid_runtime:
            resource: datasets
      name:
        matches: "^(.*)"
        as: "capacity_used_rate"
      metricsQuery: ceil(Cluster_CapacityUsed{<<.LabelMatchers>>}*100/(Cluster_CapacityTotal{<<.LabelMatchers>>}))
Otherwise, manually run the following command:
kubectl create -f integration/custom-metrics-api/namespace.yaml
kubectl create -f integration/custom-metrics-api
Note: Since custom-metrics-api connects to the Prometheus endpoint in the cluster, replace the Prometheus URL in the manifests with the address of the Prometheus instance you actually use.
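The Prometheus URL is usually passed as a container argument of the custom metrics API server. A hypothetical excerpt for orientation (the exact manifest layout under integration/custom-metrics-api may differ); --prometheus-url, --metrics-relist-interval, and --config are standard k8s-prometheus-adapter flags:
containers:
  - name: custom-metrics-apiserver
    args:
      - --prometheus-url=http://prometheus-svc.kube-system.svc:9090   # point this at your Prometheus
      - --metrics-relist-interval=1m
      - --config=/etc/adapter/config.yaml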
Check custom metrics
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "datasets.data.fluid.io/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/capacity_used_rate",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
7. Submit the dataset used for the test.
$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.99"
        low: "0.7"
  properties:
    alluxio.user.streaming.data.timeout: 300sec
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/spark created
alluxioruntime.data.fluid.io/spark created
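Optionally, confirm that the runtime object and its cache pods are coming up before moving on; the exact pod names and output columns depend on your Fluid version:
kubectl get alluxioruntime spark
kubectl get pods | grep spark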
8. Check whether the dataset is in the available state. The total size of the data in this dataset is 2.71 GiB, the maximum cache capacity provided by Fluid is 1 GiB, and there is currently one cache node, so the cache cannot hold the full dataset.
$ kubectl get dataset
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
spark 2.71GiB 0.00B 1.00GiB 0.0% Bound 7m38s
9. When the dataset is in the available state, check whether the metrics can be obtained from custom-metrics-api.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/*/capacity_used_rate" | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/%2A/capacity_used_rate"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Dataset",
        "namespace": "default",
        "name": "spark",
        "apiVersion": "data.fluid.io/v1alpha1"
      },
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}
10. Create an HPA task
$ cat<<EOF > hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Object
      object:
        metric:
          name: capacity_used_rate
        describedObject:
          apiVersion: data.fluid.io/v1alpha1
          kind: Dataset
          name: spark
        target:
          type: Value
          value: "90"
  behavior:
    scaleUp:
      policies:
        - type: Pods
          value: 2
          periodSeconds: 600
    scaleDown:
      selectPolicy: Disabled
EOF
In the sample configuration, there are two main parts: the scaling rules and the scaling sensitivity.
Rules: Scale-up is triggered when the capacity_used_rate of the Dataset object exceeds 90%. The scaled object is the AlluxioRuntime, and the minimum and maximum numbers of replicas are 1 and 4, respectively. The Dataset and AlluxioRuntime objects must be in the same namespace.
Sensitivity: During scale-up, the periodSeconds field is set to 10 minutes (600 seconds), and two replicas are added per scaling action, without exceeding the maxReplicas limit. After a scale-up, a cooldown of 20 minutes can be configured with the stabilizationWindowSeconds field, while scale-down can simply be disabled.
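The sample HPA above does not set stabilizationWindowSeconds explicitly; as a reference, here is a minimal sketch of how such a 20-minute cooldown could be expressed in the behavior section (field names follow autoscaling/v2beta2):
behavior:
  scaleUp:
    stabilizationWindowSeconds: 1200  # consider recommendations over a 20-minute window before scaling up again
    policies:
      - type: Pods
        value: 2
        periodSeconds: 600
  scaleDown:
    selectPolicy: Disabled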
11. View the HPA configuration. The current proportion of used cache space is 0, far below the condition that triggers a scale-up.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 0/90 1 4 1 33s
$ kubectl describe hpa
Name: spark
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Wed, 07 Apr 2021 17:36:39 +0800
Reference: AlluxioRuntime/spark
Metrics: ( current / target )
"capacity_used_rate" on Dataset/spark (target value): 0 / 90
Min replicas: 1
Max replicas: 4
Behavior:
Scale Up:
Stabilization Window: 0 seconds
Select Policy: Max
Policies:
- Type: Pods Value: 2 Period: 600 seconds
Scale Down:
Select Policy: Disabled
Policies:
- Type: Percent Value: 100 Period: 15 seconds
AlluxioRuntime pods: 1 current / 1 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events: <none>
12. Create a data preheating task
$ cat<<EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: spark
spec:
  dataset:
    name: spark
    namespace: default
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataload
NAME DATASET PHASE AGE DURATION
spark spark Executing 15s Unfinished
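The preheating task can also be watched until it finishes (press Ctrl+C to stop watching):
kubectl get dataload spark -w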
13. At this point, the amount of cached data is close to the cache capacity provided by Fluid (1 GiB), and the condition for auto scaling is triggered.
$ kubectl get dataset
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
spark 2.71GiB 1020.92MiB 1.00GiB 36.8% Bound 5m15s
According to the HPA monitoring, the scale-up of the AlluxioRuntime has started, with a scale-up step of 2 replicas.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 100/90 1 4 2 4m20s
$ kubectl describe hpa
Name: spark
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Wed, 07 Apr 2021 17:56:31 +0800
Reference: AlluxioRuntime/spark
Metrics: ( current / target )
"capacity_used_rate" on Dataset/spark (target value): 100 / 90
Min replicas: 1
Max replicas: 4
Behavior:
Scale Up:
Stabilization Window: 0 seconds
Select Policy: Max
Policies:
- Type: Pods Value: 2 Period: 600 seconds
Scale Down:
Select Policy: Disabled
Policies:
- Type: Percent Value: 100 Period: 15 seconds
AlluxioRuntime pods: 2 current / 3 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 3
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 21s horizontal-pod-autoscaler New size: 2; reason: Dataset metric capacity_used_rate above target
Normal SuccessfulRescale 6s horizontal-pod-autoscaler New size: 3; reason: Dataset metric capacity_used_rate above target
14. After waiting for a while, the cache capacity of the dataset increases from 1 GiB to 3 GiB, and the data is nearly fully cached.
$ kubectl get dataset
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
spark 2.71GiB 2.59GiB 3.00GiB 95.6% Bound 12m
Meanwhile, the HPA status shows that the runtime corresponding to the dataset now has 3 replicas and that capacity_used_rate, the proportion of cache space already used, is 85%, which does not trigger another scale-up.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 85/90 1 4 3 11m
15. Clean up the environment
kubectl delete hpa spark
kubectl delete dataset spark
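Depending on your setup, the preheating task and the cache runtime may also need to be removed explicitly; both are ordinary custom resources, for example:
kubectl delete dataload spark
kubectl delete alluxioruntime spark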
By combining Prometheus, the Kubernetes HPA, and custom metrics, Fluid triggers auto scaling based on the proportion of occupied cache space, enabling on-demand use of the cache. This gives users more flexibility when using the distributed cache to accelerate data access. In the future, scheduled scaling will be supported to provide greater certainty for scaling.
Fluid code repository: https://github.com/fluid-cloudnative/fluid.git
You are welcome to follow and contribute code.