
Container Service for Kubernetes:Accelerate Jobs

Last Updated:Aug 28, 2024

You can use Fluid to accelerate data access in ACK Serverless clusters. To do this, you deploy all Fluid components, including the Fluid controllers and the cache runtime engine, together with your application in an ACK Serverless cluster. This topic describes how to accelerate Jobs in ACK Serverless clusters.

Prerequisites

Limits

This feature is mutually exclusive with the virtual node-based pod scheduling feature of ACK Serverless clusters. For more information about the virtual node-based pod scheduling feature, see Enable the virtual node-based pod scheduling policy for an ACK cluster.

Deploy the control plane components of Fluid

Important

If you have installed open source Fluid, you must uninstall Fluid before you can install the ack-fluid component.

  1. Deploy the control plane components of Fluid.

    1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

    2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Applications > Helm.

    3. On the Helm page, click Deploy.

    4. On the Basic Information wizard page, configure the parameters and click Next.

      Configure the following parameters:

      • Source: Select Marketplace.

      • Chart: Find and click ack-fluid.

      Note

      The default release name of the ack-fluid chart is ack-fluid and the default namespace is fluid-system. If you specify a different release name or namespace, the Confirm message appears after you click Next. Click Yes to use the default release name and namespace.

    5. On the Parameters wizard page, click OK.

  2. Run the following command to check whether Fluid is deployed:

    kubectl get pod -n fluid-system

    Expected output:

    NAME                                  READY   STATUS    RESTARTS   AGE
    dataset-controller-d99998f79-dgkmh    1/1     Running   0          2m48s
    fluid-webhook-55c6d9d497-dmrzb        1/1     Running   0          2m49s

    The output indicates that Fluid is deployed. The following content describes the control plane components of Fluid:

    • Dataset Controller: manages the lifecycle of the Dataset objects that are referenced by Fluid. Dataset objects are custom resource (CR) objects.

    • Fluid Webhook: performs sidecar injection on pods that need to access data. This makes data access transparent to users in serverless scenarios.

    Note

    In addition to the preceding components, the control plane of Fluid also includes controllers that are used to manage the lifecycle of cache runtimes, such as the JindoFS runtime, JuiceFS runtime, and Alluxio runtime. The controllers corresponding to a cache runtime are deployed only after the cache runtime is used.
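For example, you can list the deployments in the fluid-system namespace to see which controllers are currently running; a runtime controller appears only after you create the corresponding Runtime object:

```shell
# List the Fluid control plane workloads. Before any Runtime object is created,
# only the Dataset controller and the Fluid webhook are deployed.
kubectl get deployments -n fluid-system
```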

Examples of accelerating data access in an ACK Serverless cluster

Step 1: Upload the test dataset to the OSS bucket

  1. Create a test dataset of 2 GB in size. This dataset is used in the following example.

  2. Upload the test dataset to the OSS bucket that you created.

    You can use the ossutil tool provided by OSS to upload data. For more information, see Install ossutil.
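For example, with ossutil installed and configured, the upload can be sketched as follows. The local directory ./test-data and the bucket path are placeholders for your own dataset and bucket:

```shell
# Recursively upload the local test dataset to the OSS bucket.
ossutil cp -r ./test-data oss://<bucket_name>/<bucket_path>/
```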

Step 2: Create a Dataset object and a Runtime object

After you upload the test dataset to OSS, you can use Fluid to claim the dataset. Create a Dataset object (CR object) and a Runtime object (CR object).

  • The Dataset object is used to specify the URL of the test dataset that is uploaded to OSS.

  • The Runtime object is used to define and configure the cache system that is used.

  1. Run the following command to create a Secret that stores the credentials used to access the OSS bucket:

    kubectl create secret generic oss-access-key \
      --from-literal=fs.oss.accessKeyId=<access_key_id> \
      --from-literal=fs.oss.accessKeySecret=<access_key_secret>
  2. Create a file named dataset.yaml and copy the following content to the file. The file is used to create a Dataset object and a Runtime object.

    In this topic, the JindoRuntime is used to interface with JindoFS.

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: demo-dataset
    spec:
      mounts:
        - mountPoint: oss://<bucket_name>/<bucket_path>
          name: demo
          path: /
          options:
            fs.oss.endpoint: oss-<region>.aliyuncs.com # The endpoint of the OSS bucket. 
          encryptOptions:
            - name: fs.oss.accessKeyId
              valueFrom:
                secretKeyRef:
                  name: oss-access-key
                  key: fs.oss.accessKeyId
            - name: fs.oss.accessKeySecret
              valueFrom:
                secretKeyRef:
                  name: oss-access-key
                  key: fs.oss.accessKeySecret
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: demo-dataset
    spec:
      # The number of worker pods to be created in the JindoFS cluster. 
      replicas: 2
      worker:
        podMetadata:
          annotations:
            # Disable the virtual node-based pod scheduling policy. 
            alibabacloud.com/burst-resource: eci_only
            # The type of instance that is used to run the worker pods. 
            k8s.aliyun.com/eci-use-specs: <eci_instance_spec>
            # Use an image cache to accelerate pod creation. 
            k8s.aliyun.com/eci-image-cache: "true"
      tieredstore:
        levels:
          # Specify 10 GiB of memory as the cache for each worker pod. 
          - mediumtype: MEM
            volumeType: emptyDir
            path: /dev/shm
            quota: 10Gi
            high: "0.99"
            low: "0.99"

    The preceding configuration uses the following parameters:

    • mountPoint: the path to which the UFS is mounted. The format of the path is oss://<oss_bucket>/<bucket_dir>. Do not include endpoint information in the path. If you use only one mount point, you can set path to /.

    • options: information about the endpoint of the OSS bucket. You can specify a public or internal endpoint.

    • fs.oss.endpoint: the public or internal endpoint of the OSS bucket. You can specify the internal endpoint of the bucket to enhance data security. However, if you specify the internal endpoint, make sure that your cluster is deployed in the same region as the OSS bucket. For example, if your OSS bucket is created in the China (Hangzhou) region, the public endpoint of the bucket is oss-cn-hangzhou.aliyuncs.com and the internal endpoint is oss-cn-hangzhou-internal.aliyuncs.com.

    • fs.oss.accessKeyId: the AccessKey ID that is used to access the bucket.

    • fs.oss.accessKeySecret: the AccessKey secret that is used to access the bucket.

    • replicas: the number of worker pods that are created by the JindoRuntime. This parameter determines the maximum cache capacity that the distributed cache runtime can provide.

    • worker.podMetadata.annotations: annotations on the worker pods. You can use the annotations to specify an instance type and an image cache.

    • tieredstore.levels: the cache configuration. You can use the quota field to specify the maximum cache size for each worker pod.

    • tieredstore.levels.mediumtype: the cache type. Valid values: HDD, SSD, and MEM. For more information about the recommended mediumtype configurations, see Policy 2: Select proper cache media.

    • tieredstore.levels.volumeType: the volume type of the cache medium. Valid values: emptyDir and hostPath. Default value: hostPath. If you use memory or local system disks as the cache medium, we recommend that you use the emptyDir type to avoid residual cache data on the node and to ensure node availability. If you use local data disks as the cache medium, you can use the hostPath type and set path to the mount path of the data disk on the host. For more information about the recommended volumeType configurations, see Policy 2: Select proper cache media.

    • tieredstore.levels.path: the path of the cache. You can specify only one path.

    • tieredstore.levels.quota: the maximum cache size. For example, a value of 100Gi indicates a maximum cache size of 100 GiB.

    • tieredstore.levels.high: the upper limit of the storage usage.

    • tieredstore.levels.low: the lower limit of the storage usage.

  3. Run the following command to create a Dataset object and a JindoRuntime object:

    kubectl create -f dataset.yaml
  4. Run the following command to check whether the Dataset object is created.

    It takes 1 to 2 minutes to create the Dataset object and the JindoRuntime object. After the objects are created, you can query information about the cache system and the cached data.

    kubectl get dataset demo-dataset

    Expected output:

    NAME           UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    demo-dataset   1.16GiB          0.00B    20.00GiB         0.0%                Bound   2m58s

    The output shows information about the Dataset object that you created in Fluid. The fields in the output are described as follows:

    • UFS TOTAL SIZE: the size of the dataset that is uploaded to OSS.

    • CACHED: the size of the cached data.

    • CACHE CAPACITY: the total cache capacity.

    • CACHED PERCENTAGE: the percentage of the dataset that is cached.

    • PHASE: the status of the Dataset object. A value of Bound indicates that the Dataset object is created.
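In addition to the Dataset object, Fluid creates a PersistentVolumeClaim with the same name in the same namespace. Pods mount this claim to access the cached data. You can verify that the claim exists:

```shell
# Fluid binds the Dataset to a PersistentVolume and creates a matching claim.
kubectl get pvc demo-dataset
```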

(Optional) Step 3: Preheat data

Data prefetching can significantly accelerate first-time data access. We recommend that you use this feature the first time you access the data.

  1. Create a file named dataload.yaml based on the following content:

    apiVersion: data.fluid.io/v1alpha1
    kind: DataLoad
    metadata:
      name: data-warmup
    spec:
      dataset:
        name: demo-dataset
        namespace: default
      loadMetadata: true
  2. Run the following command to create a DataLoad object:

    kubectl create -f dataload.yaml
  3. Run the following command to query the status of the DataLoad object:

    kubectl get dataload data-warmup

    Expected output:

    NAME          DATASET        PHASE      AGE   DURATION
    data-warmup   demo-dataset   Complete   99s   58s

    The output shows that the data prefetching took 58 seconds.
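After the prefetching completes, you can query the Dataset again to confirm that the data is cached:

```shell
# The CACHED and CACHED PERCENTAGE columns should now approach
# the UFS total size and 100%.
kubectl get dataset demo-dataset
```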

Step 4: Create a Job to test data access acceleration

You can create applications to test whether data access is accelerated by JindoFS, or submit machine learning Jobs to use relevant features. This section describes how to use a Job to access the data stored in OSS.

  1. Create a file named job.yaml and copy the following content to the file:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: demo-app
    spec:
      template:
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci
          annotations:
            # Disable the virtual node-based pod scheduling policy. 
            alibabacloud.com/burst-resource: eci_only
            # Select an instance type for pods. 
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge
        spec:
          containers:
            - name: demo
              image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
              args:
                - -c
                - du -sh /data && time cp -r /data/ /tmp
              command:
                - /bin/bash
              volumeMounts:
                - mountPath: /data
                  name: demo
          restartPolicy: Never
          volumes:
            - name: demo
              persistentVolumeClaim:
                claimName: demo-dataset
      backoffLimit: 4
  2. Run the following command to create a Job:

    kubectl create -f job.yaml
  3. Run the following command to query the log of the pod that is created by the Job. The pod name demo-app-jwktf is an example; replace it with the actual name of the pod in your cluster.

    kubectl logs demo-app-jwktf -c demo

    Expected output:

    1.2G    /data
    
    real    0m0.992s
    user    0m0.004s
    sys     0m0.674s

    The real field in the output shows that it took 0.992 seconds (0m0.992s) to copy the data.
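If you skipped the optional prefetch step, the first run of the Job reads data from OSS and populates the cache. You can re-run the Job to compare a warm-cache read against the first, uncached read. This is a sketch that assumes the job.yaml file from the preceding step:

```shell
# Delete the finished Job and submit it again. The second run reads from the JindoFS cache.
kubectl delete job demo-app
kubectl create -f job.yaml
# Query the log of the new pod. The "real" time should be lower than that of the uncached run.
kubectl logs -l job-name=demo-app -c demo
```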

Step 5: Clear data

After you test data access acceleration, clear the relevant data at the earliest opportunity.

  1. Run the following command to delete the Job and its pods:

    kubectl delete job demo-app
  2. Run the following command to delete the Dataset object and the components of the cache runtime:

    kubectl delete dataset demo-dataset
    Important

    It takes about 1 minute to delete the components. Before you perform the next step, make sure that the components are deleted.

  3. Run the following command to scale the control plane components of Fluid to zero replicas:

    kubectl get deployments.apps -n fluid-system | awk 'NR>1 {print $1}' | xargs kubectl scale deployments -n fluid-system --replicas=0

    To enable data access acceleration again, you must run the following commands to restore the control plane components of Fluid before you create a Dataset object and a Runtime object:

    kubectl scale -n fluid-system deployment dataset-controller --replicas=1
    kubectl scale -n fluid-system deployment fluid-webhook --replicas=1
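After you scale the deployments back up, you can confirm that the control plane components are ready before you create new Dataset and Runtime objects:

```shell
# Wait until both control plane deployments report a successful rollout.
kubectl rollout status -n fluid-system deployment/dataset-controller
kubectl rollout status -n fluid-system deployment/fluid-webhook
```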