Container Service for Kubernetes: Use JindoRuntime to persist storage for the JindoFS master

Last Updated: Aug 30, 2024

To avoid metadata loss caused by JindoFS master restarts, Fluid allows you to use JindoRuntime to persist the metadata maintained by the JindoFS master. This improves the availability of JindoFS clusters in distributed caching scenarios.

Feature description

JindoFS is an execution engine for dataset management and caching that is developed in C++ by the Alibaba Cloud E-MapReduce (EMR) team. You can use JindoFS to cache data from various sources, including Object Storage Service (OSS), OSS-HDFS, and persistent volume claims (PVCs). For more information, see JindoData overview.

JindoFS uses a master-worker architecture in which the master maintains the metadata and mount points of cached data and the workers manage the cached data. You can containerize the JindoFS master and workers in Kubernetes clusters. When the containers in which the JindoFS master runs are restarted or rescheduled, the metadata and mount points may be lost. As a result, the JindoFS cluster may become unavailable. To enhance the availability of JindoFS clusters, you can use Fluid JindoRuntime to persist the metadata of the JindoFS master to Kubernetes persistent volumes (PVs).
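
For example, after you create the JindoRuntime that is described in this topic, you can list the containerized JindoFS master and worker pods. This assumes the naming convention that appears later in this topic, in which the master pod of a JindoRuntime named demo is named demo-jindofs-master-0:

    kubectl get pod -o wide | grep jindofs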

Prerequisites

  • A Container Service for Kubernetes (ACK) cluster is created, and the Fluid component (ack-fluid) is installed in the cluster.

  • A kubectl client is connected to the cluster.

  • An OSS bucket is created, and the AccessKey pair of a RAM user that can access the bucket is obtained.

Step 1: Prepare a disk volume

  1. Create a file named pvc.yaml. The file is used to create a PVC that you can use to mount a disk volume.

    Note

    For more information about the parameters in the PVC, see Use a dynamically provisioned disk volume by using kubectl.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: demo-jindo-master-meta
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: alicloud-disk-topology-alltype
      resources:
        requests:
          storage: 30Gi
  2. Run the following command to create the PVC:

    kubectl create -f pvc.yaml

    Expected output:

    persistentvolumeclaim/demo-jindo-master-meta created
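
  3. (Optional) Run the following command to confirm that the PVC is created:

    kubectl get pvc demo-jindo-master-meta

    The alicloud-disk-topology-alltype StorageClass typically delays volume binding until a pod consumes the claim. Therefore, the PVC is expected to remain in the Pending state until the JindoFS master pod is scheduled in Step 2.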

Step 2: Create a Dataset and a JindoRuntime

  1. Create a file named secret.yaml. The file is used to create a Secret that stores the AccessKey ID and AccessKey secret that the RAM user uses to access the OSS bucket.

    apiVersion: v1
    kind: Secret
    metadata:
      name: access-key
    stringData:
      fs.oss.accessKeyId: ****** # Specify the AccessKey ID. 
      fs.oss.accessKeySecret: ****** # Specify the AccessKey secret.

  2. Run the following command to create the Secret:

    kubectl create -f secret.yaml

    Expected output:

    secret/access-key created
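
    If you prefer not to store the credentials in a file, you can alternatively create an equivalent Secret from the command line. The placeholder values are illustrative and must be replaced with your own AccessKey pair:

    kubectl create secret generic access-key \
      --from-literal=fs.oss.accessKeyId=<ACCESS_KEY_ID> \
      --from-literal=fs.oss.accessKeySecret=<ACCESS_KEY_SECRET>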
  3. Create a file named dataset.yaml. The file is used to configure a Dataset and a JindoRuntime.

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: demo
    spec:
      mounts: 
        - mountPoint: oss://<OSS_BUCKET>/<BUCKET_DIR>
          name: demo
          path: /
          options:
            fs.oss.endpoint: <OSS_BUCKET_ENDPOINT>
          encryptOptions:
            - name: fs.oss.accessKeyId
              valueFrom:
                secretKeyRef:
                  name: access-key
                  key: fs.oss.accessKeyId
            - name: fs.oss.accessKeySecret
              valueFrom:
                secretKeyRef:
                  name: access-key
                  key: fs.oss.accessKeySecret
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: demo
    spec:
      replicas: 2
      volumes:
        - name: meta-vol
          persistentVolumeClaim:
            claimName: demo-jindo-master-meta
      master:
        volumeMounts:
          - name: meta-vol
            mountPath: /root/jindofsx-meta
        properties:
          namespace.meta-dir: "/root/jindofsx-meta"
      tieredstore:
        levels:
          - mediumtype: MEM
            path: /dev/shm
            volumeType: emptyDir
            quota: 12Gi
            high: "0.99"
            low: "0.99"

    The following list describes the key parameters of the JindoRuntime in the preceding code block.

    • volumes: the volumes that can be mounted to the components of the JindoRuntime. Specify the PVC that you created in Step 1: Prepare a disk volume.

    • master.volumeMounts: the name of the volume that is mounted to the JindoRuntime master and the mount path of the volume.

    • master.properties: the properties of the JindoRuntime master. To persist the metadata maintained by the JindoRuntime master, you must specify namespace.meta-dir: <path>. Replace <path> with the mount path that you specified in the master.volumeMounts parameter.

  4. Run the following command to create a Dataset and a JindoRuntime:

    kubectl create -f dataset.yaml

    Expected output:

    dataset.data.fluid.io/demo created
    jindoruntime.data.fluid.io/demo created
  5. Run the following command to check whether the Dataset is created:

    kubectl get dataset

    Expected output:

    NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    demo   531.89MiB        0.00B    24.00GiB         0.0%                Bound   5m35s

    If Bound is displayed in the PHASE column, the Dataset and the JindoRuntime are created. After a Dataset enters the Bound state, Fluid automatically creates a PVC that is named after the Dataset. You can mount the PVC to an application pod to allow the pod to access data in the data source specified in the mount point (Dataset.spec.mountPoint) of the Dataset.
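
    For example, you can verify that Fluid created the PVC, which shares the name of the Dataset:

    kubectl get pvc demo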

Step 3: Check whether persistent storage is enabled for the JindoFS master

In this step, you reschedule the pod in which the JindoFS master runs to check whether the metadata of the master is persisted.

  1. Create a file named pod.yaml. In the file, specify the PVC named demo, which Fluid automatically created for the Dataset in Step 2.

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      containers:
        - name: nginx
          image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
          volumeMounts:
            - mountPath: /data
              name: data-vol
      volumes:
        - name: data-vol
          persistentVolumeClaim:
            claimName: demo
  2. Run the following command to deploy an NGINX application in your cluster:

    kubectl create -f pod.yaml

    Expected output:

    pod/nginx created
  3. Run the following command to access data from the application pod:

    kubectl exec -it nginx -- ls /data

    The data in the OSS bucket specified in the mount point (Dataset.spec.mountPoint) of the Dataset is expected to be returned in the output.

  4. Run the following command to query the node on which the JindoFS master is deployed:

    master_node=$(kubectl get pod -o wide | awk '/demo-jindofs-master-0/ {print $7}')
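
    This command parses the NODE column of the kubectl get pod -o wide output. You can print the variable to confirm that a node name is captured:

    echo $master_node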
  5. Run the following command to add a taint to the node returned in Step 4 to prevent new pods from being scheduled to the node:

    kubectl taint node $master_node test-jindofs-master=reschedule:NoSchedule

    Expected output:

    node/cn-beijing.192.168.xx.xx tainted
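
    (Optional) You can check the Taints field of the node to confirm that the taint is applied:

    kubectl describe node $master_node | grep Taints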
  6. Run the following command to delete the pod in which the JindoFS master runs and wait for the system to recreate the pod:

    kubectl delete pod demo-jindofs-master-0

    Expected output:

    pod "demo-jindofs-master-0" deleted 

    The demo-jindofs-master-0 pod is recreated and scheduled to another node in the cluster. Because the metadata directory resides on the disk volume that is mounted to the pod, the recreated pod is restored to the state it was in before the deletion.
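
    You can verify that the recreated pod runs on a different node:

    kubectl get pod demo-jindofs-master-0 -o wide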

    Note

    To meet this goal, the recreated pod must be scheduled to a node that resides in the same zone as the original node because disks cannot be mounted across zones. Therefore, you must ensure that the cluster contains at least two nodes in the zone where the pod was originally deployed.

  7. Run the following command to delete the application pod and wait for the system to recreate the pod:

    kubectl delete -f pod.yaml && kubectl create -f pod.yaml

    Expected output:

    pod "nginx" deleted
    pod/nginx created
  8. Run the following command to access data from the recreated application pod:

    kubectl exec -it nginx -- ls /data

    The data in the OSS bucket specified in the mount point (Dataset.spec.mountPoint) of the Dataset is expected to be returned in the output.

Step 4: Clear the environment

  1. Run the following command to delete the application pod:

    kubectl delete -f pod.yaml

    Expected output:

    pod "nginx" deleted
  2. Run the following command to remove the taint from the node:

    kubectl taint node $master_node test-jindofs-master-

    Expected output:

    node/cn-beijing.192.168.xx.xx untainted
  3. (Optional) Run the following commands in sequence to delete the resources related to the disk volume.

    Important
    • After you create a disk volume, you are charged for the disk created for the volume. If you no longer need to use data acceleration, clear the environment. For more information about the fees incurred by disk volumes, see Disk volume overview.

    • Before you clear the resources, make sure that no application is using the Dataset and no I/O operation is performed on the Dataset.

    kubectl delete -f dataset.yaml
    kubectl delete -f pvc.yaml
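
    You can then run the following command to confirm that both the demo PVC that Fluid created and the demo-jindo-master-meta PVC are deleted. Whether the underlying disk is also released depends on the reclaim policy of the StorageClass, which is typically Delete for dynamically provisioned disk volumes:

    kubectl get pvc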