During the lifecycle of an application pod, the Filesystem in Userspace (FUSE) daemon may crash. As a result, the application pod can no longer use the FUSE file system to access data. This topic describes how to enable the auto recovery feature for the mount targets of a FUSE file system to restore access to application data without restarting the application pods.
Overview
Problem
Application pods that use Fluid datasets access data in a distributed cache system via the FUSE file system. Each FUSE file system corresponds to a FUSE daemon process, which handles the file access requests sent to the FUSE file system.
During the lifecycle of an application pod, the FUSE daemon may crash. For example, the daemon is killed when its memory usage exceeds the upper limit. As a result, the "Transport endpoint is not connected" error occurs when the application pod accesses files in the FUSE file system. To resolve this issue, you must manually restart or rebuild the application pod to restore access to the FUSE file system.
How it works
Fluid provides the auto recovery feature for the mount targets of a FUSE file system. By periodically querying the status of the FUSE file system mounted to each application pod on a node, Fluid detects when a FUSE daemon has crashed. After the FUSE daemon container is automatically restarted by Kubernetes, Fluid detects the recovery and remounts the file system, which restores data access for the application pods without requiring you to restart or rebuild them.
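The detection step described above can be illustrated with a short shell sketch. This is only an illustration of the idea, not Fluid's actual implementation: when a FUSE daemon has crashed, a `stat` on the mount target fails with "Transport endpoint is not connected", so a periodic probe can detect the crash. The `check_mount` function and the `/tmp` stand-in path are hypothetical.

```shell
# Illustrative sketch (not Fluid's actual code) of the periodic health check:
# stat on a broken FUSE mount target fails with "Transport endpoint is not
# connected", so a failing stat signals that the FUSE daemon has crashed.
check_mount() {
    if stat "$1" >/dev/null 2>&1; then
        echo "healthy: $1"
    else
        echo "broken: $1"
    fi
}

# /tmp stands in for a FUSE mount target here; on a node, Fluid probes the
# actual FUSE mount point of each application pod instead.
check_mount /tmp    # prints "healthy: /tmp"
```

A recovery loop would run such a probe on a schedule and remount the file system once the restarted daemon is healthy again.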
ACK cluster vs. serverless: key differences
The auto recovery feature is available in both ACK Pro clusters and ACK Serverless Pro clusters. The following table summarizes the key differences between the two approaches.
| Aspect | ACK Pro cluster | ACK Serverless Pro cluster |
|---|---|---|
| Enable mechanism | Set the FuseRecovery=true feature gate on the CSI DaemonSet | Set the alibabacloud.com/fuse-recover-policy: auto annotation on each pod |
| Pod configuration | Label: fuse.serverful.fluid.io/inject: "true" | Label: alibabacloud.com/fluid-sidecar-target: eci + annotations |
| Expected pod READY status | 1/1 | 2/2 |
| FUSE crash simulation target | Separate FUSE pod on the same node | Sidecar container fluid-fuse-0 within the same pod |
| Additional prerequisites | Virtual nodes deployed in the cluster | None |
Limits
Auto recovery takes time and is not seamless for business applications. Applications must tolerate temporary data access failures and keep retrying until data access is restored.
You can enable auto recovery only for read-only datasets. If the cluster contains a dataset that can be read and written, make sure that this feature is disabled to prevent data from being unexpectedly written to the dataset.
This feature does not support mounting the persistent volume claims (PVCs) of datasets to application pods in subPath mode.
Auto recovery can start only after Kubernetes automatically restarts the crashed FUSE daemon container. If the FUSE daemon crashes frequently, Kubernetes exponentially increases the interval at which it restarts the container, which prolongs the time required for auto recovery.
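Because recovery is not seamless, the application itself needs a retry path for reads that fail while the mount is broken. The following shell sketch shows one possible shape of such a retry loop; `read_with_retry` and the `/tmp` demo file are hypothetical stand-ins for a real file under the FUSE mount path.

```shell
# Hedged sketch of an application-side retry loop. On a broken FUSE mount,
# reads fail with "Transport endpoint is not connected"; the loop retries
# until auto recovery remounts the file system or attempts run out.
read_with_retry() {
    path=$1
    attempts=$2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if cat "$path" 2>/dev/null; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Demo against an ordinary file standing in for a file in the dataset:
echo "hello" > /tmp/demo-retry.txt
read_with_retry /tmp/demo-retry.txt 5    # prints "hello"
```

In a real workload the attempt budget must cover the full recovery window, including the container restart backoff mentioned above.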
Common prerequisites
The following prerequisites apply to both the ACK Pro cluster and ACK Serverless Pro cluster scenarios.
Cluster requirements
An ACK Pro cluster or ACK Serverless Pro cluster with non-ContainerOS is created, and the Kubernetes version of the cluster is 1.18 or later. For more information, see Create an ACK Pro cluster or Create an ACK Serverless cluster.
The ack-fluid component is currently not supported on ContainerOS.
Component requirements
The cloud-native AI suite is installed and the ack-fluid component is deployed.
If you have already installed open source Fluid, uninstall Fluid and deploy the ack-fluid component.
If you have not installed the cloud-native AI suite, enable Fluid acceleration when you install the suite. For more information, see Install the cloud-native AI suite.
If you have already installed the cloud-native AI suite, go to the Cloud-native AI Suite page of the ACK console and deploy the ack-fluid component.
Infrastructure requirements
A kubectl client is connected to the cluster. For more information, see Connect to a cluster by using kubectl.
OSS is activated and a bucket is created. For more information, see Activate OSS and Create buckets.
Additional prerequisites for ACK Pro clusters
Virtual nodes are deployed in the ACK Pro cluster. For more information, see Schedule pods to elastic container instances.
Create a Fluid dataset
This section describes how to create a Fluid dataset backed by OSS. These steps are shared by both the ACK Pro cluster and ACK Serverless Pro cluster scenarios.
In this example, JindoFS is deployed to accelerate access to OSS.
Create a file named secret.yaml and copy the following content to the file:
The fs.oss.accessKeyId and fs.oss.accessKeySecret fields specify the AccessKey ID and AccessKey secret used to access OSS.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
stringData:
  fs.oss.accessKeyId: <YOUR_ACCESS_KEY_ID>
  fs.oss.accessKeySecret: <YOUR_ACCESS_KEY_SECRET>
```

Run the following command to create the Secret:
```shell
kubectl create -f secret.yaml
```

Create a file named dataset.yaml and copy the following content to the file. The following table describes the parameters.
| Parameter | Description |
|---|---|
| mountPoint | oss://<oss_bucket>/<bucket_dir> specifies the path to the UFS that is mounted. The endpoint is not required in the path. |
| fs.oss.endpoint | The public or private endpoint of the OSS bucket. For more information, see OSS regions and endpoints. |
| replicas | The number of workers in the JindoFS cluster. |
| mediumtype | The cache type. When you create a JindoRuntime template, JindoFS supports only one of the following cache types: HDD, SSD, and MEM. |
| path | The storage path. You can specify only one path. If you set mediumtype to MEM, you must specify a path of on-premises storage to store data such as logs. |
| quota | The maximum size of cached data. Unit: GB. |
| high | The upper limit of the storage capacity. |
| low | The lower limit of the storage capacity. |

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  mounts:
    - mountPoint: oss://<oss_bucket>/<bucket_dir>
      options:
        fs.oss.endpoint: <oss_endpoint>
      name: mybucket
      path: "/"
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: demo-dataset
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        volumeType: emptyDir
        quota: 2Gi
        high: "0.99"
        low: "0.95"
```

Run the following command to create the Dataset object and the JindoRuntime object:
```shell
kubectl create -f dataset.yaml
```
Enable auto recovery for FUSE mount targets in an ACK cluster
This section describes how to enable and verify the FUSE auto recovery feature in an ACK Pro cluster.
Step 1: Enable auto recovery for FUSE mount targets
Run the following command to enable auto recovery for FUSE mount targets:
```shell
kubectl get ds -n fluid-system csi-nodeplugin-fluid -oyaml | sed 's/FuseRecovery=false/FuseRecovery=true/g' | kubectl apply -f -
```

Expected output:
```
daemonset.apps/csi-nodeplugin-fluid configured
```

Run the following command to check whether auto recovery is enabled for FUSE mount targets:
```shell
kubectl get ds -n fluid-system csi-nodeplugin-fluid -oyaml | grep '\- \-\-feature-gates='
```

If the following output is returned, auto recovery is enabled for FUSE mount targets:
```
- --feature-gates=FuseRecovery=true
```

Step 2: Create an application pod and mount the Fluid dataset
In this example, a Fluid dataset is mounted to an NGINX pod and the pod is used to access the data in the dataset.
Create a file named app.yaml and copy the following content to the file:

The fuse.serverful.fluid.io/inject=true label is used to enable auto recovery for the FUSE mount target of the pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  labels:
    fuse.serverful.fluid.io/inject: "true"
spec:
  containers:
    - name: demo
      image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
      volumeMounts:
        - mountPath: /data
          name: data-vol
  volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset # The value of this parameter must be the same as the name of the Dataset.
```

Run the following command to create the application pod:
```shell
kubectl create -f app.yaml
```

Run the following command to view the status of the application pod. If the STATUS field of the pod is Running, the application pod has started successfully.
```shell
kubectl get pod demo-app
```

Expected output:

```
NAME       READY   STATUS    RESTARTS   AGE
demo-app   1/1     Running   0          16s
```
Step 3: Verify auto recovery for the FUSE mount target
Run the following command to log on to the application pod and run a script that periodically accesses file metadata. The script lists the files in the mounted Fluid dataset every second.
```shell
kubectl exec -it demo-app -- bash -c 'while true; do ls -l /data; sleep 1; done'
```

Keep the preceding script running in the background and run the following command to simulate a crash of the FUSE component:
```shell
# Obtain the node where demo-app runs.
demo_pod_node_name=$(kubectl get pod demo-app -ojsonpath='{.spec.nodeName}')
# Obtain the name of the FUSE pod on the same node as demo-app.
fuse_pod_name=$(kubectl get pod --field-selector spec.nodeName=$demo_pod_node_name --selector role=jindofs-fuse,release=demo-dataset -oname)
# Simulate a crash of the FUSE daemon.
kubectl exec -it $fuse_pod_name -- bash -c 'kill 1'
```

View the output of the script that is running in demo-app. If output similar to the following is returned, the FUSE mount target is recovered:
```
...
total 172
-rwxrwxr-x 1 root root  18 Jul  1 15:17 myfile
-rwxrwxr-x 1 root root 154 Jul  1 17:06 myfile.txt
total 172
-rwxrwxr-x 1 root root  18 Jul  1 15:17 myfile
-rwxrwxr-x 1 root root 154 Jul  1 17:06 myfile.txt
ls: cannot access '/data/': Transport endpoint is not connected
ls: cannot access '/data/': Transport endpoint is not connected
ls: cannot access '/data/': Transport endpoint is not connected
ls: cannot access '/data/': Transport endpoint is not connected
ls: cannot access '/data/': Transport endpoint is not connected
ls: cannot access '/data/': Transport endpoint is not connected
ls: cannot access '/data/': Transport endpoint is not connected
ls: cannot access '/data/': Transport endpoint is not connected
total 172
-rwxrwxr-x 1 root root  18 Jul  1 15:17 myfile
-rwxrwxr-x 1 root root 154 Jul  1 17:06 myfile.txt
total 172
-rwxrwxr-x 1 root root  18 Jul  1 15:17 myfile
-rwxrwxr-x 1 root root 154 Jul  1 17:06 myfile.txt
...
```
Enable auto recovery for FUSE mount targets in a serverless environment
This section describes how to enable and verify the FUSE auto recovery feature in an ACK Serverless Pro cluster. In a serverless environment, you do not need to set a feature gate. Instead, you enable auto recovery through a pod annotation.
You have created an ACK Serverless Pro cluster that uses an operating system other than ContainerOS, and the cluster version is 1.18 or later. For more information, see Create an ACK Serverless cluster.
Step 1: Create an application pod and mount the Fluid dataset
In this example, a Fluid dataset is mounted to an NGINX pod and the pod is used to access the data in the dataset.
Create a file named app.yaml and copy the following content to the file:

The alibabacloud.com/fuse-recover-policy: auto annotation is used to enable auto recovery for the FUSE mount target of the pod. This annotation takes effect only on application pods that run in the serverless environment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  labels:
    alibabacloud.com/fluid-sidecar-target: eci
  annotations:
    # Schedule the pod only to elastic container instances.
    alibabacloud.com/burst-resource: eci_only
    # Enable auto recovery for FUSE mount targets.
    alibabacloud.com/fuse-recover-policy: auto
spec:
  containers:
    - name: demo
      image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
      volumeMounts:
        - mountPath: /data
          name: data-vol
  volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset # The value of this parameter must be the same as the name of the Dataset.
```

Run the following command to create the application pod:
```shell
kubectl create -f app.yaml
```

Run the following command to view the status of the application pod. If the STATUS field of the pod is Running, the application pod has started successfully.
```shell
kubectl get pod demo-app
```

Expected output:

```
NAME       READY   STATUS    RESTARTS   AGE
demo-app   2/2     Running   0          110s
```
Step 2: Verify the auto recovery feature for the FUSE mount target
Run the following command to log on to the application pod and run a script that periodically accesses file metadata. The script lists the files in the mounted Fluid dataset every second.
```shell
kubectl exec -it demo-app -c demo -- bash -c 'while true; do ls -l /data; sleep 1; done'
```

Keep the preceding script running in the background and run the following command to simulate a crash of the FUSE component:
```shell
# Simulate a crash of the FUSE sidecar container.
kubectl exec -it demo-app -c fluid-fuse-0 -- bash -c 'kill 1'
```

View the output of the script that is running in demo-app. If output similar to the following is returned, the FUSE mount target is recovered:
```
total 172
-rwxrwxr-x 1 root root  18 Jul  1 15:17 myfile
-rwxrwxr-x 1 root root 154 Jul  1 17:06 myfile.txt
total 172
-rwxrwxr-x 1 root root  18 Jul  1 15:17 myfile
-rwxrwxr-x 1 root root 154 Jul  1 17:06 myfile.txt
ls: cannot access '/data/demo2': Transport endpoint is not connected
ls: cannot access '/data/demo2': Transport endpoint is not connected
ls: cannot access '/data/demo2': Transport endpoint is not connected
ls: cannot access '/data/demo2': Transport endpoint is not connected
total 172
-rwxrwxr-x 1 root root  18 Jul  1 15:17 myfile
-rwxrwxr-x 1 root root 154 Jul  1 17:06 myfile.txt
total 172
-rwxrwxr-x 1 root root  18 Jul  1 15:17 myfile
-rwxrwxr-x 1 root root 154 Jul  1 17:06 myfile.txt
```