Object Storage Service (OSS) volumes are Filesystem in Userspace (FUSE) file systems mounted by using ossfs. OSS is a shared storage service that can be accessed in ReadOnlyMany and ReadWriteMany modes, and ossfs is best suited to concurrent read scenarios. We recommend that you set the access modes of persistent volume claims (PVCs) and persistent volumes (PVs) to ReadOnlyMany in these scenarios. This topic describes how to use the OSS SDK and ossutil to configure read/write splitting in read-heavy scenarios.
Use scenarios
OSS is commonly used in read-only and read/write scenarios. In read-heavy scenarios, we recommend that you configure read/write splitting for OSS, configure cache parameters to accelerate read operations, and use the OSS SDK to write data.
Read-only scenarios
In big data inference, analysis, and query scenarios, we recommend that you set the access mode of OSS volumes to ReadOnlyMany to ensure that data is not accidentally deleted or modified. For more information, see Mount a statically provisioned OSS volume.
You can also configure cache parameters to accelerate read operations.
| Parameter | Description |
| --- | --- |
| kernel_cache | Uses the kernel cache to accelerate read operations. This option is suitable for scenarios where you do not need to read the most up-to-date data in real time. When ossfs reads a file multiple times and the query hits the cache, idle memory in the kernel cache is used to cache the file, which accelerates data retrieval. |
| parallel_count | Specifies the maximum number of parts that can be concurrently uploaded or downloaded during multipart uploads or downloads. The default is 20. |
| max_multireq | Specifies the maximum number of requests that can concurrently retrieve file metadata. The value of this parameter must be greater than or equal to that of the parallel_count parameter. The default is 20. |
| max_stat_cache_size | Specifies the maximum number of files whose metadata can be stored in the metadata cache. The default is 1000. To disable the metadata cache, set this parameter to 0. In scenarios where you do not need to read the most up-to-date data in real time and the current directory contains a large number of files, you can increase this value to accelerate LIST operations. |
By default, files and directories that are uploaded by using the OSS console, OSS SDK, or ossutil have the 640 permission after they are mounted by ossfs. You can configure the -o gid=xxx -o uid=xxx parameters or the -o umask=022 parameter to ensure that container processes have read permissions on the mounted directories and their subdirectories. For more information, see How do I manage the permissions related to OSS volume mounting?. For more information about ossfs parameters, see ossfs/README-CN.md.
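For reference, the following is a minimal sketch of how these cache options and permission options can be combined in the otherOpts field of an OSS volume; the bucket name and path are placeholders, and the cache size should be adjusted to your workload:
  volumeAttributes:
    bucket: "<your-bucket-name>"   # placeholder
    url: "oss-cn-beijing.aliyuncs.com"
    path: "/your/read-only/data"   # placeholder
    # kernel_cache accelerates repeated reads, a larger metadata cache accelerates
    # LIST operations, and umask=022 grants read permissions to non-root processes.
    otherOpts: "-o kernel_cache -o max_stat_cache_size=10000 -o umask=022"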
Read/write scenarios
In read/write scenarios, you must set the access mode of OSS volumes to ReadWriteMany. When you use ossfs, take note of the following items:
OSSFS cannot guarantee the consistency of data written by concurrent write operations.
When the OSS volume is mounted to a pod, if you log on to the pod or the host of the pod and delete or modify a file in the mounted path, the source file in the OSS bucket is also deleted or modified. To avoid accidentally deleting important data, you can enable version control for the OSS bucket. For more information, see Overview.
Some read-heavy scenarios, such as big data model training, process read and write requests separately. In these scenarios, we recommend that you split reads and writes for OSS: set the access mode of OSS volumes to ReadOnlyMany, configure cache parameters to accelerate read operations, and use the OSS SDK to write data. For more information, see Example.
Example
A hand-drawn image recognition training application is used as an example to describe how to configure OSS read/write splitting. A simple deep learning model is trained in this example. The training sets are retrieved from the /data-dir directory of a read-only OSS volume and the OSS SDK is used to write checkpoints to the /log-dir directory.
Use ossfs to read and write data
Deploy a hand-drawn image recognition training application based on the following template. The application is written in Python and a statically provisioned OSS volume is mounted to the application. For more information about how to configure OSS volumes, see Mount a statically provisioned OSS volume.
In the following example, the application mounts the /tf-train subdirectory of the OSS bucket to the /mnt directory of the pod. The MNIST hand-drawn image training sets are stored in the /tf-train/train/data directory of the bucket.
View the YAML content of the hand-drawn image recognition training application
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: default
stringData:
  akId: "<your-accesskey-id>"
  akSecret: "<your-accesskey-secret>"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tf-train-pv
  labels:
    alicloud-pvname: tf-train-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: tf-train-pv
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
    volumeAttributes:
      bucket: "<a-bucket-name>"
      url: "oss-cn-beijing.aliyuncs.com"
      otherOpts: "-o max_stat_cache_size=0 -o allow_other"
      path: "/tf-train"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tf-train-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      alicloud-pvname: tf-train-pv
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: tfjob
  name: tf-mnist
  namespace: default
spec:
  containers:
    - command:
        - sh
        - -c
        - python /app/main.py
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: void
        - name: gpus
          value: "0"
        - name: workers
          value: "1"
        - name: TEST_TMPDIR
          value: "/mnt"
      image: registry.cn-beijing.aliyuncs.com/tool-sys/tf-train-demo:rw
      imagePullPolicy: Always
      name: tensorflow
      ports:
        - containerPort: 20000
          name: tfjob-port
          protocol: TCP
      volumeMounts:
        - name: train
          mountPath: "/mnt"
      workingDir: /root
  priority: 0
  restartPolicy: Never
  securityContext: {}
  terminationGracePeriodSeconds: 30
  volumes:
    - name: train
      persistentVolumeClaim:
        claimName: tf-train-pvc
EOF
The training_logs directory is empty before the training starts. During the training, checkpoints are written to the /mnt/training_logs directory of the pod and uploaded by ossfs to the /tf-train/training_logs directory of the OSS bucket.
Verify that data can be read and written as normal.
Run the following command to query the status of the pod:
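kubectl get pod tf-mnist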
Wait a few minutes until the status of the pod changes from Running to Completed. Expected output:
NAME READY STATUS RESTARTS AGE
tf-mnist 1/1 Completed 0 2m
Run the following command to print the log of the pod and check the data loading time. The loading time includes the amount of time required to download files from OSS and load them into TensorFlow.
kubectl logs tf-mnist | grep dataload
Expected output:
dataload cost time: 1.54191803932
The actual loading time varies based on the instance performance and network conditions.
Log on to the OSS console. You can find that files are uploaded to the /tf-train/training_logs directory of the OSS bucket. This indicates that data can be written to and read from OSS as normal.
Use read/write splitting to improve the read speed of ossfs
A hand-drawn image recognition training application and the OSS SDK are used as an example to describe how to reconfigure an application to support read/write splitting.
Install the OSS SDK for Python in the Container Service for Kubernetes (ACK) environment by adding the installation command when you build the image, as sketched below. For more information, see Installation.
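A minimal sketch, assuming a Python base image and the OSS SDK for Python (oss2); add a line similar to the following to the Dockerfile:
# Install the OSS SDK for Python when building the training image.
RUN pip install oss2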
Refer to the official Python SDK demo and modify the source code.
The following code block shows the relevant source code in the original training image of the preceding hand-drawn image recognition training application:
def train():
    ...
    saver = tf.train.Saver(max_to_keep=0)
    for i in range(FLAGS.max_steps):
        if i % 10 == 0:
            summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
            print('Accuracy at step %s: %s' % (i, acc))
        if i % 100 == 0:
            print('Save checkpoint at step %s: %s' % (i, acc))
            saver.save(sess, FLAGS.log_dir + '/model.ckpt', global_step=i)
The code block shows that checkpoints are written to the log_dir directory (the /mnt/training_logs directory of the pod) every 100 iterations. All checkpoints are kept because the max_to_keep parameter of Saver is set to 0. After 1,000 iterations, 10 sets of checkpoints are stored in OSS.
Modify the code based on the following requirements to use the OSS SDK to upload checkpoints.
Configure credentials to read the AccessKey pair and bucket information from environment variables. For more information, see Configure access credentials.
To reduce the container memory usage, set max_to_keep to 1. This way, only the latest set of checkpoints is kept. The put_object_from_file function is used to upload checkpoints to the specified directory of the OSS bucket.
Note
When you use the OSS SDK in read/write splitting scenarios, you can use asynchronous reads and writes to accelerate training. A sketch of one asynchronous upload approach follows the code block below.
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Read the AccessKey pair from the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables.
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
url = os.getenv('URL', '<default-url>')
bucketname = os.getenv('BUCKET', '<default-bucket-name>')
bucket = oss2.Bucket(auth, url, bucketname)
...
def train():
    ...
    saver = tf.train.Saver(max_to_keep=1)
    for i in range(FLAGS.max_steps):
        if i % 10 == 0:
            summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
            print('Accuracy at step %s: %s' % (i, acc))
        if i % 100 == 0:
            print('Save checkpoint at step %s: %s' % (i, acc))
            saver.save(sess, FLAGS.log_dir + '/model.ckpt', global_step=i)
            # Upload the latest checkpoint files to the tf-train/training_logs directory of the OSS bucket.
            for path, _, file_list in os.walk(FLAGS.log_dir):
                for file_name in file_list:
                    bucket.put_object_from_file(os.path.join('tf-train/training_logs', file_name), os.path.join(path, file_name))
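As mentioned in the note above, checkpoint uploads can be made asynchronous so that training does not block on OSS writes. The following is a minimal sketch of one possible approach that uses a thread pool; the upload_checkpoints_async helper and the pool size are illustrative assumptions, not part of the demo image:
import os
from concurrent.futures import ThreadPoolExecutor

# A small pool is usually sufficient because only the latest checkpoint set is kept.
_uploader = ThreadPoolExecutor(max_workers=2)

def upload_checkpoints_async(bucket, local_dir, oss_prefix='tf-train/training_logs'):
    """Submit checkpoint uploads to a background thread pool and return the futures."""
    futures = []
    for path, _, file_list in os.walk(local_dir):
        for file_name in file_list:
            futures.append(_uploader.submit(
                bucket.put_object_from_file,
                os.path.join(oss_prefix, file_name),
                os.path.join(path, file_name)))
    return futures

# In train(), replace the synchronous upload loop with:
#     upload_checkpoints_async(bucket, FLAGS.log_dir)
# If needed, wait for the last batch of uploads before the process exits:
#     for f in upload_checkpoints_async(bucket, FLAGS.log_dir):
#         f.result()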
The modified container image is registry.cn-beijing.aliyuncs.com/tool-sys/tf-train-demo:ro.
Modify the application template so that the application accesses OSS in read-only mode.
Set the accessModes parameter of the PV and PVC to ReadOnlyMany and set the path of the OSS bucket to mount to /tf-train/train/data.
Add -o kernel_cache -o max_stat_cache_size=10000 -o umask=022 to otherOpts. This enables ossfs to use the kernel cache to accelerate reads and increases the number of metadata cache entries. 10,000 metadata cache entries occupy approximately 40 MB of memory; you can adjust this number based on the instance specifications and the data volume. In addition, umask=022 grants read permissions to container processes that run as non-root users. For more information, see Use scenarios.
Add the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables to the pod template. The values are obtained from oss-secret. Make sure that the credentials are the same as those configured for the OSS volume.
View the YAML content of the modified hand-drawn image recognition training application
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: default
stringData:
  akId: "<your-accesskey-id>"
  akSecret: "<your-accesskey-secret>"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tf-train-pv
  labels:
    alicloud-pvname: tf-train-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: tf-train-pv
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
    volumeAttributes:
      bucket: "<bucket-name>"
      url: "oss-cn-beijing.aliyuncs.com"
      otherOpts: "-o max_stat_cache_size=10000 -o kernel_cache -o umask=022"
      path: "/tf-train/train/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tf-train-pvc
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      alicloud-pvname: tf-train-pv
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: tfjob
  name: tf-mnist
  namespace: default
spec:
  containers:
    - command:
        - sh
        - -c
        - python /app/main.py
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: void
        - name: gpus
          value: "0"
        - name: workers
          value: "1"
        - name: TEST_TMPDIR
          value: "/mnt"
        - name: OSS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: oss-secret
              key: akId
        - name: OSS_ACCESS_KEY_SECRET
          valueFrom:
            secretKeyRef:
              name: oss-secret
              key: akSecret
        - name: URL
          value: "https://oss-cn-beijing.aliyuncs.com"
        - name: BUCKET
          value: "<bucket-name>"
      image: registry.cn-beijing.aliyuncs.com/tool-sys/tf-train-demo:ro
      imagePullPolicy: Always
      name: tensorflow
      ports:
        - containerPort: 20000
          name: tfjob-port
          protocol: TCP
      volumeMounts:
        - name: train
          mountPath: "/mnt/train/data"
      workingDir: /root
  priority: 0
  restartPolicy: Never
  securityContext: {}
  terminationGracePeriodSeconds: 30
  volumes:
    - name: train
      persistentVolumeClaim:
        claimName: tf-train-pvc
EOF
Verify that data can be read and written as normal.
Run the following command to query the status of the pod:
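kubectl get pod tf-mnist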
Wait a few minutes until the status of the pod changes from Running to Completed. Expected output:
NAME READY STATUS RESTARTS AGE
tf-mnist 1/1 Completed 0 2m
Run the following command to print the log of the pod and check the data loading time. The loading time includes the amount of time required to download files from OSS and load them into TensorFlow.
kubectl logs tf-mnist | grep dataload
Expected output:
dataload cost time: 0.843528985977
Compared with the previous example, the data loading time is reduced from about 1.54 seconds to about 0.84 seconds. This indicates that caches are used to accelerate read operations in read-only mode. This method is ideal for large-scale training or continuous data loading scenarios.
Log on to the OSS console. You can find that files are uploaded to the /tf-train/training_logs directory of the OSS bucket. This indicates that data can be written to and read from OSS as normal.
OSS SDK demos provided by Alibaba Cloud
The following table describes the OSS SDK demos provided by Alibaba Cloud. The OSS SDK supports the following programming languages: PHP, Node.js, Browser.js, .NET, Android, iOS, and Ruby. For more information, see SDK references.
Programming language | References |
Other tools for configuring OSS read/write splitting