Overview of ossfs

ossfs allows you to mount an Object Storage Service (OSS) bucket to a local directory on the Linux operating system. This way, you can manage data in the bucket in the same manner that you manage local files.

Introduction

ossfs is a Filesystem in Userspace (FUSE) based file system that allows you to mount an OSS bucket to a local directory on the Linux operating system and supports the following features:

Supports most features described in POSIX standards, such as file and directory uploads and downloads, and user permission management.
Uses multipart upload and resumable upload to upload OSS objects by default.
Supports MD5 verification to ensure data integrity.

Runtime environment

ossfs is a FUSE-based file system and works only on FUSE-compatible machines. ossfs provides installer packages for the following systems. To run ossfs in other environments, you need to use the source code to build the required program.

Linux
- CentOS 7.0 or later
- Ubuntu 14.04 or later
- Anolis 7 or later
FUSE 2.8.4 or later
You can run the fusermount -V command to check the version of FUSE. If the value of the fusermount version parameter in the response is 2.8.4 or later, such as 2.9.2, the version of FUSE meets the requirements.
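
The version check described above can be scripted. The following is a minimal sketch, not part of ossfs: it assumes that the output of fusermount -V contains a dotted version number (for example, a line such as "fusermount version: 2.9.2"), and the function names are hypothetical.

```python
import re

# Minimum FUSE version stated in the requirements above.
MIN_FUSE_VERSION = (2, 8, 4)


def parse_fuse_version(output):
    """Extract the first dotted version number from `fusermount -V` output.

    The exact output format is an assumption; only a dotted number such as
    "2.9.2" somewhere in the text is required.
    """
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError("no version number found in: %r" % output)
    return tuple(int(part) for part in match.group(1).split("."))


def fuse_version_ok(output, minimum=MIN_FUSE_VERSION):
    """Return True if the reported FUSE version is at least `minimum`."""
    return parse_fuse_version(output) >= minimum


# To check the local machine, pass in the real command output, for example:
#   import subprocess
#   out = subprocess.run(["fusermount", "-V"],
#                        capture_output=True, text=True).stdout
#   print(fuse_version_ok(out))
```

Comparing the versions as integer tuples avoids the string-comparison pitfall in which "2.10.0" would sort before "2.8.4".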

Limitations

The following limits apply when you use ossfs to mount a bucket to a local directory on the Linux operating system:

ossfs is not suitable for scenarios that require highly concurrent read and write operations.
Note
- Both read and write operations consume the disk capacity. In highly concurrent read/write scenarios, disk performance limits read and write operations.
- Concurrent read and write requests compete for resources, which affects the bandwidth.
ossfs does not support hard links.
Archive, Cold Archive, and Deep Cold Archive buckets cannot be mounted to local file systems by using ossfs.
If you use ossfs to edit an uploaded object, the object is re-uploaded.
The performance of metadata-related operations, such as listing a directory, is compromised because each operation requires remote access to the OSS server.
Errors may occur if you rename an object or a directory. Operation failures may cause data inconsistencies.
If a bucket is mounted to multiple clients and data is simultaneously written to the mount points, ossfs does not guarantee consistency.
Make sure that your AccessKey pair has full permissions on the target bucket or on resources whose names are prefixed with specified values. Insufficient permissions may cause mount points to fail and lead to other issues.

How to use ossfs

The following section applies to ossfs 1.91.3 and later. For more information about how to download and install the latest version of ossfs, see Installation.

Mode introduction

Default read mode

The default read mode is suitable for random read-only operations on small files (files that can be entirely cached in the page cache) and large files. For example, if your AI training task reads images slowly in direct read mode even after you tune the mode to accommodate some random reads, we recommend that you switch to the default read mode.

When ossfs reads a file, the kernel caches a copy of the file from the mount point in memory, and the data is also written to a file on the local disk. As a result, the cache size consumed by the read operation is twice the size of the file.

If the page cache on your operating system can cache up to 6 GB of data in dirty pages, the default read mode is theoretically suitable for reading a file that is less than 3 GB in size.
You can use the parallel_count parameter to change the number of concurrent download tasks and the multipart_size parameter to specify the amount of data that can be downloaded by a single task.
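
The sizing rule above is simple arithmetic: a read consumes cache equal to about twice the file size, so the largest file that suits the default read mode is about half the available page cache. A minimal sketch, with hypothetical function names that are not part of ossfs:

```python
GIB = 1024 ** 3  # bytes in 1 GB as used in this sketch (binary gigabyte)


def max_file_size_default_mode(page_cache_bytes):
    """Upper bound on the file size suited to the default read mode.

    Based on the rule above: reading a file consumes cache equal to
    twice the file size.
    """
    return page_cache_bytes // 2


def fits_default_read_mode(file_size_bytes, page_cache_bytes):
    """Return True if the file is smaller than the theoretical bound."""
    return file_size_bytes < max_file_size_default_mode(page_cache_bytes)


# With 6 GB of page cache, the bound is 3 GB.
print(max_file_size_default_mode(6 * GIB) // GIB)  # 3
```

The bound is theoretical. Other processes also use the page cache, so the practical limit is likely lower.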

Direct read mode

The direct read mode is suitable for scenarios that involve sequential reads of large files and allows a limited degree of random read access (such as reads that skip a few chunks). For example, you can use the direct read mode in AI inference scenarios to load a large Safetensors file.

To enable the direct read mode, set the -odirect_read parameter to enabled.
In the direct read mode, ossfs retains in the memory the data within the range of [-direct_read_backward_chunks * direct_read_chunk_size, +direct_read_prefetch_chunks * direct_read_chunk_size], in which direct_read_chunk_size is set to 4 MB by default, direct_read_prefetch_chunks is set to 32 by default, and direct_read_backward_chunks is set to 1 by default. By default, ossfs retains in the memory the data within the range of [-4 MB, +128 MB]. The direct read mode also supports random reads that are limited to a small range. For example, if two consecutive reads from a Safetensors file are within the range of [-32 MB, +32 MB], you can configure -odirect_read_backward_chunks=8 to retain 32 MB of data prior to the current offset.
You can modify the direct_read_prefetch_chunks and direct_read_chunk_size parameters to increase the amount of data that can be prefetched in parallel for maximized bandwidth usage.
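
The retained range described above can be worked out from the three parameters. The following sketch only reproduces that arithmetic; the helper names are hypothetical, and the default values are the ones stated above.

```python
import math


def direct_read_window_mb(chunk_size_mb=4, prefetch_chunks=32, backward_chunks=1):
    """Return (backward_mb, forward_mb) retained around the current offset.

    Implements [-direct_read_backward_chunks * direct_read_chunk_size,
                +direct_read_prefetch_chunks * direct_read_chunk_size].
    """
    backward_mb = -backward_chunks * chunk_size_mb
    forward_mb = prefetch_chunks * chunk_size_mb
    return backward_mb, forward_mb


def backward_chunks_needed(backward_mb, chunk_size_mb=4):
    """Smallest direct_read_backward_chunks value that retains backward_mb."""
    return math.ceil(backward_mb / chunk_size_mb)


print(direct_read_window_mb())                   # (-4, 128)
print(direct_read_window_mb(backward_chunks=8))  # (-32, 128)
print(backward_chunks_needed(32))                # 8
```

The last call matches the Safetensors example: retaining 32 MB before the current offset with 4 MB chunks requires direct_read_backward_chunks=8.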

Hybrid read mode

The hybrid read mode is suitable for read-only operations on a combination of small files (files that can be entirely cached in the page cache) and large files and also allows a limited degree of random read access, such as reads that skip a few chunks. For example, you can use the hybrid read mode in AI inference scenarios to load a large Safetensors file. If random reads span a wide offset range, the hybrid read mode provides lower read performance compared with the default read mode.

To enable the hybrid read mode, set the -odirect_read parameter to enabled.
You must configure the direct_read_local_file_cache_size_mb parameter to specify the data size threshold beyond which the direct read mode is used. For example, if your machine provides up to 6 GB of page cache, you can configure -odirect_read_local_file_cache_size_mb=3072 to switch to the direct read mode when the downloaded data reaches 3 GB.
In the direct read mode, ossfs retains in the memory the data within the range of [-direct_read_backward_chunks * direct_read_chunk_size, +direct_read_prefetch_chunks * direct_read_chunk_size], in which direct_read_chunk_size is set to 4 MB by default, direct_read_prefetch_chunks is set to 32 by default, and direct_read_backward_chunks is set to 1 by default. By default, ossfs retains in the memory the data within the range of [-4 MB, +128 MB]. The direct read mode also supports random reads that are limited to a small range. For example, if two consecutive reads from a Safetensors file are within the range of [-32 MB, +32 MB], you can configure -odirect_read_backward_chunks=8 to retain 32 MB of data prior to the current offset.
You can modify the direct_read_prefetch_chunks and direct_read_chunk_size parameters to increase the amount of data that can be prefetched in parallel for maximized bandwidth usage.
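
As an illustration of the threshold example above, the following sketch derives the hybrid read options from the page cache size. It assumes that the threshold is half of the page cache, as in the 6 GB example, and that -odirect_read is passed as a bare flag. The function name is hypothetical, and the sketch only builds option strings; it does not run ossfs.

```python
def hybrid_read_options(page_cache_mb):
    """Build the option strings for hybrid read mode.

    Assumption: the switch-over threshold is half of the page cache,
    following the example above (6 GB of page cache -> 3072 MB).
    """
    threshold_mb = page_cache_mb // 2
    return [
        "-odirect_read",
        "-odirect_read_local_file_cache_size_mb=%d" % threshold_mb,
    ]


print(" ".join(hybrid_read_options(6 * 1024)))
# -odirect_read -odirect_read_local_file_cache_size_mb=3072
```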

Mode selection guidance

If ossfs reads a file and writes the file at the same time, use the default read mode.

If ossfs only reads a file, or reads a file and writes a different file:

Scenario	Description
Only small files	Use the default read mode.
Only large files	To read a large file sequentially, or to read a Safetensors file randomly within a narrow offset range, use the direct read mode. To perform random reads that span a wide offset range, use the default read mode. If you are unsure which read mode fits your scenario, or if performance remains unsatisfactory after you adjust the direct_read_backward_chunks parameter in direct read mode, use the default read mode.
Small files and large files	To read a large file sequentially, or to read a Safetensors file randomly within a narrow offset range, use the hybrid read mode. If you perform random reads that span a wide offset range, are unsure which read mode fits your scenario, or still experience unsatisfactory performance after you adjust the direct_read_backward_chunks parameter in hybrid read mode, use the default read mode.
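
The guidance above can be summarized as a small decision function. This is a sketch of the selection rules only. The function name and the category labels ("small", "large", "mixed", "sequential", "narrow_random", "wide_random", "unknown") are hypothetical and are not ossfs settings.

```python
FILE_MIXES = ("small", "large", "mixed")
ACCESS_PATTERNS = ("sequential", "narrow_random", "wide_random", "unknown")


def choose_read_mode(writes_same_file, file_mix, access):
    """Return "default", "direct", or "hybrid" following the guidance above.

    writes_same_file: True if a file is read and written at the same time.
    file_mix: "small", "large", or "mixed" (small and large files).
    access: "sequential", "narrow_random", "wide_random", or "unknown".
    """
    if file_mix not in FILE_MIXES:
        raise ValueError("unknown file_mix: %r" % file_mix)
    if access not in ACCESS_PATTERNS:
        raise ValueError("unknown access: %r" % access)

    # Reading and writing the same file always calls for the default mode.
    if writes_same_file:
        return "default"
    # Small files fit in the page cache, so the default mode suits them.
    if file_mix == "small":
        return "default"
    # Sequential reads or narrow-range random reads benefit from direct reads.
    if access in ("sequential", "narrow_random"):
        return "direct" if file_mix == "large" else "hybrid"
    # Wide-range random reads, or uncertainty, fall back to the default mode.
    return "default"


print(choose_read_mode(False, "large", "sequential"))     # direct
print(choose_read_mode(False, "mixed", "narrow_random"))  # hybrid
print(choose_read_mode(False, "large", "wide_random"))    # default
```

The function does not model the last fallback in the table: if performance remains unsatisfactory after tuning direct_read_backward_chunks, switch to the default read mode regardless of the result.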

Note

If the direct read mode or the hybrid read mode provides unsatisfactory performance, switch to the default read mode, which stores data on local disks. In default read mode, disk performance limits ossfs read performance, so we recommend that you use a disk of a higher performance level. For example, you can use ESSD AutoPL disks that have appropriate provisioned performance and burst performance settings.

What to do next

Before you use ossfs to mount an OSS bucket to a local directory, you must install and configure ossfs and perform mount operations. For more information, see Installation and Configure ossfs and perform mount operations.

References

For more information about how to perform mount operations, see Configure ossfs and perform mount operations.
For more information about the ossfs options, see Options supported by ossfs.
For more information about the new features of different versions of ossfs, see New features of different versions of ossfs.
For more information about the issues you may encounter when you use ossfs, see FAQ.