This topic introduces the updates in ossfs 1.91.3.
New feature
New parameter direct_read_backward_chunks.
In ossfs 1.91.2, the direct read mode retains in memory the data from one chunk before the current chunk through the next direct_read_prefetch_chunks chunks after it. Because only one chunk before the current read position is retained, reading data that lies more than one chunk behind the current position discards a significant amount of prefetched data, which causes extra bandwidth consumption, wasted resources, and performance degradation.
In ossfs 1.91.3, the direct_read_backward_chunks parameter is added to allow ossfs to retain in memory a specified number of chunks before the current chunk. You can use direct_read_backward_chunks together with direct_read_prefetch_chunks to retain in memory the data from the specified number of chunks before the current chunk through the specified number of chunks after it. In AI inference scenarios, such as loading Safetensors files with random reads, you can increase the value of direct_read_backward_chunks to retain more data in memory and reduce repeated downloads for better performance.
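For example, with the default chunk size of 4 MB, setting direct_read_backward_chunks=16 and direct_read_prefetch_chunks=64 (illustrative values) lets ossfs keep up to roughly (16 + 1 + 64) × 4 MB = 324 MB of data in memory around the current read position.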
Parameter | Description | Default value
direct_read_backward_chunks | The number of chunks before the current read position that are retained in memory in direct read mode. The default chunk size is 4 MB. | 1
stat_cache_expire | The validity period of metadata, in seconds. Starting from this version, the parameter can be set to -1, which specifies that the metadata never expires. When metadata expires, it is reloaded. | 900
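For example, the following command mounts a bucket in direct read mode, retains four chunks before the current read position, and disables metadata expiration. The parameter values are illustrative; adjust them for your workload.
ossfs [bucket name] [mountpoint] -ourl=[endpoint] -odirect_read -odirect_read_backward_chunks=4 -ostat_cache_expire=-1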
Hybrid read mode
In scenarios where random reads are frequent and read offsets span a wide range:
In direct read mode, ossfs frequently downloads data, discards data, and re-downloads data. This significantly reduces read performance.
In default read mode, ossfs downloads data to the local disk and does not discard downloaded data until the disk space is used up. As a result, no repeated downloads occur.
If the requested file is small enough, it is written to the page cache in memory and served to the requester directly from the page cache. In this case, read performance is not limited by disk performance.
If the requested file is too large to fit entirely in the page cache, read performance is limited by disk performance.
The hybrid read mode combines the benefits of the default read mode (reading data from the disk) and the direct read mode. In hybrid read mode, small files take full advantage of page cache acceleration, and large files are initially read through the page cache and switch to the direct read mode after a defined threshold is reached. This avoids potential disk performance bottlenecks when reading both small and large files.
Parameter | Description | Default value |
direct_read_local_file_cache_size_mb | In hybrid read mode, data is initially downloaded to the local disk. When the downloaded data exceeds the size threshold (in MB) specified by this parameter, the direct read mode is used. | 0 (With the default value of 0, only the direct read mode is used.)
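For example, with direct_read_local_file_cache_size_mb=3072, roughly the first 3 GB of a file is downloaded to the local disk and served through the page cache; once more than 3 GB has been downloaded, ossfs switches to the direct read mode for that file.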
Performance testing
Mount the remote bucket to the local file system by using ossfs, and then use PyTorch to load a Safetensors file to measure performance. The test environment, mount commands, and results are as follows:
Machine specifications
Memory: 15 GB
Disk bandwidth: 150 MB/s
Internal network bandwidth: 500 MB/s
Mount commands
Mount ossfs in default read mode:
ossfs [bucket name] [mountpoint] -ourl=[endpoint] -oparallel_count=32 -omultipart_size=16
Mount ossfs in direct read mode:
ossfs [bucket name] [mountpoint] -ourl=[endpoint] -odirect_read -odirect_read_chunk_size=8 -odirect_read_prefetch_chunks=64 -odirect_read_backward_chunks=16
Mount ossfs in hybrid read mode:
ossfs [bucket name] [mountpoint] -ourl=[endpoint] -oparallel_count=32 -omultipart_size=16 -odirect_read -odirect_read_chunk_size=8 -odirect_read_prefetch_chunks=64 -odirect_read_backward_chunks=16 -odirect_read_local_file_cache_size_mb=3072
Testing
The following script is used for the performance test:
import time
from safetensors.torch import load_file

file_path = "./my_folder/bert.safetensors"

start = time.perf_counter()
loaded = load_file(file_path)
end = time.perf_counter()

elapsed = end - start
print("time_spent: ", elapsed)
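In this script, ./my_folder is assumed to be the ossfs mount point specified in the preceding mount commands, and bert.safetensors is the Safetensors model file that is read from the mounted bucket.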
The following table provides test results.
Note: The test results are for reference only. The actual read performance varies with the file size and the structure of the Safetensors model.
File size | Default read mode | Direct read mode | Hybrid read mode
2.0 GB | 4.00s | 5.86s | 3.94s
5.3 GB | 20.54s | 27.33s | 19.91s
6.5 GB | 30.14s | 24.23s | 17.93s