This topic describes the cache mode of JindoFileSystem (JindoFS) and its use scenarios.
Overview
In cache mode, JindoFS stores data files as objects in Object Storage Service (OSS) and caches the data and metadata of these files in the local cluster as they are accessed. This accelerates reads and writes of both data and metadata. The cache mode also provides multiple policies for synchronizing metadata from OSS as required.
Scenarios
The cache mode is compatible with native OSS semantics. Because JindoFS stores data files as ordinary objects in OSS, it remains compatible with the OSS client, E-MapReduce (EMR) OssFileSystem, and other applications that interact with OSS. You can also access data that was stored in OSS before you configured JindoFS, without migrating or converting it. Meanwhile, local caches accelerate reads and writes of data and metadata.
Configure JindoFS
You can configure all parameters related to JindoFS in Bigboot, as shown in the following figure.
- The parameters framed in red in the preceding figure are required.
- JindoFS supports multiple namespaces. A namespace named test is used in this topic.
Parameter | Description | Example |
---|---|---|
jfs.namespaces | The namespaces supported by JindoFS. Separate multiple namespaces with commas (,). | test |
jfs.namespaces.test.uri | The storage backend of the test namespace. Note You can set the value to a directory in an OSS bucket. In this case, that directory serves as the root directory in which the test namespace reads and writes data. Generally, you can set the value to an OSS bucket so that paths in the namespace are the same as those in OSS. | oss://oss-bucket/ |
jfs.namespaces.test.mode | The storage mode of the test namespace. Set this parameter to cache. | cache |
jfs.namespaces.test.oss.access.key | The AccessKey ID used to access the OSS bucket that serves as the storage backend. Note We recommend that you store data in an OSS bucket that is in the same region and belongs to the same account as your EMR cluster, for high performance and stability. In this case, you do not need to configure the AccessKey ID or AccessKey secret, because the OSS bucket allows password-free access from the EMR cluster. | xxxx |
jfs.namespaces.test.oss.access.secret | The AccessKey secret used to access the OSS bucket that serves as the storage backend. | |
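The parameters in the table are plain key-value pairs. The following Python sketch assembles them for one or more cache-mode namespaces. The function `cache_namespace_config` is hypothetical: it only builds the pairs you would enter in Bigboot and does not call any EMR API. It also assumes that namespaces other than test follow the same `jfs.namespaces.<name>.` key pattern shown for test.

```python
from typing import Dict, Optional


def cache_namespace_config(
    namespaces: Dict[str, str],
    access_key: Optional[str] = None,
    access_secret: Optional[str] = None,
) -> Dict[str, str]:
    """Build Bigboot key-value pairs for JindoFS namespaces in cache mode.

    namespaces maps a namespace name (e.g. "test") to its OSS URI
    (e.g. "oss://oss-bucket/"). The AccessKey pair is optional because a
    same-region, same-account bucket allows password-free access.
    Hypothetical helper: the key pattern for non-"test" names is assumed.
    """
    if not namespaces:
        raise ValueError("at least one namespace is required")
    # jfs.namespaces lists every namespace, separated by commas.
    props = {"jfs.namespaces": ",".join(namespaces)}
    for name, uri in namespaces.items():
        if not uri.startswith("oss://"):
            raise ValueError(f"namespace {name!r}: expected an oss:// URI, got {uri!r}")
        prefix = f"jfs.namespaces.{name}"
        props[f"{prefix}.uri"] = uri
        props[f"{prefix}.mode"] = "cache"  # cache mode for this namespace
        # Only emit credentials when both parts of the AccessKey pair are given.
        if access_key and access_secret:
            props[f"{prefix}.oss.access.key"] = access_key
            props[f"{prefix}.oss.access.secret"] = access_secret
    return props


if __name__ == "__main__":
    for key, value in cache_namespace_config({"test": "oss://oss-bucket/"}).items():
        print(f"{key}={value}")
```

For the test namespace without an AccessKey pair, the script prints three lines: `jfs.namespaces=test`, `jfs.namespaces.test.uri=oss://oss-bucket/`, and `jfs.namespaces.test.mode=cache`.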
Save and deploy the JindoFS configuration, and then restart Namespace Service in SmartData for JindoFS to take effect.
Configure a metadata synchronization policy
In cache mode, some data may already exist in OSS before you configure JindoFS. After JindoFS is configured, this existing data and its metadata are synchronized to JindoFS for subsequent access. Data is cached in the local cluster each time it is accessed. For metadata, JindoFS supports two synchronization policies: the interval policy and the loading policy.
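The data path described above behaves like a read-through cache: the first access fetches data from OSS and keeps a local copy, and later accesses are served locally. The following Python sketch models only that behavior. `FakeOss` and `CacheModeReader` are hypothetical classes, not JindoFS code, and eviction, writes, metadata, and cache persistence are omitted.

```python
from typing import Dict


class FakeOss:
    """Hypothetical in-memory stand-in for an OSS bucket."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = dict(objects)
        self.reads = 0  # counts round trips to the backend

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self.objects[key]


class CacheModeReader:
    """Read-through cache: serve from the local cache, fall back to OSS."""

    def __init__(self, backend: FakeOss):
        self.backend = backend
        self.local_cache: Dict[str, bytes] = {}

    def read(self, key: str) -> bytes:
        if key not in self.local_cache:
            # First access: fetch the object from OSS and keep a local copy.
            self.local_cache[key] = self.backend.get(key)
        return self.local_cache[key]


if __name__ == "__main__":
    # Data that existed in OSS before JindoFS was configured is readable as-is.
    oss = FakeOss({"data/part-0000": b"hello"})
    reader = CacheModeReader(oss)
    reader.read("data/part-0000")
    reader.read("data/part-0000")
    print(oss.reads)  # 1: the second read is served from the local cache
```

Because the objects stay in OSS unchanged, other OSS clients can still read them directly; the local copy only shortens later accesses from the cluster.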
- Interval policy: You can set the namespace.sync.interval parameter to specify the synchronization interval. The default value is -1, which indicates that JindoFS does not synchronize metadata from OSS.
  - If you set this parameter to 0, JindoFS synchronizes metadata from OSS each time data is accessed.
  - If you set this parameter to a value greater than 0, JindoFS synchronizes metadata from OSS at intervals of the specified value, in seconds. For example, if you set this parameter to 5, JindoFS synchronizes metadata from OSS every 5 seconds.
- Loading policy: You can set the namespace.sync.loadtype parameter to specify the loading policy. Valid values:
  - never: JindoFS never synchronizes metadata from OSS.
  - once: JindoFS synchronizes metadata from OSS only once. This is the default value.
  - always: JindoFS synchronizes metadata from OSS each time data is accessed.

  Note The namespace.sync.loadtype parameter takes effect only when you do not specify the namespace.sync.interval parameter.
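The two policies can be summarized as one decision made when data is accessed. The following Python sketch is a model, not JindoFS code, and the function name `should_sync_metadata` is hypothetical. It assumes that an interval greater than 0 is checked lazily on access (the actual service may refresh on a timer instead), and that an unset interval (`None`) is distinct from an explicit -1, which is how the loading policy can still apply when the interval is not specified.

```python
from typing import Optional


def should_sync_metadata(
    interval: Optional[int],
    loadtype: str = "once",
    last_sync: Optional[float] = None,
    now: float = 0.0,
) -> bool:
    """Decide whether an access should pull metadata from OSS.

    interval: namespace.sync.interval in seconds, or None if not specified.
    loadtype: namespace.sync.loadtype ("never", "once", or "always").
    last_sync: time of the previous sync in seconds, or None if never synced.
    now: current time in seconds.
    """
    if interval is not None:
        # Interval policy: loadtype is ignored when an interval is specified.
        if interval == -1:
            return False  # do not synchronize from OSS
        if interval == 0:
            return True  # synchronize on every access
        if interval > 0:
            # Assumption: the interval is checked lazily on access.
            return last_sync is None or now - last_sync >= interval
        raise ValueError(f"unsupported interval: {interval}")
    # Loading policy: applies only when no interval is specified.
    if loadtype == "never":
        return False
    if loadtype == "once":
        return last_sync is None  # only the first access synchronizes
    if loadtype == "always":
        return True
    raise ValueError(f"unsupported loadtype: {loadtype!r}")


if __name__ == "__main__":
    # Interval of 5 seconds, last sync 3 seconds ago: no sync yet.
    print(should_sync_metadata(5, last_sync=100.0, now=103.0))  # False
    # No interval specified, default loadtype "once", first access: sync.
    print(should_sync_metadata(None))  # True
```

Under these assumptions, setting namespace.sync.interval overrides namespace.sync.loadtype entirely, which matches the note above.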