The block storage mode ensures efficient read and write operations and high metadata
accessibility. JindoFS stores data as blocks in OSS and caches data in local disks
of clusters to accelerate data access. JindoFS uses Namespace Service to manage metadata
and ensure high metadata accessibility. This topic describes how to use JindoFS in
block storage mode.
Background information
The block storage mode of JindoFS has the following features:
- JindoFS offers large, scalable storage capacity by using OSS as the storage
backend. The storage capacity is independent of the EMR cluster scale. The local cluster
can be scaled in or out as required.
- JindoFS caches copies of some data in the local cluster to accelerate read operations.
This improves throughput while using only limited local storage capacity, especially for
write-once-read-many (WORM) workloads.
- JindoFS provides efficient metadata queries similar to HDFS. Compared with OssFileSystem,
JindoFS spends much less time on metadata queries. In addition, JindoFS avoids system instability
when data and metadata are frequently accessed.
- JindoFS ensures maximal data locality when jobs are executed in the EMR cluster. This
reduces the load on network transmission and improves the read performance.
Configure the block storage mode
- Go to the SmartData service.
- Log on to the Alibaba Cloud EMR console.
- In the top navigation bar, select the region where your cluster resides. Select the resource group as required. By default, all resources of the account appear.
- Click the Cluster Management tab.
- On the Cluster Management page that appears, find the target cluster and click Details in the Actions column.
- In the left-side navigation pane, click Cluster Service and then SmartData.
- Go to the namespace tab for the SmartData service.
- Click the Configure tab.
- Click the namespace tab in the Service Configuration section.
- Configure required parameters.
JindoFS allows you to configure multiple namespaces. A namespace named test is used
in this topic.
- Set jfs.namespaces to test.
If you configure multiple namespaces, separate them with commas (,).
- In the upper-right corner of the Service Configuration section, click Custom Configuration. In the Add Configuration Item dialog box, add the parameters described in the following table.
Parameter | Description | Example
jfs.namespaces.test.oss.uri | The storage backend of the test namespace. | oss://<oss_bucket>/<oss_dir>/
jfs.namespaces.test.mode | The storage mode of the test namespace. Set this parameter to block. | block
jfs.namespaces.test.oss.access.key | The AccessKey ID of the OSS bucket that serves as the storage backend. | xxxx
jfs.namespaces.test.oss.access.secret | The AccessKey secret of the OSS bucket that serves as the storage backend. |
Note (jfs.namespaces.test.oss.uri): We recommend that you set this parameter to a directory of an OSS bucket. The namespace stores data blocks in this directory.
Note (jfs.namespaces.test.oss.access.key): We recommend that you store data in an OSS bucket that is in the same region and under the same account as your EMR cluster. This ensures high performance and stability. In this case, you do not need to configure the AccessKey ID and AccessKey secret because the OSS bucket allows password-free access from the EMR cluster.
- Click OK.
- In the upper-right corner of the Service Configuration section, click Save.
- Select Restart Jindo Namespace Service from the Actions drop-down list in the upper-right corner.
After Namespace Service is restarted, you can use jfs://test/<path_of_file> to access files in JindoFS.
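The configuration items above follow a regular naming pattern, which the following minimal Python sketch illustrates. It only assembles the key-value pairs and the resulting jfs:// path as strings. The helper names (build_namespace_config, namespaces_value, jfs_path) and the sample file path are hypothetical and are not part of JindoFS or the EMR console. The values still have to be entered in the console as described in the steps above.

```python
from typing import Dict, Iterable, Optional


def build_namespace_config(
    name: str,
    oss_uri: str,
    access_key: Optional[str] = None,
    access_secret: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the custom configuration items for one block-mode namespace.

    Hypothetical helper for illustration; it only builds strings.
    """
    if not oss_uri.startswith("oss://"):
        raise ValueError("oss_uri must start with oss://")
    prefix = f"jfs.namespaces.{name}"
    items = {
        f"{prefix}.oss.uri": oss_uri,
        f"{prefix}.mode": "block",
    }
    # The AccessKey pair can be omitted when the bucket is in the same region
    # and under the same account as the EMR cluster (password-free access).
    if access_key is not None and access_secret is not None:
        items[f"{prefix}.oss.access.key"] = access_key
        items[f"{prefix}.oss.access.secret"] = access_secret
    return items


def namespaces_value(names: Iterable[str]) -> str:
    """Join namespace names with commas, the format used by jfs.namespaces."""
    return ",".join(names)


def jfs_path(name: str, path: str) -> str:
    """Build the jfs:// path of a file in the given namespace."""
    return f"jfs://{name}/{path.lstrip('/')}"


# Placeholders are kept as in the table; replace them with real values.
config = build_namespace_config("test", "oss://<oss_bucket>/<oss_dir>/")
print("jfs.namespaces =", namespaces_value(["test"]))
for key, value in config.items():
    print(f"{key} = {value}")
print(jfs_path("test", "/data/part-0000"))  # sample path, hypothetical
```

Generating the keys from the namespace name keeps the items consistent when several namespaces are configured.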
Control disk space usage
JindoFS uses OSS as the data storage backend, which allows you to store large volumes
of data. However, the capacity of local disks is limited. JindoFS automatically deletes
cold data in local disks. The storage.watermark.high.ratio and storage.watermark.low.ratio
parameters are used to adjust the space usage of local disks. You can set the parameters
to decimal numbers between 0 and 1.
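The interaction of the two watermarks can be illustrated with a small Python sketch. It is a simplified model, not the JindoFS implementation. It assumes that "cold" means least recently used, that the ratios are strictly between 0 and 1, and that the cache is a plain in-memory mapping. The function names are hypothetical. The values 0.4 and 0.2 are the documented defaults of the two parameters.

```python
from collections import OrderedDict
from typing import List


def validate_watermarks(high: float, low: float) -> None:
    """Check that both ratios lie in (0, 1) and that high is above low."""
    if not (0 < low < high < 1):
        raise ValueError("expected 0 < low ratio < high ratio < 1")


def evict_cold_blocks(
    cache: "OrderedDict[str, int]",
    capacity: int,
    high: float = 0.4,
    low: float = 0.2,
) -> List[str]:
    """Simulate watermark-based cleanup of a local block cache.

    cache maps block IDs to sizes, ordered from coldest to hottest
    (assumption: least recently used first). Returns the evicted IDs.
    """
    validate_watermarks(high, low)
    used = sum(cache.values())
    evicted: List[str] = []
    # Cleanup starts only when usage exceeds the high watermark.
    if used <= capacity * high:
        return evicted
    # Once started, delete cold blocks until usage is at or below the low watermark.
    while cache and used > capacity * low:
        block_id, size = cache.popitem(last=False)
        used -= size
        evicted.append(block_id)
    return evicted


# Disk capacity of 100 units holding three cached blocks of 15 units each.
blocks = OrderedDict([("a", 15), ("b", 15), ("c", 15)])
print(evict_cold_blocks(blocks, capacity=100))  # usage 45 exceeds 40, so cleanup runs
print(list(blocks))                             # blocks that remain cached
```

In this model, a wider gap between the two ratios means cleanup runs less often but removes more cached data each time.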
- Modify disk usage configurations.
In the Service Configuration section for the SmartData service, click the storage tab and configure the parameters described in the following table.
Parameter | Description
storage.watermark.high.ratio | The upper limit of disk usage. When the disk usage of JindoFS data exceeds this limit, JindoFS automatically deletes data in the disk. Default value: 0.4.
storage.watermark.low.ratio | The lower limit of disk usage. After automatic data deletion is triggered, JindoFS starts to delete data until the disk usage of JindoFS data is reduced to this limit. Default value: 0.2.
Note: You can configure the upper limit and lower limit to adjust the disk space assigned to JindoFS. Make sure that the upper limit is greater than the lower limit.
- Save the configurations.
- In the upper-right corner of the Service Configuration section, click Save.
- In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
- Click OK.
- Restart Jindo Storage Service to apply the configurations.
- Choose Actions > Restart Jindo Storage Service in the upper-right corner.
- In the Cluster Activities dialog box, specify the related parameters.
- Click OK.
- In the Confirm message, click OK.