Object Storage Service:How to use automatic storage tiering of OSS-HDFS

Last Updated: Nov 14, 2024

Some data in buckets for which OSS-HDFS is enabled is rarely accessed but must be retained for compliance or archiving purposes. For this scenario, OSS-HDFS provides the automatic storage tiering feature, which automatically converts the storage class of frequently accessed data to Standard and the storage class of rarely accessed data to Infrequent Access (IA), Archive, or Cold Archive to reduce storage costs.

Prerequisites

  • Data is written to OSS-HDFS. For a minimal write example, see the sketch after this list.

  • The bucket for which you want to enable the automatic storage tiering feature is located in one of the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Zhangjiakou), China (Hong Kong), Singapore, Germany (Frankfurt), US (Silicon Valley), US (Virginia), and Indonesia (Jakarta).

  • A ticket is submitted to use the automatic storage tiering feature.

  • JindoSDK 4.4.0 or later is installed and configured. For more information, see Connect non-EMR clusters to OSS-HDFS.
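
If no data has been written to OSS-HDFS yet, you can write a small test object with the standard Hadoop shell, as shown in the following sketch. It assumes that the Hadoop client is already connected to OSS-HDFS as described in Connect non-EMR clusters to OSS-HDFS; examplebucket, dir1, and the local file path are placeholders.

  # Upload a local test file to a directory in the bucket for which OSS-HDFS is enabled.
  hadoop fs -put /tmp/test.txt oss://examplebucket/dir1/
  # Confirm that the object was written.
  hadoop fs -ls oss://examplebucket/dir1/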

Usage notes

  • You are charged data retrieval fees when you read IA, Archive, or Cold Archive objects in OSS-HDFS. We recommend that you do not store frequently accessed data as IA, Archive, or Cold Archive objects. For more information about the data retrieval fees, see Data processing fees.

  • When you configure a storage policy for data in OSS-HDFS, you must add tags to data blocks. You are charged for the tags based on object tagging billing rules. For more information, see Object tagging fees.

  • If the version of JindoSDK is earlier than 6.4.0, you cannot directly create an object in an IA, Archive, or Cold Archive directory. In this case, create the object in a Standard directory first, and then move it to the IA, Archive, or Cold Archive directory by using the rename operation (see the sketch after this list).

    If you want to directly create an object in the IA, Archive, or Cold Archive directory, you must upgrade JindoSDK to 6.4.0 or later.

  • When you convert the storage class of objects to Archive or Cold Archive, additional system overheads are generated and data restoration is slow. Proceed with caution.

  • You can convert the storage class of Archive objects to Cold Archive, but you cannot convert the storage class of Cold Archive objects to Archive.
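
The following is a minimal sketch of the rename workaround for JindoSDK versions earlier than 6.4.0. It assumes that the Hadoop client is connected to OSS-HDFS as described in Connect non-EMR clusters to OSS-HDFS, and that dir-std/ is a Standard directory and dir2/ is the Archive directory configured in the procedure below; all names are placeholders.

  # Create the object in a Standard directory first.
  hadoop fs -put /tmp/report.csv oss://examplebucket/dir-std/report.csv
  # Then move (rename) the object into the Archive directory so that it is governed
  # by the storage policy of the destination directory.
  hadoop fs -mv oss://examplebucket/dir-std/report.csv oss://examplebucket/dir2/report.csv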

Procedure

  1. Configure environment variables.

    1. Connect to an Elastic Compute Service (ECS) instance. For more information, see Connect to an instance.

    2. Go to the bin directory of the installed JindoSDK JAR package.

      cd jindosdk-x.x.x/bin/
      Note

      x.x.x indicates the version number of the JindoSDK JAR package.

    3. Grant the owner read, write, and execute permissions on the jindo-util file in the bin directory.

      chmod 700 jindo-util
    4. Rename the jindo-util file to jindo.

      mv jindo-util jindo
    5. Create a configuration file named jindosdk.cfg, and then add the following parameters to the configuration file:

      [common]
      <!-- Retain the following default configurations. -->
      logger.dir = /tmp/jindo-util/
      logger.sync = false
      logger.consolelogger = false
      logger.level = 0
      logger.verbose = 0
      logger.cleaner.enable = true
      hadoopConf.enable = false

      [jindosdk]
      <!-- Specify the following parameters. -->
      <!-- In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. -->
      fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
      <!-- Configure the AccessKey ID and AccessKey secret that are used to access OSS-HDFS. -->
      fs.oss.accessKeyId = LTAI********
      fs.oss.accessKeySecret = KZo1********
    6. Configure environment variables.

      export JINDOSDK_CONF_DIR=<JINDOSDK_CONF_DIR>

      Set <JINDOSDK_CONF_DIR> to the absolute path of the jindosdk.cfg configuration file.
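
      For example, if the configuration file is stored in the /opt/jindosdk-x.x.x/conf/ directory (a hypothetical location), you can set the variable as follows and then run one of the storage policy commands described later in this topic to verify that the endpoint and AccessKey pair are read correctly:

        # Hypothetical location. Replace it with the actual absolute path of your jindosdk.cfg file.
        export JINDOSDK_CONF_DIR=/opt/jindosdk-x.x.x/conf/jindosdk.cfg
        # Query the storage policy of an existing directory to verify the configuration.
        ./jindo fs -getStoragePolicy -path oss://examplebucket/dir1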

  2. Specify a storage policy for the data that is written to OSS-HDFS. The following list describes the command for each storage policy and the result of running it.

    • IA

      Command: ./jindo fs -setStoragePolicy -path oss://examplebucket/dir1 -policy CLOUD_IA

      Result: Objects in the dir1/ directory contain a tag whose key is transition-storage-class and whose value is IA.

    • Archive

      Command: ./jindo fs -setStoragePolicy -path oss://examplebucket/dir2 -policy CLOUD_AR

      Result: Objects in the dir2/ directory contain a tag whose key is transition-storage-class and whose value is Archive.

    • Cold Archive

      Command: ./jindo fs -setStoragePolicy -path oss://examplebucket/dir3 -policy CLOUD_COLD_AR

      Result: Objects in the dir3/ directory contain a tag whose key is transition-storage-class and whose value is ColdArchive.
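
    For example, after you set the Archive policy on the dir2/ directory, you can verify the result by using the getStoragePolicy and checkStoragePolicy commands that are described in the Related commands section of this topic. This is a sketch; the bucket and directory names are placeholders.

      # Set the Archive policy on the directory.
      ./jindo fs -setStoragePolicy -path oss://examplebucket/dir2 -policy CLOUD_AR
      # Confirm that the policy is recorded for the directory.
      ./jindo fs -getStoragePolicy -path oss://examplebucket/dir2
      # Query the status of the conversion task for data in the directory.
      ./jindo fs -checkStoragePolicy -path oss://examplebucket/dir2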

  3. Enable the automatic storage tiering feature.

    1. Log on to the OSS console.

    2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the bucket for which you want to enable the automatic storage tiering feature.

    3. In the left-side navigation tree, choose Data Lake > OSS-HDFS.

    4. On the OSS-HDFS tab, click Configure.

    5. In the Basic Settings section of the Automatic Storage Tiering panel, turn on Status.

      To prevent the automatic storage tiering feature from failing to run as expected due to incorrect configurations, OSS automatically creates a lifecycle rule to convert the storage class of data in OSS-HDFS that contains a specific tag:

      • The lifecycle rule specifies that the storage class of the data that contains a tag whose key is transition-storage-class and whose value is IA in the .dlsdata/ directory is converted to IA one day after the data is last modified.

      • The lifecycle rule specifies that the storage class of the data that contains a tag whose key is transition-storage-class and whose value is Archive in the .dlsdata/ directory is converted to Archive one day after the data is last modified.

      • The lifecycle rule specifies that the storage class of the data that contains a tag whose key is transition-storage-class and whose value is ColdArchive in the .dlsdata/ directory is changed to Cold Archive one day after the data is last modified.

      Important

      Do not modify the lifecycle rule that is automatically created after the automatic storage tiering feature is enabled. Otherwise, data or OSS-HDFS service exceptions may occur.

    6. Click OK.

      • OSS-HDFS converts the storage class of objects based on the storage policy configured in Step 2.

      • OSS loads a lifecycle rule within 24 hours after the rule is created. After the rule is loaded, OSS starts to execute the rule at 08:00 (UTC+8) every day. The time required to complete the execution varies based on the number of objects, and it may take up to 48 hours for objects to be converted to the specified storage class.
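
      To track the status of the conversion task that applies the storage policy to a directory, you can run the checkStoragePolicy command that is described in the Related commands section. This is a sketch; the path is a placeholder. Note that the command reflects only the metadata conversion task in OSS-HDFS, not the lifecycle-based conversion performed by OSS.

        # Query the status of the conversion task for data in the directory.
        # The possible states are Pending, Submitted, Processing, and Finalized.
        ./jindo fs -checkStoragePolicy -path oss://examplebucket/dir2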

Related commands

The following commands are used to manage storage policies for data in OSS-HDFS. Each command syntax is followed by its description.

./jindo fs -setStoragePolicy -path <path> -policy <policy>

Specifies a storage policy for data in a path.

  • -path: specifies the path of the object or directory.

  • -policy: specifies a storage policy. Valid values:

    • CLOUD_STD: the Standard storage class.

    • CLOUD_IA: the IA storage class.

    • CLOUD_AR: the Archive storage class.

    • CLOUD_COLD_AR: the Cold Archive storage class.

    • CLOUD_AR_RESTORED: the storage class of an Archive object that is temporarily restored. The retention period of the restored object is limited; see the -restoreDays parameter described later in this section.

    • CLOUD_COLD_AR_RESTORED: the storage class of a Cold Archive object that is temporarily restored. The retention period of the restored object is limited; see the -restoreDays parameter described later in this section.

Important
  • The size of Archive or Cold Archive data that you want to restore at a time cannot exceed 5 TB, and the size of data in the Processing state cannot exceed 50 TB.

  • If you do not specify the storage class of an object or a subdirectory, the object or subdirectory inherits the storage class of the directory to which it belongs. For example, if the storage class of the oss://examplebucket/dir directory is CLOUD_STD and you do not specify the storage class of the oss://examplebucket/dir/subdir subdirectory, the storage class of the subdirectory is also CLOUD_STD.
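
The following is a minimal sketch of the inheritance behavior, using only the commands documented in this topic; the bucket and directory names are placeholders.

  # Set the Standard policy on the parent directory.
  ./jindo fs -setStoragePolicy -path oss://examplebucket/dir -policy CLOUD_STD
  # No policy is set on the subdirectory, so it inherits CLOUD_STD from oss://examplebucket/dir.
  ./jindo fs -getStoragePolicy -path oss://examplebucket/dir/subdir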

./jindo fs -getStoragePolicy -path <path>

Obtains the storage policy of data in a specific path.

./jindo fs -unsetStoragePolicy -path <path>

Deletes the storage policy of data in a specific path.
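
A minimal sketch that shows the two preceding commands together; the path is a placeholder.

  # Query the current storage policy of the directory.
  ./jindo fs -getStoragePolicy -path oss://examplebucket/dir1
  # Remove the storage policy from the directory.
  ./jindo fs -unsetStoragePolicy -path oss://examplebucket/dir1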

./jindo fs -checkStoragePolicy -path <path>

Obtains the status of the conversion task for data in a specific path based on the storage policy. Valid values:

  • Pending: The conversion task is waiting to be submitted.

  • Submitted: The conversion task is submitted.

  • Processing: The conversion task is being executed.

  • Finalized: The conversion task is complete.

Note

This command is only used to query the status of metadata conversion tasks of OSS-HDFS, and cannot be used to query the processing status of tasks submitted to OSS.

./jindo fs -setStoragePolicy -path <path> -policy <policy> -restoreDays <restoreDays>

Temporarily restores Archive or Cold Archive data in a specific path.

  • -path: specifies the path of the object or directory.

  • -policy: specifies a storage policy. Valid values:

    • CLOUD_AR_RESTORED: The storage class of the object that you want to restore is Archive.

    • CLOUD_COLD_AR_RESTORED: The storage class of the object that you want to restore is Cold Archive.

  • -restoreDays: specifies the retention period of the restored object. Default value: 1.

    • If the storage class of the object that you want to restore is Archive, the valid values for the restoreDays parameter are 1 to 7.

    • If the storage class of the object that you want to restore is Cold Archive, the valid values for the restoreDays parameter are 1 to 365.
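
For example, the following sketch temporarily restores data by using the documented command; the bucket and directory names are placeholders.

  # Temporarily restore the Cold Archive data in the directory and keep it readable for 7 days.
  ./jindo fs -setStoragePolicy -path oss://examplebucket/dir3 -policy CLOUD_COLD_AR_RESTORED -restoreDays 7
  # For Archive data, the value of -restoreDays must be in the range of 1 to 7.
  ./jindo fs -setStoragePolicy -path oss://examplebucket/dir2 -policy CLOUD_AR_RESTORED -restoreDays 3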

When you temporarily restore Archive or Cold Archive objects, take note of the following items:

Important
  • After you use the CLOUD_AR or CLOUD_COLD_AR storage policy to store data, you must wait more than two days before you can restore the data.

  • After the data is restored, it cannot be read immediately. In most cases, several minutes are required before you can read a restored Archive object, and several hours are required before you can read a restored Cold Archive object.

  • After the retention period of the restored object ends, the object can no longer be read. During the retention period, you can restore the object again, but the interval between two restoration operations must be more than two days.
