Filter OSS objects whose last modified time is within a specific period of time

To find Object Storage Service (OSS) objects whose last modified time falls within a specific period of time among a large number of objects in a bucket, you can use the data indexing feature. You specify the start date and end date of the last modified time as index conditions, which improves object query efficiency. This feature is suited to the accurate retrieval of time-sensitive objects in scenarios such as audit trails, data synchronization, periodic backup, and cost analysis.

Scenarios

Audit and compliance requirements
Enterprises or organizations may need to periodically review data activities within a specific period of time to ensure their data meets internal data management policies, industry regulations, or supervision requirements. Querying objects uploaded within a specific period of time can help you manage data lifecycle, track data generation and change history, and audit data.
Data backup and restoration
When you define a backup policy or perform a data recovery operation, you may need to focus only on objects uploaded or updated within a specific period of time. For example, an incremental backup only needs to back up objects that were uploaded or modified since the last backup. By querying objects uploaded within a specific period of time, you can identify the objects that you want to back up, which reduces storage usage and transmission costs.
Data analysis and report generation
For applications that involve big data analysis, log processing, or business reporting, new data generated within a specific period of time may need to be processed on a regular basis, such as daily, weekly, or monthly. Querying objects generated within a specific period of time can help you quickly identify the dataset that you want to analyze and simplify the data extraction process.
Data synchronization performed by a content management system
If OSS stores website content, media assets, or document libraries, the content management system may need to regularly capture or synchronize objects uploaded or updated within a specific period of time to maintain the timeliness and integrity of website content.
Cost optimization and resource deletion
To control storage costs or comply with data retention policies, enterprises may periodically review and delete objects that are no longer needed. Querying and listing objects that have not been updated within a specific period of time helps you identify objects that no longer need to be retained. This ensures the efficient use of resources.
Troubleshooting and issue tracing
When engineers troubleshoot system failures, data loss, or unusual behaviors, they may need to trace data changes over a specific period of time. Querying objects uploaded within this period of time helps you quickly locate object versions or operations that may have caused the issue.
Collaboration and versioning
In a multi-user collaboration environment, team members may need to view or restore object versions uploaded by other members within a specific period of time. Querying objects uploaded within a specific period of time helps you browse, compare, or restore historical versions to a specific point in time.

Methods

Use the OSS console

Log on to the OSS console.
In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.
In the left-side navigation tree, choose Object Management > Data Indexing.
On the Data Indexing page, turn on Metadata Management.
The amount of time required for metadata management to take effect varies based on the number of objects in the bucket.
In the Basic Filtering Conditions section, specify Start Date and End Date to the right of Last Modified At. The values of Start Date and End Date are accurate to the second. Retain the default settings for other parameters.
In the Object Sort Order section, set Sort Order to Ascending and Sorted By to Object Name.
Click Query.

Use OSS SDKs

Only OSS SDK for Java, OSS SDK for Python, and OSS SDK for Go allow you to use the data indexing feature to query objects that meet specific conditions. Before you use the data indexing feature to query objects in a bucket, you must enable the metadata management feature for the bucket.

Java

package com.aliyun.sts.sample;

import com.aliyun.oss.ClientException;
import com.aliyun.oss.OSS;
import com.aliyun.oss.common.auth.*;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.OSSException;
import com.aliyun.oss.model.*;

public class Demo {

    // In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
    private static String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
    // Specify the name of the bucket. Example: examplebucket. 
    private static String bucketName = "examplebucket";

    public static void main(String[] args) throws Exception {
        // Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Create an OSSClient instance. 
        OSS ossClient = new OSSClientBuilder().build(endpoint, credentialsProvider);

        try {
            // Set the maximum number of objects that you want to return to 100. 
            int maxResults = 100;
            // Specify the start date and end date of the last modified time of the objects that you want to query. In this example, the start date is set to December 1, 2023 and the end date is set to March 31, 2024. 
            String query = "{\n" +
                    "  \"SubQueries\":[\n" +
                    "    {\n" +
                    "      \"Field\":\"FileModifiedTime\",\n" +
                    "      \"Value\": \"2023-12-01T00:00:00.000+08:00\",\n" +
                    "      \"Operation\":\"gt\"\n" +
                    "    },         \n" +
                    "    {\n" +
                    "      \"Field\":\"FileModifiedTime\",\n" +
                    "      \"Value\": \"2024-03-31T23:59:59.000+08:00\",\n" +
                    "      \"Operation\":\"lt\"\n" +
                    "    }\n" +
                    "  ],\n" +
                    "  \"Operation\":\"and\"\n" +
                    "}";
            // Specify that the returned results are sorted in ascending order by object name. 
            String sort = "Filename";
            DoMetaQueryRequest doMetaQueryRequest = new DoMetaQueryRequest(bucketName, maxResults, query, sort);
            doMetaQueryRequest.setOrder(SortOrder.ASC);
            DoMetaQueryResult doMetaQueryResult = ossClient.doMetaQuery(doMetaQueryRequest);

            if (doMetaQueryResult.getFiles() != null) {
                for (ObjectFile file : doMetaQueryResult.getFiles().getFile()) {
                    System.out.println("Filename: " + file.getFilename());
                    // Query the ETag values that are used to identify the content of the objects. 
                    System.out.println("ETag: " + file.getETag());
                    // Query the access control list (ACL) of the objects.
                    System.out.println("ObjectACL: " + file.getObjectACL());
                    // Query the type of the objects. 
                    System.out.println("OssObjectType: " + file.getOssObjectType());
                    // Query the storage class of the objects. 
                    System.out.println("OssStorageClass: " + file.getOssStorageClass());
                    // Query the number of tags of the objects. 
                    System.out.println("TaggingCount: " + file.getOssTaggingCount());
                    if (file.getOssTagging() != null) {
                        for (Tagging tag : file.getOssTagging().getTagging()) {
                            System.out.println("Key: " + tag.getKey());
                            System.out.println("Value: " + tag.getValue());
                        }
                    }
                    if (file.getOssUserMeta() != null) {
                        for (UserMeta meta : file.getOssUserMeta().getUserMeta()) {
                            System.out.println("Key: " + meta.getKey());
                            System.out.println("Value: " + meta.getValue());
                        }
                    }
                }
            }
        } catch (OSSException oe) {
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:" + oe.getErrorCode());
            System.out.println("Request ID:" + oe.getRequestId());
            System.out.println("Host ID:" + oe.getHostId());
        } catch (ClientException ce) {
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            // Shut down the OSSClient instance. 
            ossClient.shutdown();
        }
    }
}

Python

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())

# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')

# Specify the start date and end date of the last modified time of the objects that you want to query. In this example, the start date is set to December 1, 2023 and the end date is set to March 31, 2024. 
query = (
    '{"SubQueries":['
    '{"Field":"FileModifiedTime","Value":"2023-12-01T00:00:00.000+08:00","Operation":"gt"},'
    '{"Field":"FileModifiedTime","Value":"2024-03-31T23:59:59.000+08:00","Operation":"lt"}'
    '],"Operation":"and"}'
)
# Return up to 100 objects, sorted in ascending order by object name.
do_meta_query_request = MetaQuery(max_results=100, query=query, sort='Filename', order='asc')
result = bucket.do_bucket_meta_query(do_meta_query_request)

for s in result.files:
    print(s.file_name)
    print(s.etag)
    print(s.oss_object_type)
    print(s.oss_storage_class)
    print(s.oss_crc64)
    print(s.object_acl)
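
The query string in the sample above is written by hand. As a minimal sketch, the same JSON can be built from datetime values so that the timestamps always use the format shown in the sample (second-level times with a millisecond field and a UTC offset). The helper names format_timestamp and build_time_range_query are hypothetical and are not part of OSS SDK for Python.

```python
import json
from datetime import datetime, timedelta, timezone


def format_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime like 2023-12-01T00:00:00.000+08:00."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.isoformat(timespec="milliseconds")


def build_time_range_query(start: datetime, end: datetime) -> str:
    """Return the JSON query for objects modified after start and before end."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    conditions = [
        {"Field": "FileModifiedTime", "Value": format_timestamp(start), "Operation": "gt"},
        {"Field": "FileModifiedTime", "Value": format_timestamp(end), "Operation": "lt"},
    ]
    return json.dumps({"SubQueries": conditions, "Operation": "and"})


# Same period as the sample: December 1, 2023 to March 31, 2024 (UTC+8).
cst = timezone(timedelta(hours=8))
query = build_time_range_query(
    datetime(2023, 12, 1, tzinfo=cst),
    datetime(2024, 3, 31, 23, 59, 59, tzinfo=cst),
)
print(query)
```

The resulting string can be passed as the query argument of MetaQuery in place of the literal used in the sample.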

Go

package main

import (
	"fmt"
	"github.com/aliyun/aliyun-oss-go-sdk/oss"
	"os"
)

func main() {
	// Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Create an OSSClient instance. 
	// Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. Specify your actual endpoint. 
	client, err := oss.New("https://oss-cn-hangzhou.aliyuncs.com", "", "", oss.SetCredentialsProvider(&provider))
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the start date and end date of the last modified time of the objects that you want to query. In this example, the start date is set to December 1, 2023 and the end date is set to March 31, 2024. 
	query := oss.MetaQuery{
		NextToken:  "",
		MaxResults: 100,
		Query: `{
  "SubQueries":[
    {
      "Field":"FileModifiedTime",
      "Value": "2023-12-01T00:00:00.000+08:00",
      "Operation":"gt"
    },         
    {
      "Field":"FileModifiedTime",
      "Value": "2024-03-31T23:59:59.000+08:00",
      "Operation":"lt"
    }
  ],
  "Operation":"and"
}`,
		// Specify that the returned results are sorted in ascending order by object name. 
		Sort:  "Filename",
		Order: "asc",
	}
	// Query objects that match specific conditions and list object information based on specific fields and sorting methods. 
	result, err := client.DoMetaQuery("examplebucket", query)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	fmt.Printf("NextToken:%s\n", result.NextToken)
	for _, file := range result.Files {
		fmt.Printf("File name: %s\n", file.Filename)
		fmt.Printf("size: %d\n", file.Size)
		fmt.Printf("File Modified Time:%s\n", file.FileModifiedTime)
		fmt.Printf("Oss Object Type:%s\n", file.OssObjectType)
		fmt.Printf("Oss Storage Class:%s\n", file.OssStorageClass)
		fmt.Printf("Object ACL:%s\n", file.ObjectACL)
		fmt.Printf("ETag:%s\n", file.ETag)
		fmt.Printf("Oss CRC64:%s\n", file.OssCRC64)
		fmt.Printf("Oss Tagging Count:%d\n", file.OssTaggingCount)
		for _, tagging := range file.OssTagging {
			fmt.Printf("Oss Tagging Key:%s\n", tagging.Key)
			fmt.Printf("Oss Tagging Value:%s\n", tagging.Value)
		}
		for _, userMeta := range file.OssUserMeta {
			fmt.Printf("Oss User Meta Key:%s\n", userMeta.Key)
			fmt.Printf("Oss User Meta Value:%s\n", userMeta.Value)
		}
	}
}
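
The Go sample prints NextToken but returns only the first page of up to 100 objects. The following sketch, written in Python for brevity, shows one way to keep querying until no token is returned. It assumes that an empty token means there are no more results. The fetch_page callable is a hypothetical stand-in for a real query call, and the fake pages exist only so that the loop can run offline.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a token ("" for the first page) and
# returns (object names on this page, token for the next page).
FetchPage = Callable[[str], Tuple[List[str], Optional[str]]]


def iter_all_files(fetch_page: FetchPage) -> Iterator[str]:
    """Yield every object name, following the token from page to page."""
    token = ""
    while True:
        files, next_token = fetch_page(token)
        yield from files
        # Assumption: an empty or missing token marks the last page.
        if not next_token:
            break
        token = next_token


# Stand-in data instead of a real bucket query.
fake_pages = {
    "": (["a.txt", "b.txt"], "t1"),
    "t1": (["c.txt"], ""),
}
names = list(iter_all_files(lambda token: fake_pages[token]))
print(names)
```

With a real SDK client, fetch_page would wrap the query call and pass the token through the request. The Go sample shows the NextToken field on both the request and the result; check the reference of the SDK that you use for the equivalent names.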

Use the OSS API

If your business requires a high level of customization, you can call the RESTful API directly. In this case, your code must calculate the request signature. For more information, see DoMetaQuery.
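
As a rough sketch of what a direct call involves, the following Python code builds an XML request body for a query. It assumes that the body uses a MetaQuery root element whose child elements mirror the fields in the SDK samples (NextToken, MaxResults, Query, Sort, Order). Confirm the exact structure, request path, and headers against the DoMetaQuery reference. Signature calculation and the HTTP request itself are not shown, and build_meta_query_body is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_meta_query_body(query: str, max_results: int = 100,
                          sort: str = "Filename", order: str = "asc",
                          next_token: str = "") -> bytes:
    """Serialize the query parameters as a UTF-8 XML document."""
    root = ET.Element("MetaQuery")  # element names are assumptions
    ET.SubElement(root, "NextToken").text = next_token
    ET.SubElement(root, "MaxResults").text = str(max_results)
    ET.SubElement(root, "Query").text = query
    ET.SubElement(root, "Sort").text = sort
    ET.SubElement(root, "Order").text = order
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Same time range as the SDK samples.
time_range_query = (
    '{"SubQueries":['
    '{"Field":"FileModifiedTime","Value":"2023-12-01T00:00:00.000+08:00","Operation":"gt"},'
    '{"Field":"FileModifiedTime","Value":"2024-03-31T23:59:59.000+08:00","Operation":"lt"}'
    '],"Operation":"and"}'
)
body = build_meta_query_body(time_range_query)
print(body.decode("utf-8"))
```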

References

The data indexing feature supports multiple filtering conditions, such as the storage class, ACL, and object size. You can filter objects that meet specific conditions from a large number of objects in the bucket. For example, if you want to query the objects whose ACL is public-read or whose size is less than 64 KB in a bucket, use the data indexing feature to specify the filtering conditions based on your business requirements. For more information, see Data indexing.
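
The public-read or smaller-than-64-KB example above could be expressed as a query along the following lines. This is a sketch only: the field names ObjectACL and Size, the eq operation, and the or operation are assumptions based on the naming pattern in the SDK samples, so verify them in the data indexing documentation before use.

```python
import json

# 64 KB expressed in bytes.
SIZE_LIMIT_BYTES = 64 * 1024

# Match objects whose ACL is public-read OR whose size is below the limit.
# Field and operation names are assumptions; see the lead-in.
query = json.dumps({
    "SubQueries": [
        {"Field": "ObjectACL", "Value": "public-read", "Operation": "eq"},
        {"Field": "Size", "Value": str(SIZE_LIMIT_BYTES), "Operation": "lt"},
    ],
    "Operation": "or",
})
print(query)
```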