Object Storage Service:Use MetaSearch to search for OSS objects based on metadata attributes

Last Updated:Nov 04, 2024

MetaSearch is an indexing feature of Object Storage Service (OSS) that is based on object metadata. You can specify metadata attributes as index conditions to query objects. This way, you can efficiently understand your data structure, run queries, collect statistics, and manage objects.

Scenarios

Data audit

You can use MetaSearch to quickly find objects to meet data audit or regulatory requirements. For example, in the financial industry, you can filter objects by using metadata such as custom tags and access control lists (ACLs). This way, you can search for objects that have a specific sensitivity level or specific ACLs to improve the efficiency of data audits.

Enterprise data backup and archiving

When an enterprise wants to back up and archive data, the enterprise can use MetaSearch to quickly find objects by metadata such as the creation date, storage class, and custom tags. This way, the enterprise can quickly restore historical data or archived objects.

Usage notes

  • Supported regions

    MetaSearch is supported for buckets that are located in the China (Hangzhou), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Guangzhou), China (Hong Kong), and Singapore regions.

  • Object quantity

    By default, MetaSearch is supported only for a bucket that contains up to 10 billion objects.

  • Billing rules

    MetaSearch is in public preview, during which you can use it free of charge. To use MetaSearch, you must enable the metadata management feature. After the public preview ends, you are charged for metadata management and metadata retrievals. For more information about the billable items of the data indexing feature, see Data indexing fees.

  • Time required for indexing

    After you enable MetaSearch, OSS creates an index. The time required to create the index grows with the number of objects stored in the bucket. In most cases, creating an index for the first time takes approximately 1 hour for 10 million objects, approximately 1 day for 1 billion objects, and approximately 2 to 3 days for 10 billion objects. These durations are provided for reference only.

  • Multipart upload

    If a bucket contains objects that are uploaded by using multipart upload, the search results include only the complete objects that are combined by calling the CompleteMultipartUpload operation. Parts from multipart upload tasks that have been initiated but neither completed nor canceled are not included in the search results.

Methods

Use the OSS console

In this example, the following search conditions are used to search for objects:

  • Object size: less than 500 KB.

  • Last modified time: from 00:00:00 on September 11, 2024 to 00:00:00 on September 12, 2024.

  • Sort order: ascending by object size.

  • Data aggregation: display the maximum size of the objects that meet the preceding conditions.

Buckets in the China (Guangzhou) region

  1. Log on to the OSS console.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.

  3. In the left-side navigation tree, choose Object Management > Data Indexing.

  4. On the Data Indexing page, click Enable Now.

  5. In the Data Indexing dialog box, select MetaSearch and click Enable.

    Note

    The amount of time required for MetaSearch to take effect varies based on the number of objects in the bucket.

  6. Specify the parameters based on your business requirements in the OSS Metadata Condition section and retain the default settings for other parameters.

    • Set Start Date to 00:00:00 on September 11, 2024 and End Date to 00:00:00 on September 12, 2024 for the Last Modified At parameter.

    • Select Less Than from the drop-down list and enter 500 in the second field for the Object Size parameter.

  7. Specify the search result display method in the Search Result Settings section.

    • Set Object Sort Order to Ascending and select Object Size from the Sorted By drop-down list.

    • Select Object Size from the Output drop-down list and Maximum from the By drop-down list for the Data Aggregation parameter.

  8. Click Query Now.

For more information, see Search conditions and search result settings.
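
The console conditions in this example can also be expressed as a query string for the SDKs or the API. The following Python sketch builds such a string in the same Field, Value, and Operation shape that the SDK samples in this topic use. The SubQueries wrapper with the and operation, the gte and lte operators, the use of FileModifiedTime as a query field, the RFC 3339 timestamp format, and the UTC+8 time zone are assumptions that you should check against the DoMetaQuery reference. The helper names condition and all_of are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def condition(field, operation, value):
    """Build one leaf condition in the shape used by the SDK samples."""
    return {"Field": field, "Value": str(value), "Operation": operation}


def all_of(*conditions):
    """Combine conditions with a logical AND (the SubQueries shape is an assumption)."""
    return {"SubQueries": list(conditions), "Operation": "and"}


# Assumption: the console times are interpreted in UTC+8.
tz = timezone(timedelta(hours=8))
start = datetime(2024, 9, 11, tzinfo=tz).isoformat()
end = datetime(2024, 9, 12, tzinfo=tz).isoformat()

# The console uses KB for Object Size, while the SDK samples use bytes.
query = all_of(
    condition("Size", "lt", 500 * 1024),
    condition("FileModifiedTime", "gte", start),
    condition("FileModifiedTime", "lte", end),
)
query_json = json.dumps(query)
print(query_json)
```

The resulting string could be passed as the query argument of the SDK samples, with the sort field set to Size and the order set to asc, provided that the assumptions above hold.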

Buckets in the China (Hangzhou), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), and Singapore regions

  1. Log on to the OSS console.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.

  3. In the left-side navigation tree, choose Object Management > Data Indexing.

  4. Enable Metadata Management.

    Note

    The amount of time required for MetaSearch to take effect varies based on the number of objects in the bucket.

  5. Configure the Basic Filtering Conditions according to the following instructions. Retain the default settings for other parameters.

    • Set the Start Time to 00:00:00 on October 20, 2024 and the End Time to 00:00:00 on October 21, 2024 for the Last Modified At parameter.

    • Select Less Than from the drop-down list and enter 1600 in the second field for the Object Size parameter.

  6. Show more filtering conditions

    • Set Object Sort Order to Ascending and select Object Size from the Sorted By drop-down list.

    • Select Object Size from the Output drop-down list and Maximum from the By drop-down list for the Data Aggregation parameter.

  7. Two objects that meet the search conditions are returned as shown in the following figure. The maximum size of the objects is 1.54 MB.

    For more information about the search conditions and search result settings, see Search conditions and search result settings.

Use OSS SDKs

Only OSS SDK for Java, OSS SDK for Python, and OSS SDK for Go support MetaSearch queries. Before you use MetaSearch to search for objects in a bucket, you must enable the metadata management feature for the bucket. For more information about OSS SDKs for other programming languages, see Overview.

Java

import com.aliyun.oss.ClientException;
import com.aliyun.oss.OSS;
import com.aliyun.oss.common.auth.*;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.OSSException;
import com.aliyun.oss.model.*;
import java.util.ArrayList;
import java.util.List;

public class Demo {

    // In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
    private static String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
    // Specify the name of the bucket. Example: examplebucket. 
    private static String bucketName = "examplebucket";

    public static void main(String[] args) throws Exception {
        // Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Create an OSSClient instance. 
        OSS ossClient = new OSSClientBuilder().build(endpoint, credentialsProvider);

        try {
            // Query objects that meet specific conditions and list information about the objects based on specific fields and sorting methods. 
            int maxResults = 20;
            // Query objects that are smaller than 1,048,576 bytes in size, return up to 20 objects at a time, and sort the objects in ascending order. 
            String query = "{\"Field\": \"Size\",\"Value\": \"1048576\",\"Operation\": \"lt\"}";
            String sort = "Size";
            DoMetaQueryRequest doMetaQueryRequest = new DoMetaQueryRequest(bucketName, maxResults, query, sort);
            Aggregation aggregationRequest = new Aggregation();
            Aggregations aggregations = new Aggregations();
            List<Aggregation> aggregationList = new ArrayList<Aggregation>();
            // Specify the name of the field that is used in the aggregate operation. 
            aggregationRequest.setField("Size");
            // Specify the operator that is used in the aggregate operation. max indicates the maximum value. 
            aggregationRequest.setOperation("max");
            aggregationList.add(aggregationRequest);
            aggregations.setAggregation(aggregationList);

            // Specify the aggregate operation. 
            doMetaQueryRequest.setAggregations(aggregations);
            doMetaQueryRequest.setOrder(SortOrder.ASC);
            DoMetaQueryResult doMetaQueryResult = ossClient.doMetaQuery(doMetaQueryRequest);
            if(doMetaQueryResult.getFiles() != null){
                for(ObjectFile file : doMetaQueryResult.getFiles().getFile()){
                    System.out.println("Filename: " + file.getFilename());
                    // Query the ETag values that are used to identify the content of the objects. 
                    System.out.println("ETag: " + file.getETag());
                    // Query the access control list (ACL) of the objects.
                    System.out.println("ObjectACL: " + file.getObjectACL());
                    // Query the type of the objects. 
                    System.out.println("OssObjectType: " + file.getOssObjectType());
                    // Query the storage class of the objects. 
                    System.out.println("OssStorageClass: " + file.getOssStorageClass());
                    // Query the number of tags of the objects. 
                    System.out.println("TaggingCount: " + file.getOssTaggingCount());
                    if(file.getOssTagging() != null){
                        for(Tagging tag : file.getOssTagging().getTagging()){
                            System.out.println("Key: " + tag.getKey());
                            System.out.println("Value: " + tag.getValue());
                        }
                    }
                    if(file.getOssUserMeta() != null){
                        for(UserMeta meta : file.getOssUserMeta().getUserMeta()){
                            System.out.println("Key: " + meta.getKey());
                            System.out.println("Value: " + meta.getValue());
                        }
                    }
                }
            } else if(doMetaQueryResult.getAggregations() != null){
                for(Aggregation aggre : doMetaQueryResult.getAggregations().getAggregation()){
                    // Query the name of the aggregation field. 
                    System.out.println("Field: " + aggre.getField());
                    // Query the aggregation operator. 
                    System.out.println("Operation: " + aggre.getOperation());
                    // Query the values of the aggregate operations. 
                    System.out.println("Value: " + aggre.getValue());
                    if(aggre.getGroups() != null && aggre.getGroups().getGroup().size() > 0){
                        // Query the values of the aggregation operations by group. 
                        System.out.println("Groups value: " + aggre.getGroups().getGroup().get(0).getValue());
                        // Query the total number of the aggregation operations by group. 
                        System.out.println("Groups count: " + aggre.getGroups().getGroup().get(0).getCount());
                    }
                }
            } else {
                System.out.println("NextToken: " + doMetaQueryResult.getNextToken());
            }
        } catch (OSSException oe) {
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:" + oe.getErrorCode());
            System.out.println("Request ID:" + oe.getRequestId());
            System.out.println("Host ID:" + oe.getHostId());
        } catch (ClientException ce) {
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            // Shut down the OSSClient instance. 
            ossClient.shutdown();
        }
    }
}

Python

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery, AggregationsRequest
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. 
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"

# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

# Query objects that meet specific conditions and list the object information based on specific fields and sorting methods. 
# Query objects that are smaller than 1 MB, return up to 10 objects at a time, and sort the objects in ascending order. 
do_meta_query_request = MetaQuery(max_results=10, query='{"Field": "Size","Value": "1048576","Operation": "lt"}', sort='Size', order='asc')
result = bucket.do_bucket_meta_query(do_meta_query_request)

# Display information about each object in the search results.
for file in result.files or []:
    # Display the object name.
    print(file.file_name)
    # Display the ETag of the object.
    print(file.etag)
    # Display the type of the object.
    print(file.oss_object_type)
    # Display the storage class of the object.
    print(file.oss_storage_class)
    # Display the CRC-64 value of the object.
    print(file.oss_crc64)
    # Display the access control list (ACL) of the object.
    print(file.object_acl)

Go

package main

import (
    "fmt"
    "github.com/aliyun/aliyun-oss-go-sdk/oss"
    "os"
)

func main() {
    // Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
    provider, err := oss.NewEnvironmentVariableCredentialsProvider()
    if err != nil {
        fmt.Println("Error:", err)
        os.Exit(-1)
    }

    // Create an OSSClient instance. 
    // Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. Specify your actual endpoint. 
    client, err := oss.New("yourEndpoint", "", "", oss.SetCredentialsProvider(&provider))
    if err != nil {
        fmt.Println("Error:", err)
        os.Exit(-1)
    }
    // Query objects that are larger than 30 bytes in size, return up to 10 objects at a time, and sort the objects in ascending order.
    query := oss.MetaQuery{
        NextToken: "",
        MaxResults: 10,
        Query: `{"Field": "Size","Value": "30","Operation": "gt"}`,
        Sort: "Size",
        Order: "asc",
    }
    // Query objects that match the specified conditions and list object information based on the specified fields and sorting methods. 
    result, err := client.DoMetaQuery("examplebucket", query)
    if err != nil {
        fmt.Println("Error:", err)
        os.Exit(-1)
    }
    fmt.Printf("NextToken:%s\n", result.NextToken)
    for _, file := range result.Files {
        fmt.Printf("File name: %s\n", file.Filename)
        fmt.Printf("size: %d\n", file.Size)
        fmt.Printf("File Modified Time:%s\n", file.FileModifiedTime)
        fmt.Printf("Oss Object Type:%s\n", file.OssObjectType)
        fmt.Printf("Oss Storage Class:%s\n", file.OssStorageClass)
        fmt.Printf("Object ACL:%s\n", file.ObjectACL)
        fmt.Printf("ETag:%s\n", file.ETag)
        fmt.Printf("Oss CRC64:%s\n", file.OssCRC64)
        fmt.Printf("Oss Tagging Count:%d\n", file.OssTaggingCount)
        for _, tagging := range  file.OssTagging {
            fmt.Printf("Oss Tagging Key:%s\n", tagging.Key)
            fmt.Printf("Oss Tagging Value:%s\n", tagging.Value)
        }
        for _, userMeta := range file.OssUserMeta {
            fmt.Printf("Oss User Meta Key:%s\n", userMeta.Key)
            fmt.Printf("Oss User Meta Value:%s\n", userMeta.Value)
        }
    }
}
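
Each SDK sample above returns at most one page of results, and the Go and Java samples expose a NextToken value for fetching the next page. The following Python sketch shows one way to walk through all pages. It is written against a hypothetical run_query callable instead of a real SDK call, so that it stays self-contained. The attribute names files and next_token mirror the Python and Go samples but are assumptions here.

```python
from types import SimpleNamespace


def iter_all_files(run_query):
    """Yield every file from a paginated metadata query.

    run_query is a hypothetical callable. It takes a next_token string
    ("" for the first page) and returns an object that has `files` and
    `next_token` attributes.
    """
    next_token = ""
    while True:
        page = run_query(next_token)
        for file in page.files or []:
            yield file
        next_token = page.next_token
        # An empty token is assumed to mean that no more pages exist.
        if not next_token:
            break


# Fake two-page result set that stands in for real query responses.
pages = {
    "": SimpleNamespace(files=["a.txt", "b.txt"], next_token="t1"),
    "t1": SimpleNamespace(files=["c.txt"], next_token=""),
}
names = list(iter_all_files(pages.__getitem__))
print(names)
```

With OSS SDK for Python, run_query could wrap bucket.do_bucket_meta_query and pass the token into the MetaQuery request, assuming that MetaQuery accepts a next_token argument in your SDK version.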

Use the OSS API

If your business requires a high level of customization, you can directly call RESTful APIs. To directly call an API, you must include the signature calculation in your code. For more information, see DoMetaQuery.

Search conditions and search result settings

Search conditions

The following table describes all search conditions. You can specify one or more search conditions based on your business requirements.

OSS metadata conditions

Search condition

Description

Storage Class

By default, the following storage classes supported by OSS are selected: Standard, Infrequent Access (IA), Archive, Cold Archive, and Deep Cold Archive. You can specify the storage class based on your business requirements.

ACL

By default, the following ACLs supported by OSS are selected: Inherited from Bucket, Private, Public Read, and Public Read/Write. You can specify the ACL based on your business requirements.

Object Name

You can select Fuzzy Match or Equal To. For example, to find an object named exampleobject.txt, you can use one of the following methods to match the object name:

  • Select Equal To and enter the full name of the object. Example: exampleobject.txt.

  • Select Fuzzy Match and enter the prefix or suffix of the object name. Example: example or .txt.

    Important

    Fuzzy match can match all object names that contain specific characters. For example, if you enter test next to Fuzzy Match, localfolder/test/.example.jpg and localfolder/test.jpg meet the search condition, and are displayed in the search results.

Upload Type

By default, the following upload types are selected. You can specify the upload type based on your business requirements.

  • Normal: returns objects uploaded by using simple upload in the search results.

  • Multipart: returns objects uploaded by using multipart upload in the search results.

  • Appendable: returns objects uploaded by using append upload in the search results.

  • Symlink: returns symbolic links.

Last Modified At

You can specify Start Date and End Date for Last Modified At. The values of Start Date and End Date are accurate to seconds.

Object Size

You can select Equal To, Greater Than, Greater Than or Equal To, Less Than, or Less Than or Equal To for Object Size. Unit: KB.

Object Versions

You can search for only the current versions of objects.

Object ETag and tag conditions

If you want to search for objects based on their ETags and tags, you can enter the ETags or tags of the objects that you want to display in the search results.

  • ETags support only exact match. An ETag must be enclosed in double quotation marks ("). Example: "5B3C1A2E0563E1B002CC607C6689". If you want to specify multiple ETags, separate them with line feeds.

  • Specify Object Tags by using key-value pairs. The keys and values of object tags are case-sensitive. For more information about tag rules, see Add tags to an object.

Search result settings

You can sort the search results and view statistics on search results based on specific conditions.

  • Object Sort Order: You can sort the search results in Ascending, Descending, or Default order by Last Modified Time, Object Name, or Object Size.

  • Data Aggregation: You can view statistics on the search results based on specific conditions, such as de-duplication, group count, maximum, minimum, average, and sum. This facilitates efficient data analysis and management.
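
To make the aggregation options concrete, the following Python sketch computes each statistic locally over a small, made-up result set. It only illustrates what the options mean and does not call OSS. The sample data is hypothetical, and the reading of de-duplication as "distinct values" and of group count as "number of objects per value" is an assumption.

```python
from collections import Counter
from statistics import mean

# Hypothetical search results: (object name, size in bytes, storage class).
results = [
    ("logs/a.txt", 1200, "Standard"),
    ("logs/b.txt", 800, "IA"),
    ("img/c.jpg", 1000, "Standard"),
]
sizes = [size for _, size, _ in results]
classes = [storage_class for _, _, storage_class in results]

summary = {
    "maximum": max(sizes),
    "minimum": min(sizes),
    "sum": sum(sizes),
    "average": mean(sizes),
    # Distinct storage classes that appear in the results.
    "de-duplication": sorted(set(classes)),
    # Number of objects per storage class.
    "group count": dict(Counter(classes)),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

In MetaSearch itself, these statistics are computed by the service over all objects that match the search conditions, not only over the page of results that is returned.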

FAQ

When hundreds of millions of objects are stored in a bucket, why are data indexes not created for a long period of time?

Approximately 1 second is required to create indexes for 600 objects. You can estimate the period of time required to create indexes based on the number of objects in the bucket.
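
The estimate can be worked out as follows. This Python sketch uses only the rate of 600 objects per second quoted in the answer above, and the function name is hypothetical. The reference durations in the Usage notes correspond to a higher throughput, so treat the result as a rough figure.

```python
from datetime import timedelta

# Rate quoted in the FAQ answer: about 600 objects indexed per second.
OBJECTS_PER_SECOND = 600


def estimate_indexing_time(object_count):
    """Return a rough indexing duration for the given number of objects."""
    return timedelta(seconds=object_count / OBJECTS_PER_SECOND)


# Example: a bucket that contains 300 million objects.
print(estimate_indexing_time(300_000_000))
```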

References

MetaSearch supports multiple filtering conditions, such as the last modified time, storage class, ACL, and size of objects. If you want to search for OSS objects whose last modified time is within a specific period of time from a large number of objects in a bucket, see How to filter OSS objects whose last modified time is within a specific period of time.