Object Storage Service: Use MetaSearch to search for OSS objects based on metadata attributes
Last Updated: Nov 04, 2024
MetaSearch is an indexing feature of Object Storage Service (OSS) that is built on object metadata. You specify metadata attributes as index conditions to query objects. This way, you can learn about data structures, perform queries, collect statistics, and manage objects in an efficient manner.
Scenarios
Data audit
You can use MetaSearch to quickly find objects to meet data audit or regulatory requirements. For example, in the financial industry, you can filter objects by using metadata such as custom tags and access control lists (ACLs). This way, you can search for objects that have a specific sensitivity level or specific ACLs to improve the efficiency of data audits.
Enterprise data backup and archiving
When an enterprise backs up and archives data, it can use MetaSearch to find objects based on metadata such as the creation date, storage class, and custom tags. This way, the enterprise can quickly locate and restore historical data or archived objects.
Usage notes
Supported regions
MetaSearch is supported for buckets that are located in the China (Hangzhou), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Guangzhou), China (Hong Kong), and Singapore regions.
Object quantity
By default, MetaSearch supports buckets that contain up to 10 billion objects.
Billing rules
MetaSearch is in public preview and is free of charge during this period. To use MetaSearch, you must enable the metadata management feature. After the public preview ends, you are charged for metadata management and metadata retrievals. For more information about the billable items of the data indexing feature, see Data indexing fees.
Time required for indexing
After you enable MetaSearch, OSS creates an index. The time required to create the index grows with the number of objects stored in the bucket. In most cases, creating an index for the first time takes approximately 1 hour for 10 million objects, approximately 1 day for 1 billion objects, and approximately 2 to 3 days for 10 billion objects. These figures are provided only for reference.
Multipart upload
If a bucket contains objects that are uploaded by using multipart upload, the search results include only the complete objects that are assembled by calling the CompleteMultipartUpload operation. Parts that belong to multipart upload tasks that have been initiated but have not yet been completed or canceled are not included in the search results.
Methods
Use the OSS console
In this example, the following search conditions are used to search for objects:
1. Object size: less than 500 KB.
2. Last modified time: from 00:00:00 on September 11, 2024 to 00:00:00 on September 12, 2024.
3. Sort order: sort objects by object size in ascending order.
4. Data aggregation: display the maximum size of the objects that meet the preceding conditions.
Use OSS SDKs
Currently, only OSS SDK for Java, OSS SDK for Python, and OSS SDK for Go allow you to use MetaSearch to query objects that meet specific conditions. Before you use MetaSearch to search for objects in a bucket, you must enable the metadata management feature for the bucket. For more information about how to use MetaSearch to search for objects by using OSS SDKs for other programming languages, see Overview.
Java
import com.aliyun.oss.ClientException;
import com.aliyun.oss.OSS;
import com.aliyun.oss.common.auth.*;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.OSSException;
import com.aliyun.oss.model.*;
import java.util.ArrayList;
import java.util.List;

public class Demo {
    // In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint.
    private static String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
    // Specify the name of the bucket. Example: examplebucket.
    private static String bucketName = "examplebucket";

    public static void main(String[] args) throws Exception {
        // Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Create an OSSClient instance.
        OSS ossClient = new OSSClientBuilder().build(endpoint, credentialsProvider);
        try {
            // Query objects that meet specific conditions and list information about the objects based on specific fields and sorting methods.
            int maxResults = 20;
            // Query objects that are smaller than 1,048,576 bytes in size, return up to 20 objects at a time, and sort the objects by size in ascending order.
            String query = "{\"Field\": \"Size\",\"Value\": \"1048576\",\"Operation\": \"lt\"}";
            String sort = "Size";
            DoMetaQueryRequest doMetaQueryRequest = new DoMetaQueryRequest(bucketName, maxResults, query, sort);
            Aggregation aggregationRequest = new Aggregation();
            Aggregations aggregations = new Aggregations();
            List<Aggregation> aggregationList = new ArrayList<Aggregation>();
            // Specify the name of the field that is used in the aggregate operation.
            aggregationRequest.setField("Size");
            // Specify the operator that is used in the aggregate operation. max indicates the maximum value.
            aggregationRequest.setOperation("max");
            aggregationList.add(aggregationRequest);
            aggregations.setAggregation(aggregationList);
            // Specify the aggregate operation.
            doMetaQueryRequest.setAggregations(aggregations);
            doMetaQueryRequest.setOrder(SortOrder.ASC);
            DoMetaQueryResult doMetaQueryResult = ossClient.doMetaQuery(doMetaQueryRequest);
            if (doMetaQueryResult.getFiles() != null) {
                for (ObjectFile file : doMetaQueryResult.getFiles().getFile()) {
                    System.out.println("Filename: " + file.getFilename());
                    // Query the ETag value that is used to identify the content of the object.
                    System.out.println("ETag: " + file.getETag());
                    // Query the access control list (ACL) of the object.
                    System.out.println("ObjectACL: " + file.getObjectACL());
                    // Query the type of the object.
                    System.out.println("OssObjectType: " + file.getOssObjectType());
                    // Query the storage class of the object.
                    System.out.println("OssStorageClass: " + file.getOssStorageClass());
                    // Query the number of tags of the object.
                    System.out.println("TaggingCount: " + file.getOssTaggingCount());
                    if (file.getOssTagging() != null) {
                        for (Tagging tag : file.getOssTagging().getTagging()) {
                            System.out.println("Key: " + tag.getKey());
                            System.out.println("Value: " + tag.getValue());
                        }
                    }
                    if (file.getOssUserMeta() != null) {
                        for (UserMeta meta : file.getOssUserMeta().getUserMeta()) {
                            System.out.println("Key: " + meta.getKey());
                            System.out.println("Value: " + meta.getValue());
                        }
                    }
                }
            } else if (doMetaQueryResult.getAggregations() != null) {
                for (Aggregation aggre : doMetaQueryResult.getAggregations().getAggregation()) {
                    // Query the name of the aggregation field.
                    System.out.println("Field: " + aggre.getField());
                    // Query the aggregation operator.
                    System.out.println("Operation: " + aggre.getOperation());
                    // Query the value of the aggregate operation.
                    System.out.println("Value: " + aggre.getValue());
                    if (aggre.getGroups() != null && aggre.getGroups().getGroup().size() > 0) {
                        // Query the value of the first group in the grouped aggregation.
                        System.out.println("Groups value: " + aggre.getGroups().getGroup().get(0).getValue());
                        // Query the count of the first group in the grouped aggregation.
                        System.out.println("Groups count: " + aggre.getGroups().getGroup().get(0).getCount());
                    }
                }
            } else {
                System.out.println("NextToken: " + doMetaQueryResult.getNextToken());
            }
        } catch (OSSException oe) {
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:" + oe.getErrorCode());
            System.out.println("Request ID:" + oe.getRequestId());
            System.out.println("Host ID:" + oe.getHostId());
        } catch (ClientException ce) {
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            // Shut down the OSSClient instance.
            ossClient.shutdown();
        }
    }
}
Python
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery, AggregationsRequest

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"
# Specify the name of the bucket. Example: examplebucket.
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

# Query objects that meet specific conditions and list the object information based on specific fields and sorting methods.
# Query objects that are smaller than 1 MB, return up to 10 objects at a time, and sort the objects by size in ascending order.
do_meta_query_request = MetaQuery(max_results=10, query='{"Field": "Size","Value": "1048576","Operation": "lt"}', sort='Size', order='asc')
result = bucket.do_bucket_meta_query(do_meta_query_request)

# Iterate over the returned objects. Looping also avoids an IndexError when no object matches.
for file in result.files:
    # Display the object name.
    print(file.file_name)
    # Display the ETag of the object.
    print(file.etag)
    # Display the type of the object.
    print(file.oss_object_type)
    # Display the storage class of the object.
    print(file.oss_storage_class)
    # Display the CRC-64 value of the object.
    print(file.oss_crc64)
    # Display the access control list (ACL) of the object.
    print(file.object_acl)
Go
package main

import (
	"fmt"
	"github.com/aliyun/aliyun-oss-go-sdk/oss"
	"os"
)

func main() {
	// Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Create an OSSClient instance.
	// Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. Specify your actual endpoint.
	client, err := oss.New("yourEndpoint", "", "", oss.SetCredentialsProvider(&provider))
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Query objects that are larger than 30 bytes in size, return up to 10 objects at a time, and sort the objects by size in ascending order.
	query := oss.MetaQuery{
		NextToken:  "",
		MaxResults: 10,
		Query:      `{"Field": "Size","Value": "30","Operation": "gt"}`,
		Sort:       "Size",
		Order:      "asc",
	}
	// Query objects that match the specified conditions and list object information based on the specified fields and sorting methods.
	result, err := client.DoMetaQuery("examplebucket", query)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	fmt.Printf("NextToken:%s\n", result.NextToken)
	for _, file := range result.Files {
		fmt.Printf("File name: %s\n", file.Filename)
		fmt.Printf("size: %d\n", file.Size)
		fmt.Printf("File Modified Time:%s\n", file.FileModifiedTime)
		fmt.Printf("Oss Object Type:%s\n", file.OssObjectType)
		fmt.Printf("Oss Storage Class:%s\n", file.OssStorageClass)
		fmt.Printf("Object ACL:%s\n", file.ObjectACL)
		fmt.Printf("ETag:%s\n", file.ETag)
		fmt.Printf("Oss CRC64:%s\n", file.OssCRC64)
		fmt.Printf("Oss Tagging Count:%d\n", file.OssTaggingCount)
		for _, tagging := range file.OssTagging {
			fmt.Printf("Oss Tagging Key:%s\n", tagging.Key)
			fmt.Printf("Oss Tagging Value:%s\n", tagging.Value)
		}
		for _, userMeta := range file.OssUserMeta {
			fmt.Printf("Oss User Meta Key:%s\n", userMeta.Key)
			fmt.Printf("Oss User Meta Value:%s\n", userMeta.Value)
		}
	}
}
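The SDK samples above each filter on a single Size condition. As a minimal sketch, the conditions from the console example (size below 500 KB, last modified between September 11 and September 12, 2024) might be combined into one query string as shown here. Several details are assumptions to verify against the DoMetaQuery reference: the nested shape with Operation set to and plus a SubQueries list, the FileModifiedTime field name, the gte and lte operators, the RFC 3339 timestamp format, the UTC+8 offset, and treating 1 KB as 1,024 bytes. The helper names condition and all_of are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

KB = 1024  # Assumption: the console's "KB" means 1,024 bytes.


def condition(field: str, operation: str, value: str) -> dict:
    """Build one leaf condition in the same shape as the SDK samples."""
    return {"Field": field, "Value": value, "Operation": operation}


def all_of(*conditions: dict) -> dict:
    """Combine conditions with a logical AND (assumed nested-query shape)."""
    return {"Operation": "and", "SubQueries": list(conditions)}


# Assumption: timestamps are RFC 3339 strings; UTC+8 is used for illustration.
tz = timezone(timedelta(hours=8))
start = datetime(2024, 9, 11, tzinfo=tz).isoformat()
end = datetime(2024, 9, 12, tzinfo=tz).isoformat()

query = json.dumps(
    all_of(
        condition("Size", "lt", str(500 * KB)),
        condition("FileModifiedTime", "gte", start),
        condition("FileModifiedTime", "lte", end),
    )
)
print(query)
```

If the assumptions hold, the resulting string could be passed as the query argument in the Python sample (together with sort='Size' and order='asc') in place of the single-condition string.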
Use the OSS API
If your business requires a high level of customization, you can directly call RESTful APIs. To directly call an API, you must include the signature calculation in your code. For more information, see DoMetaQuery.
Search conditions and search result settings
Search conditions
The following table describes all search conditions. You can specify one or more search conditions based on your business requirements.
OSS metadata conditions
| Search condition | Description |
| --- | --- |
| Storage Class | By default, all storage classes supported by OSS are selected: Standard, Infrequent Access (IA), Archive, Cold Archive, and Deep Cold Archive. You can specify the storage class based on your business requirements. |
| ACL | By default, all ACLs supported by OSS are selected: Inherited from Bucket, Private, Public Read, and Public Read/Write. You can specify the ACL based on your business requirements. |
| Object Name | You can select Fuzzy Match or Equal To. For example, to find an object named exampleobject.txt, use one of the following methods: select Equal To and enter the full name of the object (exampleobject.txt), or select Fuzzy Match and enter part of the object name, such as the prefix example or the suffix .txt. Important: Fuzzy Match returns every object whose name contains the specified characters. For example, if you enter test next to Fuzzy Match, both localfolder/test/.example.jpg and localfolder/test.jpg meet the search condition and are displayed in the search results. |
| Upload Type | By default, all of the following upload types are selected. You can specify the upload type based on your business requirements. Normal: returns objects uploaded by using simple upload. Multipart: returns objects uploaded by using multipart upload. Appendable: returns objects uploaded by using append upload. Symlink: returns symbolic links. |
| Last Modified At | You can specify Start Date and End Date. Both values are accurate to the second. |
| Object Size | You can select Equal To, Greater Than, Greater Than or Equal To, Less Than, or Less Than or Equal To. Unit: KB. |
| Object Versions | You can search for only the current versions of objects. |
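The difference between the two Object Name modes can be illustrated locally. The following is a minimal sketch, not the OSS implementation: it assumes Fuzzy Match behaves like a case-sensitive substring test, which is consistent with the test example in the table, and the function name matches_object_name is hypothetical.

```python
def matches_object_name(name: str, pattern: str, mode: str) -> bool:
    """Local illustration of the Equal To and Fuzzy Match modes."""
    if mode == "equal":
        # Equal To: the full object name must be identical.
        return name == pattern
    if mode == "fuzzy":
        # Fuzzy Match (assumed): the pattern may appear anywhere in the name.
        return pattern in name
    raise ValueError(f"unknown mode: {mode}")


names = [
    "exampleobject.txt",
    "localfolder/test/.example.jpg",
    "localfolder/test.jpg",
    "localfolder/report.pdf",
]

fuzzy_hits = [n for n in names if matches_object_name(n, "test", "fuzzy")]
exact_hits = [n for n in names if matches_object_name(n, "exampleobject.txt", "equal")]
print(fuzzy_hits)
print(exact_hits)
```

Because a fuzzy pattern matches anywhere in the name, including folder segments, a short pattern can return more objects than expected. Equal To is the safer choice when the full name is known.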
Object ETag and tag conditions
If you want to search for objects based on their ETags and tags, you can enter the ETags or tags of the objects that you want to display in the search results.
ETags support only exact match. Each ETag must be enclosed in double quotation marks ("). Example: "5B3C1A2E0563E1B002CC607C6689". If you want to specify multiple ETags, separate them with line feeds.
Specify Object Tags by using key-value pairs. The keys and values of object tags are case-sensitive. For more information about tag rules, see Add tags to an object.
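The ETag input rules above (one ETag per line, each enclosed in double quotation marks) can be sketched as a small normalization step. This is an illustration under stated assumptions: the helper names parse_etags and etag_query are hypothetical, and expressing several ETags as an or query with a SubQueries list of eq conditions on an ETag field is an assumed query shape that should be checked against the DoMetaQuery reference.

```python
import json
from typing import List


def parse_etags(text: str) -> List[str]:
    """Split line-feed-separated ETags and make sure each is double-quoted."""
    etags = []
    for line in text.splitlines():
        etag = line.strip()
        if not etag:
            continue  # Skip blank lines.
        # Normalize: remove any existing quotes, then add exactly one pair.
        etags.append('"' + etag.strip('"') + '"')
    return etags


def etag_query(etags: List[str]) -> str:
    """Build an exact-match query for one or more ETags (assumed shape)."""
    if not etags:
        raise ValueError("at least one ETag is required")
    conditions = [{"Field": "ETag", "Value": e, "Operation": "eq"} for e in etags]
    if len(conditions) == 1:
        return json.dumps(conditions[0])
    return json.dumps({"Operation": "or", "SubQueries": conditions})


user_input = '"5B3C1A2E0563E1B002CC607C6689"\nABCDEF\n'
print(etag_query(parse_etags(user_input)))
```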
Search result settings
You can sort the search results and view statistics on search results based on specific conditions.
Object Sort Order: You can sort the search results by Last Modified Time, Object Name, or Object Size in the Ascending, Descending, or Default order.
Data Aggregation: You can view statistics on the search results based on specific conditions, such as de-duplication, group count, maximum, minimum, average, and sum. This facilitates efficient data analysis and management.
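To make the aggregation types concrete, the following minimal sketch computes each of them locally over a small, made-up list of object metadata. It only illustrates what the statistics mean and does not call OSS. The sample records and key names are hypothetical, and interpreting de-duplication as the set of distinct values and group count as a count per value is an assumption.

```python
from collections import Counter
from statistics import mean

# Hypothetical search results: three objects with a size and a storage class.
objects = [
    {"name": "a.txt", "size": 100, "storage_class": "Standard"},
    {"name": "b.txt", "size": 300, "storage_class": "IA"},
    {"name": "c.txt", "size": 200, "storage_class": "Standard"},
]

sizes = [o["size"] for o in objects]
classes = [o["storage_class"] for o in objects]

summary = {
    "max": max(sizes),                    # largest object size
    "min": min(sizes),                    # smallest object size
    "average": mean(sizes),               # mean object size
    "sum": sum(sizes),                    # total size of all matches
    "distinct": sorted(set(classes)),     # de-duplicated storage classes
    "group_count": dict(Counter(classes)),  # number of objects per class
}
print(summary)
```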
FAQ
When hundreds of millions of objects are stored in a bucket, why does index creation take a long time?
Approximately 1 second is required to create indexes for 600 objects. You can estimate the period of time required to create indexes based on the number of objects in the bucket.
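The estimate described above is a simple division. The following minimal sketch applies the rate of 600 objects per second from this answer; the function name estimate_indexing_time is hypothetical.

```python
from datetime import timedelta

OBJECTS_PER_SECOND = 600  # Rate quoted in the FAQ answer.


def estimate_indexing_time(object_count: int) -> timedelta:
    """Return a rough index creation time for the given number of objects."""
    if object_count < 0:
        raise ValueError("object_count must not be negative")
    return timedelta(seconds=object_count / OBJECTS_PER_SECOND)


for count in (10_000_000, 300_000_000):
    print(f"{count} objects: about {estimate_indexing_time(count)}")
```

Treat the result as a rough guide only. The reference figures in the Time required for indexing section imply that actual throughput can be higher than this rate.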
References
MetaSearch supports multiple filtering conditions, such as the last modified time, storage class, ACL, and size of objects. If you want to search for OSS objects whose last modified time is within a specific period of time from a large number of objects in a bucket, see How to filter OSS objects whose last modified time is within a specific period of time.