All Products
Search
Document Center

Object Storage Service:Data indexing

Last Updated:Nov 01, 2024

Object Storage Service (OSS) provides the data indexing feature to allow you to query objects that match specific metadata conditions, such as the name, ETag, storage class, size, and last modified time of objects. The data indexing feature sorts and aggregates the query results based on your business requirements. This improves the efficiency of querying specific objects from a large number of objects.

Usage notes

  • Only OSS SDK for Python 2.1.6.0 and later support the data indexing feature.

  • The data indexing feature is supported only by buckets that are located in the China (Hangzhou) and Australia (Sydney, closing down) regions. For more information, see MetaSearch.

  • In this topic, the public endpoint of the China (Hangzhou) region is used. If you want to access OSS from other Alibaba Cloud services in the same region as OSS, use an internal endpoint. For more information about OSS regions and endpoints, see Regions, endpoints and open ports.

  • In this topic, access credentials are obtained from environment variables. For more information about how to configure access credentials, see Configure access credentials.

  • In this topic, an OSSClient instance is created by using an OSS endpoint. If you want to create an OSSClient instance by using custom domain names or Security Token Service (STS), see Initialization.

Enable the metadata management feature for a bucket

The following sample code provides an example on how to enable the metadata management feature for a bucket: After you enable the metadata management feature for a bucket, OSS creates a metadata index library for the bucket and creates metadata indexes for all objects in the bucket. After the metadata index library is created, OSS continues to perform quasi-real-time scans on incremental objects in the bucket and creates metadata indexes for the incremental objects.

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. 
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"

# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

# Enable the metadata management feature for the bucket. 
bucket.open_bucket_meta_query()

Query the metadata index library of a bucket

The following code provides an example on how to query the metadata index library of a bucket:

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. 
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"

# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

# Query the metadata index library information of a specified bucket. 
get_result = bucket.get_bucket_meta_query_status()

# Display the state. 
print(get_result.state)

Query objects that meet specific conditions

The following sample code provides an example on how to query objects that meet specific conditions and list the object information based on specific fields and sorting methods:

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery, AggregationsRequest
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. 
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"

# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

# Query objects that meet specific conditions and list the object information based on specific fields and sorting methods. 
# Query objects that are smaller than 1 MB, return up to 10 objects at a time, and sort the objects in ascending order. 
do_meta_query_request = MetaQuery(max_results=10, query='{"Field": "Size","Value": "1048576","Operation": "lt"}', sort='Size', order='asc')
result = bucket.do_bucket_meta_query(do_meta_query_request)

# Display the object names. 
print(result.files[0].file_name)
# Display the ETags of the objects. 
print(result.files[0].etag)
# Display the types of the objects. 
print(result.files[0].oss_object_type)
# Display the storage classes of the objects. 
print(result.files[0].oss_storage_class)
# Display the CRC-64 values of the objects. 
print(result.files[0].oss_crc64)
# Display the access control lists (ACLs) of the objects. 
print(result.files[0].object_acl)

Disable the metadata management feature for a bucket

The following code provides an example on how to disable the metadata management feature for a specified bucket:

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. 
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"

# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

# Disable the metadata management feature for the bucket. 
bucket.close_bucket_meta_query()

References

  • For more information about the API operation that you can call to enable the metadata management feature, see OpenMetaQuery.

  • For more information about the API operation that you can call to query information about a metadata index library, see GetMetaQueryStatus.

  • For more information about the API operation that you can call to query objects that meet specific conditions and list the object information based on specific fields and sorting methods, see DoMetaQuery.

  • For more information about the API operation that you can call to disable the metadata management feature, see CloseMetaQuery.