You can use the data indexing feature to quickly find objects in a bucket that match specific conditions, such as the object name, ETag, storage class, size, and last modified time. You specify filter conditions when you search for objects, and you can sort and aggregate the query results as needed. This makes it more efficient to locate your target objects.
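As a rough illustration of what a filter condition looks like, the query sample later in this topic passes a JSON string with the Field, Value, and Operation keys. The following sketch builds such a string with the standard json module instead of writing it by hand. The helper names simple_condition and to_query_string are hypothetical and are not part of the OSS SDK.

```python
import json


def simple_condition(field, operation, value):
    """Return one filter condition as a dict (hypothetical helper).

    The keys mirror the JSON string used in the query sample in this topic.
    The value is converted to a string, as in that sample.
    """
    return {"Field": field, "Value": str(value), "Operation": operation}


def to_query_string(condition):
    """Serialize a condition dict to the JSON text passed as the query."""
    return json.dumps(condition)


# Objects smaller than 1 MB (1,048,576 bytes).
size_filter = simple_condition("Size", "lt", 1024 * 1024)
query = to_query_string(size_filter)
print(query)
```

Building the string from a dict avoids quoting mistakes when the value comes from a variable.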
Usage notes
Only Python SDK V2.16.0 and later support the data indexing feature.
Only buckets in the China (Hangzhou) region support the data indexing feature. For more information, see Data indexing.
In this topic, the public endpoint of the China (Hangzhou) region is used. If you want to access OSS from other Alibaba Cloud services in the same region as OSS, use an internal endpoint. For more information about OSS regions and endpoints, see Regions and endpoints.
In this topic, access credentials are obtained from environment variables. For more information about how to configure access credentials, see Configure access credentials using OSS SDK for Python 1.0.
In this topic, an OSSClient instance is created by using an OSS endpoint. If you want to create an OSSClient instance by using custom domain names or Security Token Service (STS), see Initialization.
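The endpoint and credential notes above can be sketched as two small checks that run before a client is created. This is a minimal sketch that uses only the standard library. The helper names oss_endpoint and missing_credentials are hypothetical, and the "-internal" suffix for internal endpoints is an assumption that you should confirm in Regions and endpoints.

```python
import os


def oss_endpoint(region_id, internal=False):
    """Build an HTTPS endpoint for a region ID such as 'cn-hangzhou'.

    Assumption: internal endpoints add '-internal' after the region ID.
    """
    suffix = "-internal" if internal else ""
    return "https://oss-{}{}.aliyuncs.com".format(region_id, suffix)


def missing_credentials(environ=os.environ):
    """Return the names of required environment variables that are not set."""
    required = ("OSS_ACCESS_KEY_ID", "OSS_ACCESS_KEY_SECRET")
    return [name for name in required if not environ.get(name)]


print(oss_endpoint("cn-hangzhou"))
print(oss_endpoint("cn-hangzhou", internal=True))
# A sample mapping is passed here so that the output does not depend on the real environment.
print(missing_credentials({"OSS_ACCESS_KEY_ID": "example-id"}))
```

Checking the environment variables first gives a clearer error than a failed signature at request time.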
Enable data indexing
The following code shows how to enable data indexing for a bucket. After you enable this feature, OSS creates a metadata index for the bucket and indexes the metadata of all existing objects in it. After the index is created, OSS performs near-real-time incremental scans on new objects in the bucket and adds their metadata to the index.
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run this sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set the endpoint to the one that corresponds to the region where the bucket is located. For example, if the bucket is in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set the region to the one that corresponds to the endpoint, such as cn-hangzhou. Note: This parameter is required for V4 signatures.
region = "cn-hangzhou"
# Replace examplebucket with the actual bucket name.
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)
# Enable data indexing.
bucket.open_bucket_meta_query()
Get data indexing status
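Because index creation takes time after the feature is enabled, a script may want to wait for a particular status before it runs queries. The following is a minimal polling sketch, not an SDK feature. The helper wait_for_state is hypothetical, and the state names "Ready" and "Running" are assumptions that you should check against GetMetaQueryStatus. A stand-in function replaces the real status call so that the sketch runs without network access.

```python
import time


def wait_for_state(get_state, wanted=("Running",), attempts=10,
                   delay_seconds=5.0, sleep=time.sleep):
    """Poll get_state() until it returns a value in `wanted`.

    Returns the matching state, or None if it is not seen within `attempts` polls.
    """
    for attempt in range(attempts):
        state = get_state()
        if state in wanted:
            return state
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None


# Stand-in for the real status call. The state names are assumed values.
fake_states = iter(["Ready", "Ready", "Running"])
final = wait_for_state(lambda: next(fake_states), sleep=lambda seconds: None)
print(final)
```

With the Bucket object from the samples in this topic, get_state could be a function that returns bucket.get_bucket_meta_query_status().state.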
The following code shows how to retrieve the data indexing status for a specific bucket.
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run this sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set the endpoint to the one that corresponds to the region where the bucket is located. For example, if the bucket is in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set the region to the one that corresponds to the endpoint, such as cn-hangzhou. Note: This parameter is required for V4 signatures.
region = "cn-hangzhou"
# Replace examplebucket with the actual bucket name.
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)
# Get the data indexing status for the specified bucket.
get_result = bucket.get_bucket_meta_query_status()
# Print the status.
print(get_result.state)
Query for objects that meet specific conditions
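The query sample in this section caps the number of returned objects with max_results, so a large result set is returned in pages. The following is a minimal sketch of a paging loop. The helper iter_all_files is hypothetical. It assumes that each page exposes the files and next_token attributes and that an empty next_token marks the last page, which you should verify against DoMetaQuery. Fake pages replace the real call so that the sketch runs without network access.

```python
from types import SimpleNamespace


def iter_all_files(fetch_page):
    """Yield the files from every page of a query.

    fetch_page(next_token) must return an object with `files` and `next_token`.
    The first call receives None as the token.
    """
    token = None
    while True:
        page = fetch_page(token)
        for item in page.files or []:
            yield item
        token = page.next_token
        if not token:
            break


# Fake pages keyed by the token that requests them (illustrative data only).
fake_pages = {
    None: SimpleNamespace(files=["a.txt", "b.txt"], next_token="page-2"),
    "page-2": SimpleNamespace(files=["c.txt"], next_token=""),
}
names = list(iter_all_files(lambda token: fake_pages[token]))
print(names)
```

With the oss2 client, fetch_page could wrap bucket.do_bucket_meta_query and pass the token through the next_token parameter of MetaQuery, assuming that your SDK version provides that parameter.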
The following code shows how to query for objects that meet specific conditions and list the object information based on a specified field and sort order.
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery
# Obtain access credentials from environment variables. Before you run this sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set the endpoint to the one that corresponds to the region where the bucket is located. For example, if the bucket is in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set the region to the one that corresponds to the endpoint, such as cn-hangzhou. Note: This parameter is required for V4 signatures.
region = "cn-hangzhou"
# Replace examplebucket with the actual bucket name.
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)
# Query for objects that are smaller than 1 MB, return a maximum of 10 results, and sort the results by size in ascending order.
do_meta_query_request = MetaQuery(max_results=10, query='{"Field": "Size","Value": "1048576","Operation": "lt"}', sort='Size', order='asc')
result = bucket.do_bucket_meta_query(do_meta_query_request)
# Print the metadata of each returned object. The loop body does not run if no object matches the query.
for file in result.files:
    # Print the object name.
    print(file.file_name)
    # Print the ETag of the object.
    print(file.etag)
    # Print the object type.
    print(file.oss_object_type)
    # Print the storage class of the object.
    print(file.oss_storage_class)
    # Print the 64-bit CRC value of the object.
    print(file.oss_crc64)
    # Print the access control list (ACL) of the object.
    print(file.object_acl)
Disable data indexing
The following code shows how to disable the data indexing feature for a specific bucket.
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run this sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set the endpoint to the one that corresponds to the region where the bucket is located. For example, if the bucket is in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set the region to the one that corresponds to the endpoint, such as cn-hangzhou. Note: This parameter is required for V4 signatures.
region = "cn-hangzhou"
# Replace examplebucket with the actual bucket name.
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)
# Disable the data indexing feature for the specified bucket.
bucket.close_bucket_meta_query()
References
For more information about the API operation to enable data indexing, see OpenMetaQuery.
For more information about the API operation to retrieve the status of data indexing, see GetMetaQueryStatus.
For more information about the API operation to query for objects that meet specific conditions and list the object information based on a specified field and sort order, see DoMetaQuery.
For more information about the API operation to disable data indexing, see CloseMetaQuery.