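Use MetaSearch to search for OSS objects based on metadata attributes - Object Storage Service

MetaSearch, provided by Object Storage Service (OSS), is an indexing feature based on object metadata. You can specify object metadata as index conditions to query objects. This helps you efficiently manage and understand data structures, run queries, collect statistics, and manage objects.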

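Scenarios

Data auditing

MetaSearch helps you quickly find objects that are subject to data audit or regulatory requirements. For example, in the financial industry, you can filter objects by metadata such as custom tags and access control lists (ACLs). This lets you search for objects that have a specific confidentiality level or a specific ACL, which improves data audit efficiency.

Enterprise data backup and archiving

When an enterprise backs up and archives data, it can use MetaSearch to quickly find objects that have a specific creation date or storage class, based on object metadata such as creation date, storage class, and custom tags. This lets the enterprise quickly restore historical data and archived objects.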

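Usage notes

Supported regions
MetaSearch is supported in the following regions: China (Hangzhou), China (Shanghai), China (Qingdao), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Guangzhou), China (Chengdu), China (Hong Kong), Singapore, Indonesia (Jakarta), Germany (Frankfurt), US (Virginia), US (Silicon Valley), and UK (London).
Number of objects
By default, MetaSearch is supported only for buckets that contain up to 10 billion objects.
Billing rules
MetaSearch is in public preview and is free of charge during this period. To use MetaSearch, you must enable the metadata management feature. After the public preview ends, you are charged for metadata management and metadata retrieval. For more information about the billable items of the data indexing feature, see "Data indexing fees".
Time required for index creation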
After you enable MetaSearch, OSS creates an index. The time required is proportional to the number of objects stored in the bucket: the more objects, the longer index creation takes. In most cases, creating an index for the first time takes about 1 hour for 10 million objects, about 1 day for 1 billion objects, and about 2 to 3 days for 10 billion objects. These durations are for reference only.
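Multipart upload
If a bucket contains objects that were uploaded by using multipart upload, the search results include only complete objects that were combined by calling the CompleteMultipartUpload operation. Parts uploaded by multipart upload tasks that were initiated but not completed or canceled are not included in the search results.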

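Methods

Use the OSS console

In this example, objects are searched by using the following conditions:
- Object size: less than 500 KB.
- Last modified time: from 00:00:00 on September 11, 2024 to 00:00:00 on September 12, 2024.
- Sort order: ascending by object size.
- Data aggregation: display the maximum size of the objects that meet the preceding conditions.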

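Buckets in the China (Guangzhou) region

Log on to the OSS console.
In the left-side navigation pane, click [Buckets]. On the [Buckets] page, find and click the desired bucket.
In the left-side navigation tree, choose [Files] > [Data Indexing].
On the [Data Indexing] page, click [Enable Now].
In the [Data Indexing] dialog box, select [MetaSearch] and click [Enable].
Note
The time required to enable MetaSearch depends on the number of objects in the bucket.
In the [OSS Metadata Conditions] section, specify the parameters based on your business requirements and keep the default settings for the other parameters.
- Set the start date of the [Last Modified At] parameter to 00:00:00 on September 11, 2024 and the end date to 00:00:00 on September 12, 2024.
- Select [Less Than] from the drop-down list and enter 500 in the second field of the [Object Size] parameter.
In the [Search Result Settings] section, specify how the search results are displayed.
- Set [Object Sort Order] to [Ascending] and select [Object Size] from the [Sort By] drop-down list.
- For the [Data Aggregation] parameter, select [Object Size] from the [Output] drop-down list and select [Maximum Value] from the [By] drop-down list.
Click [Query Now].

For more information, see "Search conditions and search result settings".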
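
For readers who prefer the SDK or API, the console conditions in this example could be expressed as a single nested query string. The following is a minimal sketch, not the console's actual implementation. The `SubQueries` wrapper, the `FileModifiedTime` field name, the `gte` operator, the timestamp format, and the UTC+8 time zone are assumptions that do not appear on this page and should be checked against the DoMetaQuery reference. The helper name `build_console_example_query` is hypothetical.

```python
import json

KB = 1024  # assumption: the console's KB unit means 1,024 bytes


def build_console_example_query(max_size_kb, start_time, end_time):
    """Build a nested query condition and return it as a JSON string."""
    conditions = [
        # Object size below the threshold. The SDK samples express Size in bytes.
        {"Field": "Size", "Value": str(max_size_kb * KB), "Operation": "lt"},
        # Last modified time within [start_time, end_time).
        # The field name and the gte operator are assumptions.
        {"Field": "FileModifiedTime", "Value": start_time, "Operation": "gte"},
        {"Field": "FileModifiedTime", "Value": end_time, "Operation": "lt"},
    ]
    # Assumed nesting syntax: all sub-conditions must hold ("and").
    return json.dumps({"SubQueries": conditions, "Operation": "and"})


query = build_console_example_query(
    500,
    "2024-09-11T00:00:00.000+08:00",  # assumed time zone: UTC+8
    "2024-09-12T00:00:00.000+08:00",
)
print(query)
```

If these assumptions hold, the resulting string could be passed as the query value in the SDK samples on this page, together with a sort field of Size and ascending order.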

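Buckets in the China (Hangzhou), China (Shanghai), China (Qingdao), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Indonesia (Jakarta), Germany (Frankfurt), US (Virginia), US (Silicon Valley), and UK (London) regions

Log on to the OSS console.
In the left-side navigation pane, click [Buckets]. On the [Buckets] page, find and click the desired bucket.
In the left-side navigation tree, choose [Files] > [Data Indexing].
Enable metadata management.
Note
The time required to enable MetaSearch depends on the number of objects in the bucket.
Configure the basic filter conditions as follows and keep the default settings for the other parameters.
- Set the start time of the [Last Modified At] parameter to 00:00:00 on October 20, 2024 and the end time to 00:00:00 on October 21, 2024.
- Select [Less Than] from the drop-down list and enter 1600 in the second field of the [Object Size] parameter.
Show more filter conditions and configure the following settings.
- Set [Object Sort Order] to [Ascending] and select [Object Size] from the [Sort By] drop-down list.
- For the [Data Aggregation] parameter, select [Object Size] from the [Output] drop-down list and select [Maximum Value] from the [By] drop-down list.
As shown in the following figure, two objects that meet the search conditions are returned. The maximum object size is 1.54 MB.
For more information about search conditions and search result settings, see "Search conditions and search result settings".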

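Use OSS SDKs

Currently, only OSS SDK for Java, OSS SDK for Python, and OSS SDK for Go support querying objects that meet specific conditions by using MetaSearch. Before you use MetaSearch to search for objects in a bucket, you must enable the metadata management feature for the bucket. For more information about how to use MetaSearch with OSS SDKs for other programming languages, see "Overview".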

Java

import com.aliyun.oss.ClientException;
import com.aliyun.oss.OSS;
import com.aliyun.oss.common.auth.*;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.OSSException;
import com.aliyun.oss.model.*;
import java.util.ArrayList;
import java.util.List;

public class Demo {

    // In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
    private static String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
    // Specify the name of the bucket. Example: examplebucket. 
    private static String bucketName = "examplebucket";

    public static void main(String[] args) throws Exception {
        // Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the region to cn-hangzhou.
        String region = "cn-hangzhou";

        // Create an OSSClient instance. 
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);        
        OSS ossClient = OSSClientBuilder.create()
        .endpoint(endpoint)
        .credentialsProvider(credentialsProvider)
        .clientConfiguration(clientBuilderConfiguration)
        .region(region)               
        .build();

        try {
            // Query objects that meet specific conditions and list information about the objects based on specific fields and sorting methods. 
            int maxResults = 20;
            // Query objects that are smaller than 1,048,576 bytes in size, return up to 20 objects at a time, and sort the objects in ascending order. 
            String query = "{\"Field\": \"Size\",\"Value\": \"1048576\",\"Operation\": \"lt\"}";
            String sort = "Size";
            DoMetaQueryRequest doMetaQueryRequest = new DoMetaQueryRequest(bucketName, maxResults, query, sort);
            Aggregation aggregationRequest = new Aggregation();
            Aggregations aggregations = new Aggregations();
            List<Aggregation> aggregationList = new ArrayList<Aggregation>();
            // Specify the name of the field that is used in the aggregate operation. 
            aggregationRequest.setField("Size");
            // Specify the operator that is used in the aggregate operation. max indicates the maximum value. 
            aggregationRequest.setOperation("max");
            aggregationList.add(aggregationRequest);
            aggregations.setAggregation(aggregationList);

            // Specify the aggregate operation. 
            doMetaQueryRequest.setAggregations(aggregations);
            doMetaQueryRequest.setOrder(SortOrder.ASC);
            DoMetaQueryResult doMetaQueryResult = ossClient.doMetaQuery(doMetaQueryRequest);
            if(doMetaQueryResult.getFiles() != null){
                for(ObjectFile file : doMetaQueryResult.getFiles().getFile()){
                    System.out.println("Filename: " + file.getFilename());
                    // Query the ETag values that are used to identify the content of the objects. 
                    System.out.println("ETag: " + file.getETag());
                    // Query the access control list (ACL) of the objects.
                    System.out.println("ObjectACL: " + file.getObjectACL());
                    // Query the type of the objects. 
                    System.out.println("OssObjectType: " + file.getOssObjectType());
                    // Query the storage class of the objects. 
                    System.out.println("OssStorageClass: " + file.getOssStorageClass());
                    // Query the number of tags of the objects. 
                    System.out.println("TaggingCount: " + file.getOssTaggingCount());
                    if(file.getOssTagging() != null){
                        for(Tagging tag : file.getOssTagging().getTagging()){
                            System.out.println("Key: " + tag.getKey());
                            System.out.println("Value: " + tag.getValue());
                        }
                    }
                    if(file.getOssUserMeta() != null){
                        for(UserMeta meta : file.getOssUserMeta().getUserMeta()){
                            System.out.println("Key: " + meta.getKey());
                            System.out.println("Value: " + meta.getValue());
                        }
                    }
                }
            }
            // Check the aggregation results independently of the file list. 
            if(doMetaQueryResult.getAggregations() != null){
                for(Aggregation aggre : doMetaQueryResult.getAggregations().getAggregation()){
                    // Query the name of the aggregation field. 
                    System.out.println("Field: " + aggre.getField());
                    // Query the aggregation operator. 
                    System.out.println("Operation: " + aggre.getOperation());
                    // Query the values of the aggregate operations. 
                    System.out.println("Value: " + aggre.getValue());
                    if(aggre.getGroups() != null && aggre.getGroups().getGroup().size() > 0){
                        // Query the values of the aggregation operations by group. 
                        System.out.println("Groups value: " + aggre.getGroups().getGroup().get(0).getValue());
                        // Query the total number of the aggregation operations by group. 
                        System.out.println("Groups count: " + aggre.getGroups().getGroup().get(0).getCount());
                    }
                }
            }
            // Display the token that is used to request the next page of results. 
            System.out.println("NextToken: " + doMetaQueryResult.getNextToken());
        } catch (OSSException oe) {
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:" + oe.getErrorCode());
            System.out.println("Request ID:" + oe.getRequestId());
            System.out.println("Host ID:" + oe.getHostId());
        } catch (ClientException ce) {
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            // Shut down the OSSClient instance. 
            ossClient.shutdown();
        }
    }
}

Python

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery, AggregationsRequest
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. 
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"

# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

# Query objects that meet specific conditions and list the object information based on specific fields and sorting methods. 
# Query objects that are smaller than 1 MB, return up to 10 objects at a time, and sort the objects in ascending order. 
do_meta_query_request = MetaQuery(max_results=10, query='{"Field": "Size","Value": "1048576","Operation": "lt"}', sort='Size', order='asc')
result = bucket.do_bucket_meta_query(do_meta_query_request)

# Iterate over the returned objects. The loop is skipped if no object matches. 
for file in result.files:
    # Display the object name. 
    print(file.file_name)
    # Display the ETag of the object. 
    print(file.etag)
    # Display the type of the object. 
    print(file.oss_object_type)
    # Display the storage class of the object. 
    print(file.oss_storage_class)
    # Display the CRC-64 value of the object. 
    print(file.oss_crc64)
    # Display the access control list (ACL) of the object. 
    print(file.object_acl)
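
The Java and Go samples on this page print a NextToken value, which suggests that large result sets are returned page by page. The following is a minimal sketch of how such pages might be walked. It assumes that the token from one response is passed back in the next request and that an empty token marks the last page. `fetch_page` is a hypothetical callable, and `fake_fetch_page` is a stand-in for the service so that the loop can run without OSS access.

```python
def iter_all_files(fetch_page):
    """Yield every file from a paginated query.

    fetch_page is a hypothetical callable: it takes a token string and
    returns a (files, next_token) tuple for one page of results.
    """
    token = ""
    while True:
        files, token = fetch_page(token)
        yield from files
        if not token:  # assumption: an empty token means no more pages
            break


# Stand-in data that maps each token to one page of results.
_PAGES = {"": (["a.txt", "b.txt"], "t1"), "t1": (["c.txt"], "")}


def fake_fetch_page(token):
    """Return the page for the given token from the stand-in data."""
    return _PAGES[token]


all_files = list(iter_all_files(fake_fetch_page))
print(all_files)
```

In real use, `fetch_page` would presumably wrap the SDK query call and return the files and the next token from each response. Whether the SDK accepts the token in that form is an assumption to verify.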

Go

package main

import (
	"fmt"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Create an OSSClient instance. 
	// Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. Specify your actual endpoint. 
	// Specify the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the region to cn-hangzhou. Specify the actual region.
	clientOptions := []oss.ClientOption{oss.SetCredentialsProvider(&provider)}
	clientOptions = append(clientOptions, oss.Region("yourRegion"))
	// Specify the version of the signature algorithm.
	clientOptions = append(clientOptions, oss.AuthVersion(oss.AuthV4))
	client, err := oss.New("yourEndpoint", "", "", clientOptions...)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Query objects that are larger than 30 bytes in size, return up to 10 objects at a time, and sort the objects in ascending order. 
	query := oss.MetaQuery{
		NextToken:  "",
		MaxResults: 10,
		Query:      `{"Field": "Size","Value": "30","Operation": "gt"}`,
		Sort:       "Size",
		Order:      "asc",
	}
	// Query objects that match the specified conditions and list object information based on the specified fields and sorting methods. 
	result, err := client.DoMetaQuery("examplebucket", query)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	fmt.Printf("NextToken:%s\n", result.NextToken)
	for _, file := range result.Files {
		fmt.Printf("File name: %s\n", file.Filename)
		fmt.Printf("size: %d\n", file.Size)
		fmt.Printf("File Modified Time:%s\n", file.FileModifiedTime)
		fmt.Printf("Oss Object Type:%s\n", file.OssObjectType)
		fmt.Printf("Oss Storage Class:%s\n", file.OssStorageClass)
		fmt.Printf("Object ACL:%s\n", file.ObjectACL)
		fmt.Printf("ETag:%s\n", file.ETag)
		fmt.Printf("Oss CRC64:%s\n", file.OssCRC64)
		fmt.Printf("Oss Tagging Count:%d\n", file.OssTaggingCount)
		for _, tagging := range file.OssTagging {
			fmt.Printf("Oss Tagging Key:%s\n", tagging.Key)
			fmt.Printf("Oss Tagging Value:%s\n", tagging.Value)
		}
		for _, userMeta := range file.OssUserMeta {
			fmt.Printf("Oss User Meta Key:%s\n", userMeta.Key)
			fmt.Printf("Oss User Meta Value:%s\n", userMeta.Value)
		}
	}
}

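Use the OSS API

If your business requires a high level of customization, you can call the RESTful API directly. To call the API directly, you must include signature calculation in your code. For more information, see "DoMetaQuery".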

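Search conditions and search result settings

Search conditions

The following tables list all search conditions. You can specify one or more search conditions based on your business requirements.

OSS metadata conditions

Search condition	Description
Storage Class	By default, the following storage classes supported by OSS are selected: Standard, Infrequent Access (IA), Archive, Cold Archive, and Deep Cold Archive. You can specify storage classes based on your business requirements.
ACL	By default, the following ACLs supported by OSS are selected: Inherited from Bucket, Private, Public Read, and Public Read/Write. You can specify ACLs based on your business requirements.
Object Name	You can select Fuzzy Match or Equals. To display a specific object such as exampleobject.txt in the search results, match the object name in one of the following ways. Select Equals and enter the full name of the object. Example: `exampleobject.txt`. Select Fuzzy Match and enter a prefix or suffix of the object name. Example: `example` or `.txt`. Important: Fuzzy match matches all object names that contain the specified characters. For example, if you enter `test` next to Fuzzy Match, localfolder/test/.example.jpg and localfolder/test.jpg meet the search condition and appear in the search results.
Upload Type	By default, the following upload types are selected. You can specify upload types based on your business requirements. Normal: returns objects that were uploaded by using simple upload. Multipart: returns objects that were uploaded by using multipart upload. Appendable: returns objects that were uploaded by using append upload. Symlink: returns symbolic links.
Last Modified At	You can specify a start date and an end date. The start date and end date are accurate to the second.
Object Size	You can select Equals, Greater Than, Greater Than or Equal To, Less Than, or Less Than or Equal To. Unit: KB.
Object Version	Only the current versions of objects can be searched.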
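
The difference between Equals and Fuzzy Match for the Object Name condition can be illustrated locally. The following is a minimal sketch that models fuzzy match as a plain substring test, as the Important note in the table describes. It is not how OSS implements the match, and `fuzzy_match` is a hypothetical helper that runs on an in-memory list of names.

```python
def fuzzy_match(names, text):
    """Return the names that contain text anywhere (substring match)."""
    return [name for name in names if text in name]


names = [
    "localfolder/test/.example.jpg",
    "localfolder/test.jpg",
    "exampleobject.txt",
]

# "test" appears in a directory part of the first name and in the
# file name of the second, so both names match.
matches = fuzzy_match(names, "test")
print(matches)
```

Because the match is not anchored to the start or end of the name, a short search string can return more objects than expected. Selecting Equals with the full object name avoids this.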

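Object ETag and tag conditions

To search for objects by ETag or tag, enter the ETags or tags of the objects that you want to appear in the search results.

ETags support only exact match. An ETag must be enclosed in double quotation marks ("). Example: "5B3C1A2E0563E1B002CC607C6689". To specify multiple ETags, separate them with line breaks.
Specify object tags as key-value pairs. The keys and values of object tags are case-sensitive. For more information about tag rules, see "Add tags to an object".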

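Search result settings

You can sort search results or display statistics about search results based on specific conditions.

Object sort order: You can sort search results in ascending order, descending order, or the default order by last modified time, object name, or object size.
Data aggregation: You can display statistics about search results based on specific conditions such as deduplication, group count, maximum, minimum, average, and sum. This facilitates efficient data analysis and management.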

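FAQ

Why does index creation take a long time when hundreds of millions of objects are stored in a bucket?

It takes about 1 second to create indexes for 600 objects. You can estimate the time required for index creation based on the number of objects in the bucket.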
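
The estimate described in this answer is a simple division. The following is a minimal sketch that assumes the rate of about 600 objects per second stays constant, which is a simplification because actual index creation time is stated for reference only. `estimate_index_seconds` is a hypothetical helper.

```python
def estimate_index_seconds(object_count, objects_per_second=600):
    """Estimate the index creation time in seconds at a constant rate."""
    return object_count / objects_per_second


# Example: a bucket that contains 300 million objects.
seconds = estimate_index_seconds(300_000_000)
print(f"{seconds / 3600:.1f} hours")  # prints "138.9 hours"
```

At this rate, 300 million objects take about 500,000 seconds, which is roughly 5.8 days. This explains why indexes for buckets with hundreds of millions of objects are not available right away.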
