By Wang Tantan (Si Ming)
When users store massive volumes of documents, media files, and other data, managing file metadata becomes essential. Metadata spans many dimensions. Basic fields include the file size, creation time, and owner. With the development of artificial intelligence, the core elements that AI extracts from a file have also become an important part of its metadata. Take an image as an example: users can call smart media services to extract and score the image's core tags, and to obtain face recognition results, geographic location, and other information, all of which must also be stored in the file metadata. As a result, the volume of file metadata keeps growing, and its formats and types keep diversifying.
A smart media management platform provides users with management services for files such as images and videos. Users analyze target files with self-developed (or purchased) smart media analysis tools, and the analysis results enrich the original metadata. The platform therefore needs an effective metadata management solution that lets users manage, analyze, and collect metadata. Example:
User A - All pictures that meet the following conditions, sorted by tag score: [files of user A] [last year] [tags containing "Happy"]
User B - All videos that meet the following conditions, sorted by similarity to the celebrity: [files of user B] [XX celebrity has appeared]
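User A's request can be read as a filter followed by a sort. The sketch below expresses that reading over an in-memory list in plain Java. It is not Table Store code: the class, field, and method names are all hypothetical, and "last year" is assumed to be a half-open range of epoch milliseconds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class TagFilterSketch {
    // Hypothetical in-memory stand-in for one file metadata row.
    public static class FileMeta {
        public final String fileId;
        public final String userId;
        public final String type;
        public final long createdAt;            // epoch milliseconds
        public final Map<String, Double> tags;  // tag -> score

        public FileMeta(String fileId, String userId, String type,
                        long createdAt, Map<String, Double> tags) {
            this.fileId = fileId;
            this.userId = userId;
            this.type = type;
            this.createdAt = createdAt;
            this.tags = tags;
        }
    }

    // Returns the files of userId with the given type, created in [from, to),
    // that carry the tag, sorted by that tag's score (highest first).
    public static List<FileMeta> query(List<FileMeta> all, String userId, String type,
                                       long from, long to, String tag) {
        List<FileMeta> hits = new ArrayList<>();
        for (FileMeta f : all) {
            if (f.userId.equals(userId) && f.type.equals(type)
                    && f.createdAt >= from && f.createdAt < to
                    && f.tags.containsKey(tag)) {
                hits.add(f);
            }
        }
        hits.sort(Comparator.comparingDouble((FileMeta f) -> f.tags.get(tag)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<FileMeta> all = List.of(
            new FileMeta("f1", "userA", "image", 100L, Map.of("Happy", 18.3)),
            new FileMeta("f2", "userA", "image", 200L, Map.of("Happy", 92.5)),
            new FileMeta("f3", "userB", "image", 150L, Map.of("Happy", 99.0)));
        for (FileMeta f : query(all, "userA", "image", 0L, 1000L, "Happy")) {
            System.out.println(f.fileId);
        }
    }
}
```

A linear scan like this is only workable for small data sets; at the scale the article targets, the same conditions have to be answered by an index.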
A sample of the management system is as follows: Project Sample
For smart metadata management systems, the technical factors that need to be considered generally include the following:
The SearchIndex feature of Table Store effectively solves the problem of managing massive metadata. Table Store works out of the box and is billed on a pay-as-you-go basis.
Table Store is a fully managed, zero-maintenance, distributed NoSQL data storage service from Alibaba Cloud. It provides storage for massive amounts of data, automatic sharding of hot data, and multi-dimensional retrieval, so it copes efficiently with the data explosion challenge. A SearchIndex can be created at any time, which makes it an appropriate solution for metadata management.
At the same time, SearchIndex provides multi-dimensional data search, statistics and other capabilities on the basis of ensuring high availability of user data. You can create multiple indexes for multiple scenarios to achieve retrieval in multiple modes. You can create and activate indexes as needed. Table Store ensures the consistency of data synchronization, greatly reducing the work required for your solution design, service maintenance, and code development.
The sample is integrated in the Table Store console. You can log on to the console to experiment with the system. (If you are a new Table Store user, you need to click Activate Now for a trial of this service. The service activation is free. Metadata is stored in public instances. A trial doesn't consume storage, network traffic, or CUs.)
Note: This sample provides file metadata at the scale of 100 million entries. Official console address: Project Sample
If you are interested in the smart metadata management system and want to build your own system, you can follow these steps:
Activate the Table Store service in the console. Table Store is out-of-the-box (post-paid) and billed on a pay-as-you-go basis. Table Store also provides a free quota that is sufficient for functional tests. For more information, visit Table Store Console and Free quota description.
Create a Table Store instance in the console and select a region that supports SearchIndex. (SearchIndex has not yet been commercialized and is currently supported in the following regions: Beijing, Shanghai, Hangzhou, and Shenzhen. It will gradually become available in other regions.)
After the instance is created, open a ticket to apply for the SearchIndex beta test invitation. (Once commercialized, SearchIndex will be enabled by default. No fees are incurred if the feature is not used.)
Use an SDK that supports SearchIndex (see the official website for more details). Currently, the Java, Go, and Node.js SDKs include the new functions.
Java-SDK
<dependency>
    <groupId>com.aliyun.openservices</groupId>
    <artifactId>tablestore</artifactId>
    <version>4.8.0</version>
</dependency>
Go-SDK
$ go get github.com/aliyun/aliyun-tablestore-go-sdk
Nodejs-SDK
$ npm install tablestore@4.1.0
Table name: order_contract
To create a smart metadata table, users only need to maintain one instance and create the table under the instance as follows:
Create and manage data tables through the console (users can also directly create data tables through the SDK):
Table Store automatically synchronizes full and incremental index data: Users can create and manage SearchIndex through the console (or, they can also create it through the SDK):
Insert some test data. (The console sample contains 100 million entries. You can insert a small amount of test data through the console.)
File ID | File ID (MD5 primary key) | User ID | Tag (array string) | Type | Link | Size |
f052535742 | 1bce.... | u05254 | [{"score":99.999999,"tag":"Table Store"},{"score":78.962224,"tag":"Hail"},{"score":18.328385,"tag":"Happy"},{"score":16.886812,"tag":"Snow Mountain"}] | image | https://prd-console-demo.oss-cn-hangzhou.aliyuncs.com/image/imm1.jpg | 9022066 |
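The sample row suggests that the primary key is an MD5 digest derived from the file ID. A minimal sketch of computing such a key with the JDK alone follows. The class and method names are hypothetical, and hashing the file ID as UTF-8 text is an assumption, since the article does not say exactly what is hashed.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileKey {
    // Hypothetical helper: returns the 32-character lowercase hex MD5 of a file ID.
    public static String md5Hex(String fileId) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(fileId.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros so the key always has 32 characters.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the key is derived from the plain file ID string.
        System.out.println(md5Hex("f052535742"));
    }
}
```

Hashed keys are commonly used to spread sequential IDs evenly across partitions, which is presumably why the sample stores the digest rather than the raw ID as the primary key.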
Data reading falls into two types:
1. Primary Key Reading
Primary key reading uses the native Table Store interfaces: getRow, getRange, and batchGetRow. It is used automatically for reverse lookups after an index query, and you can also offer users a single-file query page keyed on the primary key (File ID MD5). Queries remain extremely fast at the scale of 100 million entries. Primary key queries do not support multi-dimensional retrieval.
2. Index Reading
Index reading uses the search interface provided by the new SearchIndex feature. You can freely design multi-dimensional combined queries over the indexed fields, building different query conditions and sort orders by choosing different query parameters. Currently, exact query, range query, prefix query, match query, wildcard query, phrase match query, word-breaking string query, nested query, and geo query are supported, and they can be combined with Boolean AND and OR.
For example, to retrieve image files with [tag: Table Store] and [creation time from 2018-01-01 to 2018-12-01] (SDK code shown; the same query can also be run in the console):
import java.util.ArrayList;
import java.util.List;
import com.alicloud.openservices.tablestore.model.ColumnValue;
import com.alicloud.openservices.tablestore.model.search.query.*;

List<Query> mustQueries = new ArrayList<Query>();

// Nested query: match rows whose tags array contains the tag "Table Store"
TermQuery tagQuery = new TermQuery();
tagQuery.setFieldName("tags.tag");
tagQuery.setTerm(ColumnValue.fromString("Table Store"));
NestedQuery nestedQuery = new NestedQuery();
nestedQuery.setPath("tags");
nestedQuery.setScoreMode(ScoreMode.Avg);
nestedQuery.setQuery(tagQuery);
mustQueries.add(nestedQuery);

// Range query: createdAt in epoch milliseconds, lower bound inclusive, upper bound exclusive
RangeQuery rangeQuery = new RangeQuery();
rangeQuery.setFieldName("createdAt");
rangeQuery.setFrom(ColumnValue.fromLong(1514793600000L), true);
rangeQuery.setTo(ColumnValue.fromLong(1543651200000L), false);
mustQueries.add(rangeQuery);

// Exact query: only files of type "image"
TermQuery typeQuery = new TermQuery();
typeQuery.setFieldName("type");
typeQuery.setTerm(ColumnValue.fromString("image"));
mustQueries.add(typeQuery);

// Combine the conditions: every query in mustQueries must match (Boolean AND)
BoolQuery boolQuery = new BoolQuery();
boolQuery.setMustQueries(mustQueries);
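The range query above takes its bounds as epoch milliseconds. The following sketch shows one way to derive such bounds from calendar dates with java.time instead of hard-coding them. The class and method names are hypothetical, and which time zone applies to createdAt is an assumption, since the article does not state one.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class EpochRange {
    // Epoch milliseconds at 00:00 of the given date, interpreted in the given UTC offset.
    public static long startOfDayMillis(int year, int month, int day, ZoneOffset offset) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(offset)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // Assumption: dates are interpreted in UTC.
        System.out.println(startOfDayMillis(2018, 1, 1, ZoneOffset.UTC));   // 1514764800000
        System.out.println(startOfDayMillis(2018, 12, 1, ZoneOffset.UTC));  // 1543622400000
    }
}
```

The literals used in the range query are 8 hours (28,800,000 ms) later than these UTC values, so the offset passed to the helper should match whatever convention was used when createdAt was written.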