AnalyticDB for PostgreSQL provides vector analysis to help you retrieve and analyze unstructured data. This topic describes the features and benefits of vector analysis.
Features
Vector analysis uses AI algorithms to extract features from unstructured data and represents each item as a feature vector. The distance between two vectors measures how similar the corresponding items are: the smaller the distance, the more similar the items. In AnalyticDB for PostgreSQL, vector analysis is built on a massively parallel processing (MPP) architecture. You can use SQL statements to retrieve unstructured data and perform association analysis between unstructured data and structured data.
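To illustrate how the distance between vectors expresses similarity, the following is a minimal Python sketch of two common measures, Euclidean distance and cosine similarity. This topic does not state which measures AnalyticDB for PostgreSQL uses, so the choice of measures, the function names, and the three-dimensional sample vectors are assumptions made for illustration only.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance; a smaller value means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors; real ones usually have hundreds of dimensions.
query = [1.0, 0.0, 0.0]
near = [0.9, 0.1, 0.0]
far = [0.0, 1.0, 0.0]

print(euclidean_distance(query, near) < euclidean_distance(query, far))
print(cosine_similarity(query, near) > cosine_similarity(query, far))
```

Both comparisons rank `near` ahead of `far` for this query, which is the property that similarity retrieval relies on.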
Scenarios
You can use vector analysis in AnalyticDB for PostgreSQL to perform the following operations:
- Search for images that are similar to a given image.
- Search for audio files that are similar to a given audio file based on voiceprint recognition.
- Search for texts that are similar to a given text based on semantics.
- Remove duplicate files based on the fingerprint of a given file.
- Analyze which images contain the same product among a large number of images.
The vector analysis feature of AnalyticDB for PostgreSQL is widely used across Alibaba's business portfolio, including the data mid-end, e-commerce, new retail, and urban intelligence.
Architecture
- A web application uses a feature extraction service to extract feature vectors from unstructured data such as texts, images, and audio files, and then writes the vectors to the vector library of AnalyticDB for PostgreSQL.
- During retrieval, the web application uses the feature extraction service to extract feature vectors from unstructured data, and then calls the retrieval analysis interface of AnalyticDB for PostgreSQL to perform retrieval.
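The write path and the retrieval path described above can be sketched in a few lines of Python. Everything in this sketch is a hypothetical stand-in: `extract_features` replaces a real feature extraction service with a simple letter-frequency vector, and `InMemoryVectorStore` replaces the vector library with a brute-force in-memory list. Neither reflects the actual interfaces of AnalyticDB for PostgreSQL.

```python
import math


def extract_features(text):
    """Hypothetical feature extraction: a normalized letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts


class InMemoryVectorStore:
    """Hypothetical stand-in for the vector library (not the AnalyticDB API)."""

    def __init__(self):
        self._rows = []  # list of (item_id, vector) pairs

    def write(self, item_id, vector):
        """Write path: store the extracted vector with its identifier."""
        self._rows.append((item_id, vector))

    def search(self, query_vector, k):
        """Retrieval path: brute-force scan for the k nearest vectors."""
        ranked = sorted(
            self._rows, key=lambda row: math.dist(query_vector, row[1])
        )
        return [item_id for item_id, _ in ranked[:k]]


store = InMemoryVectorStore()
documents = [
    ("doc1", "red summer dress"),
    ("doc2", "blue winter coat"),
    ("doc3", "red summer dresses"),
]

# Write path: extract a vector for each item, then store it.
for item_id, text in documents:
    store.write(item_id, extract_features(text))

# Retrieval path: extract a vector for the query, then search.
result = store.search(extract_features("red summer dress"), k=2)
print(result)
```

The same extraction step runs on both paths, so stored vectors and query vectors live in the same feature space. A production system would replace the brute-force scan with an index and the toy extractor with a trained model.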
Benefits
- Hybrid analysis of structured and unstructured data
For example, on an e-commerce platform, you can search for dresses that look similar to a given image, are priced between USD 100 and USD 200, and were published within the last month.
- Real-time data updates
Vector analysis of AnalyticDB for PostgreSQL supports real-time data updates and queries. By contrast, data written to conventional vector analysis systems becomes available for queries only on the following day.
- Vector analysis collision
Vector analysis of AnalyticDB for PostgreSQL supports the k nearest neighbor join (kNN join) operation, which compares two sets of vectors and finds, for each vector in one set, the most similar vectors in the other set. This operation is similar to the kNN join operation in Spark and consumes large amounts of computing resources. AnalyticDB for PostgreSQL optimizes the vector analysis feature to handle these heavy computing workloads.
Typical applications of vector analysis collision include product deduplication and face clustering. AnalyticDB for PostgreSQL can identify new products that are similar to existing products in the database, and can identify images of the same person in a face database.
- Ease of use
Vector analysis of AnalyticDB for PostgreSQL supports standard SQL syntax, which simplifies development.
- Cost-effectiveness
Vector data occupies a large amount of storage. A single 512-dimensional FLOAT (FP32) vector consumes 2 KB of storage (512 × 4 bytes). AnalyticDB for PostgreSQL can convert FP32 data into FP16, which uses 2 bytes per dimension and reduces storage costs by 50%.
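The storage figures above can be checked with a short Python sketch that uses only the standard `struct` module. The sample vector values are hypothetical, and the sketch illustrates only the arithmetic and the precision trade-off. It does not show how AnalyticDB for PostgreSQL performs the conversion internally.

```python
import math
import struct

DIMENSIONS = 512

# Hypothetical feature vector with values in [-1, 1].
values = [math.sin(i) for i in range(DIMENSIONS)]

# Encode as FP32 (4 bytes per dimension), then read back the FP32 values.
fp32_bytes = struct.pack(f"<{DIMENSIONS}f", *values)
fp32_values = struct.unpack(f"<{DIMENSIONS}f", fp32_bytes)

# Convert the FP32 values to FP16 (2 bytes per dimension).
fp16_bytes = struct.pack(f"<{DIMENSIONS}e", *fp32_values)
fp16_values = struct.unpack(f"<{DIMENSIONS}e", fp16_bytes)

fp32_size = len(fp32_bytes)  # 2048 bytes = 2 KB
fp16_size = len(fp16_bytes)  # 1024 bytes = 1 KB
saving = 1 - fp16_size / fp32_size

# FP16 keeps fewer significand bits, so each value is rounded slightly.
max_error = max(abs(a - b) for a, b in zip(fp32_values, fp16_values))

print(fp32_size, fp16_size, saving)
print(max_error)
```

The sizes confirm the 2 KB figure and the 50% saving. The rounding error shows the cost: FP16 stores each value with lower precision, so whether the conversion is acceptable depends on how sensitive the similarity results are to small changes in the vector values.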