AnalyticDB: Vector analysis

Last Updated: Dec 04, 2024

AnalyticDB for PostgreSQL provides vector analysis to help you implement approximate search and analysis of unstructured data. This topic describes the capabilities and benefits of vector analysis.

Introduction to vector databases

In real-world scenarios, most data is unstructured, such as images, audio, videos, and text. The volume of unstructured data is growing exponentially with the emergence of applications in fields such as smart cities, short videos, personalized product recommendation, and visual product search. To process unstructured data, AI technologies extract features from the data and convert them into feature vectors, which are then analyzed and retrieved. Databases that can store, analyze, and retrieve feature vectors are called vector databases.

Vector databases use vector indexes to retrieve feature vectors quickly. In most cases, these indexes implement approximate nearest neighbor search (ANNS). Instead of scanning every vector to find the exact nearest neighbors, an ANNS index examines only a fraction of the data and returns results that are very likely to be the nearest neighbors. Compared with traditional databases, vector databases therefore achieve highly efficient search with an acceptable compromise in accuracy.
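
The trade-off between exact search and ANNS can be illustrated with a minimal sketch. It is not the FastANN algorithm, which this topic does not describe. It assumes a toy partition-based index in which vectors are grouped around randomly chosen centroids and only the closest partitions are scanned. All function names, the dataset, and parameters such as nprobe are hypothetical.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_search(vectors, query, k):
    """Scan every vector and return the ids of the k nearest ones."""
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: l2(vectors[i], query))


def build_partitions(vectors, centroids):
    """Assign each vector id to its nearest centroid (a toy partitioned index)."""
    partitions = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        partitions[nearest].append(i)
    return partitions


def approximate_search(vectors, centroids, partitions, query, k, nprobe=1):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    probe = heapq.nsmallest(nprobe, range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in probe for i in partitions[c]]
    top = heapq.nsmallest(k, candidates, key=lambda i: l2(vectors[i], query))
    return top, len(candidates)


# Hypothetical data: 1,000 random 8-dimensional vectors and 16 sampled centroids.
rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(1000)]
centroids = rng.sample(vectors, 16)
partitions = build_partitions(vectors, centroids)
query = [rng.random() for _ in range(8)]

exact = exact_search(vectors, query, k=10)
approx, scanned = approximate_search(vectors, centroids, partitions, query, k=10, nprobe=4)

# Recall is the fraction of the true nearest neighbors that the approximate search found.
recall = len(set(exact) & set(approx)) / len(exact)
print(f"scanned {scanned} of {len(vectors)} vectors, recall@10 = {recall:.2f}")
```

In this sketch, raising nprobe scans more partitions, which increases recall at the cost of more distance computations. Production ANNS indexes expose similar accuracy and speed trade-offs through their own parameters.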

Two methods are available to apply ANNS vector indexes to the production environment:

  • Build a dedicated vector database that provides the ANNS vector index service to help create vector indexes and retrieve data.

  • Integrate ANNS vector indexes into traditional structured databases to build a database management system (DBMS) that provides the vector search capability.

AnalyticDB for PostgreSQL takes the second approach: it is a DBMS that integrates the in-house FastANN vector engine. In addition to vector search, AnalyticDB for PostgreSQL vector databases provide end-to-end database capabilities such as ease of use, transaction processing, high availability, and high scalability.

How it works

To implement vector analysis, AI algorithms extract features from unstructured data, and the resulting feature vectors are used to represent that data. The distance between two vectors measures the similarity between the corresponding pieces of unstructured data: the smaller the distance, the more similar they are. AnalyticDB for PostgreSQL uses a massively parallel processing (MPP) architecture to implement vector search and analysis. You can use SQL statements to retrieve unstructured data and perform correlation analysis of structured and unstructured data.
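
The following minimal sketch illustrates how vector distance can serve as a similarity measure and how a structured filter can be combined with a vector ranking. It is written in plain Python rather than SQL and does not reflect the AnalyticDB for PostgreSQL syntax. The Euclidean and cosine measures are common choices used here as assumptions, and the rows, column names, and the hybrid_search function are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller values mean more similar vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values closer to 1 mean more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


# Hypothetical rows: structured columns plus a feature vector extracted from an image.
rows = [
    {"id": 1, "category": "shoes", "price": 40, "feature": [1.0, 0.0]},
    {"id": 2, "category": "shoes", "price": 90, "feature": [0.8, 0.6]},
    {"id": 3, "category": "bags", "price": 30, "feature": [1.0, 0.1]},
    {"id": 4, "category": "shoes", "price": 35, "feature": [0.0, 1.0]},
]


def hybrid_search(rows, query, category, max_price, k):
    """Filter on structured columns, then rank the remaining rows by vector distance."""
    candidates = [r for r in rows if r["category"] == category and r["price"] <= max_price]
    ranked = sorted(candidates, key=lambda r: l2_distance(r["feature"], query))
    return [r["id"] for r in ranked[:k]]


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(hybrid_search(rows, query=[1.0, 0.0], category="shoes", max_price=50, k=2))  # [1, 4]
```

Row 3 is the second closest to the query by distance but is excluded by the category filter, and row 2 is excluded by the price filter. This is the kind of combined condition that a hybrid query over structured and unstructured data expresses.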

Scenarios

You can use AnalyticDB for PostgreSQL vector analysis in the following intelligent application scenarios:

  • Reverse image search. You can search for images that are similar to a specified image.

  • Video search. You can search for video frames that are similar to a specified video frame.

  • Voiceprint search. You can search for audio files that are similar to a specified audio file based on voiceprint recognition.

  • Recommendation systems. You can recommend suitable items based on user characteristics.

  • Text search. You can search for texts that are similar to a specified text based on semantics.

  • Q&A chatbots. You can build Q&A chatbots in combination with large language models.

  • File deduplication. You can remove duplicate files based on the fingerprint of a specified file.

Benefits

AnalyticDB for PostgreSQL vector databases use the in-house FastANN vector engine to provide the vector analysis capability. This capability is widely used in scenarios such as the Alibaba Group data mid-end, e-commerce, new retail, Alibaba Cloud City Brain, and the Tongyi Qianwen Q&A service.

Compared with other vector databases, AnalyticDB for PostgreSQL vector databases provide the following advantages:

  • Hybrid analysis of structured and unstructured data

    AnalyticDB for PostgreSQL vector databases build on the capabilities of traditional databases to implement hybrid analysis of structured, semi-structured, and unstructured data, and make efficient use of indexes on structured and semi-structured data.

  • Two-way retrieval based on vector search and full-text search

    AnalyticDB for PostgreSQL vector databases support vector indexes and full-text indexes, and can use vector search and full-text search to implement two-way retrieval. This significantly improves the retrieval accuracy of vector data.

  • Real-time data update and query

    AnalyticDB for PostgreSQL vector databases support streaming import of vector data and build vector indexes in real time.

  • Ease of use

    AnalyticDB for PostgreSQL vector databases are ready to use as soon as you create an instance, and they support standard SQL syntax. This significantly simplifies the development process.

  • Cost-effectiveness

    AnalyticDB for PostgreSQL vector databases can compress FP32 data to the FP16 format. Because each FP16 value occupies 2 bytes instead of 4, this reduces the storage cost of vector data by 50%. AnalyticDB for PostgreSQL vector databases also build vector indexes on segmented paging storage and use a cache-based swapping mechanism built on the shared buffer of PostgreSQL. As a result, AnalyticDB for PostgreSQL can store vector indexes that exceed the available memory size.
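
The storage effect of FP32-to-FP16 compression described under Cost-effectiveness can be checked with a minimal sketch. It uses the half-precision format of the Python standard struct module and does not reflect how AnalyticDB for PostgreSQL encodes vectors internally. The sample vector and function names are hypothetical.

```python
import math
import struct


def pack_fp32(vector):
    """Serialize a vector as little-endian 32-bit floats (4 bytes per dimension)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def pack_fp16(vector):
    """Serialize a vector as little-endian 16-bit floats (2 bytes per dimension)."""
    return struct.pack(f"<{len(vector)}e", *vector)


def unpack_fp16(blob):
    """Decode a blob of 16-bit floats back into a list of Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


# Hypothetical 128-dimensional feature vector with values in [-1, 1].
vector = [math.sin(i) for i in range(128)]

fp32_blob = pack_fp32(vector)
fp16_blob = pack_fp16(vector)
max_error = max(abs(a - b) for a, b in zip(vector, unpack_fp16(fp16_blob)))

print(len(fp32_blob), len(fp16_blob))  # 512 256
print(f"largest rounding error: {max_error:.6f}")
```

The FP16 encoding takes exactly half the bytes. The price is a small rounding error per dimension, which is usually acceptable when the search itself is already approximate.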