The rds_embedding extension of ApsaraDB RDS for PostgreSQL allows you to convert text in your ApsaraDB RDS for PostgreSQL instance into vectors. The extension provides custom model configuration and model invocation capabilities to facilitate the conversion and meet specific data processing requirements.
Background information
Embedding is a technique that translates high-dimensional data into a low-dimensional space. In machine learning and natural language processing (NLP), embedding is a common method that is used to represent sparse symbols or objects as continuous vectors.
The vectors that an embedding produces depend on the model that is referenced. With the rds_embedding extension, you can convert text in your RDS instance into vectors by calling an external model. You can then use a vector similarity operator to calculate the similarity between text that is stored in the RDS instance and text that you specify, for example, to find the stored text that is closest to a query string.
Prerequisites
Your RDS instance runs PostgreSQL 14.0 or a later version.
The minor engine version of the RDS instance is updated. If the major engine version of the RDS instance meets the requirements but the extension is still not supported, you can update the minor engine version. For more information, see Update the minor engine version.
The API key for the model that is used in this topic is obtained, and the region in which the RDS instance resides supports access to OpenAI. In this topic, the embedding model of OpenAI and the Singapore region are used. For more information, see Embeddings.
The RDS instance is connected over the Internet. By default, you cannot connect to the RDS instance over the Internet. You must create a NAT gateway for the virtual private cloud (VPC) in which the RDS instance resides. This way, you can connect to the RDS instance over the Internet and the RDS instance can access external models. For more information about NAT gateways, see Use the SNAT feature of an Internet NAT gateway to access the Internet.
Enable or disable the extension
You must use a privileged account to execute the statements in this section.
Enable the extension.
Before you enable the rds_embedding extension, you must enable the vector extension. The vector extension supports the required vector data types and basic vector data operations, such as calculations of the distance and similarities between vectors. The rds_embedding extension only translates high-dimensional text into vectors.
CREATE EXTENSION vector;
CREATE EXTENSION rds_embedding;
Disable the extension.
DROP EXTENSION rds_embedding;
DROP EXTENSION vector;
Example
Create a test table named test.
CREATE TABLE test(info text, vec vector(1536) NOT NULL);
Add a model.
SELECT rds_embedding.add_model('text-embedding-ada-002','https://api.openai.com/v1/embeddings','Authorization: Bearer sk-****P','{"input":"%s","model":"text-embedding-ada-002"}','->"data"->0->>"embedding"');
Note: The model that is used in this topic is an OpenAI embedding model. Before you add the model, make sure that you have obtained the API key for the model and that the region in which the RDS instance resides supports access to OpenAI. In this topic, the Singapore region is used. For more information, see Embeddings. For more information about the function that is used in this step, see rds_embedding.add_model().
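The last two arguments of the add_model() call can be read as a request body template, in which %s marks where the text goes, and a path that picks the embedding out of the JSON response. The following Python sketch only illustrates that reading. It is an assumption about how the template and the path are interpreted, not the implementation of the extension. The helper names build_request_body and extract_embedding, the template, and the sample response are hypothetical.

```python
import json

# Hypothetical template with a %s placeholder, in the request shape
# that the OpenAI embeddings endpoint accepts.
BODY_TEMPLATE = '{"input":"%s","model":"text-embedding-ada-002"}'


def build_request_body(template: str, text: str) -> str:
    """Put the text into the %s placeholder, JSON-escaping it first."""
    # json.dumps adds surrounding quotes; strip them because the
    # template already quotes the placeholder.
    escaped = json.dumps(text, ensure_ascii=False)[1:-1]
    return template.replace("%s", escaped, 1)


def extract_embedding(response_text: str) -> list:
    """Python counterpart of the path ->"data"->0->>"embedding"."""
    return json.loads(response_text)["data"][0]["embedding"]


# Made-up response with a 3-dimensional vector; a real response from
# text-embedding-ada-002 carries 1536 values.
sample_response = '{"data":[{"index":0,"embedding":[0.1,-0.2,0.3]}]}'

body = build_request_body(BODY_TEMPLATE, 'say "hi"')
print(body)
print(extract_embedding(sample_response))
```

Escaping the text before substitution keeps the request body valid JSON when the input contains quotation marks or backslashes.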
Insert text and the required vector data.
INSERT INTO test SELECT '风急天高猿啸哀', rds_embedding.get_embedding_by_model('text-embedding-ada-002', 'sk-****P', '风急天高猿啸哀')::real[];
INSERT INTO test SELECT '渚清沙白鸟飞回', rds_embedding.get_embedding_by_model('text-embedding-ada-002', 'sk-****P', '渚清沙白鸟飞回')::real[];
INSERT INTO test SELECT '无边落木萧萧下', rds_embedding.get_embedding_by_model('text-embedding-ada-002', 'sk-****P', '无边落木萧萧下')::real[];
INSERT INTO test SELECT '不尽长江滚滚来', rds_embedding.get_embedding_by_model('text-embedding-ada-002', 'sk-****P', '不尽长江滚滚来')::real[];
Note: For more information about the function that is used in this step, see rds_embedding.get_embedding_by_model().
Calculate the similarities between the text 不尽长江滚滚来 and the vectors of each piece of text in the test table.
SELECT info, vec <=> rds_embedding.get_embedding_by_model('text-embedding-ada-002', 'sk-****P', '不尽长江滚滚来')::real[]::vector AS distance
FROM test
ORDER BY vec <=> rds_embedding.get_embedding_by_model('text-embedding-ada-002', 'sk-****P', '不尽长江滚滚来')::real[]::vector;
Sample output:
      info      |      distance
----------------+--------------------
 不尽长江滚滚来 |                  0
 无边落木萧萧下 | 0.6855717919553399
 风急天高猿啸哀 | 0.7423166439170339
 渚清沙白鸟飞回 | 0.7926204045363088
(4 rows)
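The <=> operator of the vector extension returns the cosine distance, which is 1 minus the cosine similarity. This is why the text compared with itself has a distance of 0 and why smaller values mean more similar text. The following Python sketch reproduces the computation and the ordering on made-up three-dimensional vectors instead of real 1536-dimensional embeddings. The function name cosine_distance and the sample rows are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy stand-ins for the rows of the test table (not real embeddings).
rows = {
    "row_a": [1.0, 0.0, 0.0],
    "row_b": [1.0, 1.0, 0.0],
    "row_c": [0.0, 0.0, 1.0],
}
query = rows["row_a"]

# Same idea as ORDER BY vec <=> query: the smallest distance comes first.
ranked = sorted(rows, key=lambda name: cosine_distance(rows[name], query))
for name in ranked:
    print(name, round(cosine_distance(rows[name], query), 4))
```

In this sketch, row_a is ranked first with a distance of 0 because it is identical to the query vector, and row_c is ranked last because it is orthogonal to the query vector.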