Elasticsearch Open Inference API Adds Support for Alibaba Cloud AI Search

By Dave Kyle and Weizijun

We are excited to announce our latest addition to the Elasticsearch Open Inference API: the integration of Alibaba Cloud AI Search. This work enables Elastic users to connect directly with the Alibaba Cloud AI platform. Developers building RAG applications using the Elasticsearch vector database can store and use dense and sparse embeddings generated from models hosted on Alibaba Cloud AI Search platform with semantic_text. In addition, Elastic users now have integrated access to reranking models for enhanced semantic reranking and the Qwen LLM family.

In this blog, we explore how to integrate Alibaba Cloud's AI services with Elasticsearch. You'll learn how to set up and use Alibaba's completion, rerank, sparse embedding, and text embedding services within Elasticsearch. The broad set of supported models integrated into inference task types will enhance the relevance of many use cases including RAG.

We’re grateful to the Alibaba Cloud team for contributing support for these task types to Elasticsearch open inference API!

Let’s walk through examples of how to configure and use these services within an Elasticsearch environment. Note Alibaba uses the term service_id instead of model_id.

Using a Base Model in Alibaba Cloud AI Search Platform

This walkthrough assumes you already have an Alibaba Cloud Account with access to Alibaba Cloud AI Search platform. Next, you’ll need to create a workspace and API key for Inference creation.

Creating an Inference API Endpoint in Elasticsearch

In Elasticsearch, create your endpoint by providing the service as “alibabacloud-ai-search”, and the service settings including your workspace, the host, the service id and your api keys to access Alibaba Cloud AI Search platform. In our example, we're creating a text embedding endpoint using "ops-text-embedding-001" as the service id.

PUT _inference/text_embedding/ali_ai_embeddings{    "service": "alibabacloud-ai-search",    "service_settings": {        "api_key": "<api_key>",        "service_id": "ops-text-embedding-001",        "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",        "workspace": "default"    }}

You will receive a response from Elasticsearch with the endpoint that was created successfully:

{
  "inference_id": "ali_ai_embeddings",
  "task_type": "text_embedding",
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "similarity": "dot_product",
    "dimensions": 1536,
    "service_id": "ops-text-embedding-001",
    "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default" ,
    "rate_limit": {
      "requests_per_minute": 10000
    }
  },
  "task_settings": {}
}

Note that there are no additional settings for model creation. Elasticsearch will automatically connect to the Alibaba Cloud AI Search platform to test your credentials and the service id, and fill in the number of dimensions and similarity measures for you.

Next, let’s test our endpoint to ensure everything is set up correctly. To do this, we’ll call the perform inference API:

POST _inference/text_embedding/ali_ai_embeddings{  "input": "What is Elastic?"}

The API call will return the generated embeddings for the provided input, which will look something like this:

{
    "text_embedding": [
        {
            "embedding": [
                0.048400473,
                0.051464397,
                … (additional values) …
                0.033325635,
                -0.008986305
            ]
        }
    ]
}

You are now ready to start exploring. After you have tried these examples, have a look at some new exciting innovations in Elasticsearch for semantic search use cases:

The new field simplifies storage and chunking of embeddings - just pick your model and Elastic does the rest!

Introduced in 8.14, allow you to setup multi-stage retrieval pipelines

But first, let’s dive into our examples!

I. Completion

To start, Alibaba Cloud provides several models for chat completion, with service IDs listed in their API documentation.

Step 1: Configure the Completion Service

First, set up the inference service for text completion:

PUT _inference/completion/ali-chat{ "service": "alibabacloud-ai-search", "service_settings": {     "host" : "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",     "api_key": "xxxxxxxxxxxxxxxxxx",     "service_id": "ops-qwen-turbo",     "workspace" : "default" }}

Response

{
 "inference_id": "ali-chat",
 "task_type": "completion",
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "service_id": "ops-qwen-turbo",
   "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace": "default",
   "rate_limit": {
     "requests_per_minute": 1000
   }
 },
 "task_settings": {}
}

Step 2: Issue a Completion Request

Using the configured endpoint, send a POST request to generate a completion:

POST _inference/completion/ali-chat{ "input":["Where is the capital of Henan?"]}

Returns

{
 "completion": [
   {
     "result": "The capital of Henan is Zhengzhou."
   }
 ]
}

Uniquely, for this Elastic Inference API integration with Alibaba, chat history can be included in the inputs, in this example, we’ve included the previous response and added: “What fun things are there?”

POST _inference/completion/ali-chat{ "input":["Where is the capital of Henan?", "The capital of Henan is Zhengzhou.", "What fun things are there?" ]}

The response clearly includes the history

{
 "completion": [
   {
     "result": "I'm sorry, I do not have enough information to provide a specific list of fun things to do in Zhengzhou, Henan. I can only tell you that Zhengzhou is the capital of Henan province. To find out about fun activities, attractions, or events in Zhengzhou, I would suggest researching local tourism websites, asking locals, or checking out travel guides for the area."
   }
 ]
}

In future updates, we plan to allow users to explicitly include chat history, improving the ease of usage.

II. Rerank

Moving on to our next task type, rerank. Reranking helps re-order search results for improved relevance, using Alibaba's powerful models. If you want to read more about this concept, have a look at this blog on Elastic Search Labs.

Step 1: Configure the Rerank Service

Configure the reranking inference service:

PUT _inference/rerank/ali-rank{ "service": "alibabacloud-ai-search", "service_settings": {   "api_key": "xxxxxxxxxxxxxxxxxx",   "service_id": "ops-bge-reranker-larger",   "host" : "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",   "workspace" : "default"    }}
{
 "inference_id": "ali-rank",
 "task_type": "rerank",
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "service_id": "ops-bge-reranker-larger",
   "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace": "default",
   "rate_limit": {
     "requests_per_minute": 1000
   }
 },
 "task_settings": {}
}

Step 2: Issue a Rerank Request

Send a POST request to rerank your search query results:

The rerank interface does not require a lot of configuration (task_settings), it returns the relevance scores ordered by the most relevant first and the index of the document in the input array.

POST _inference/rerank/ali-rank{ "query": "What is the capital of the USA?", "input": [   "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",

   "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.",

   "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",

   "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",

   "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",

   "North Dakota is a state in the United States. 672,591 people lived in North Dakota in the year 2010. The capital and seat of government is Bismarck." ]}
{
 "rerank": [
   {
     "index": 3,
     "relevance_score": 0.9998832
   },
   {
     "index": 4,
     "relevance_score": 0.008847355
   },
   {
     "index": 5,
     "relevance_score": 0.0026626128
   },
   {
     "index": 0,
     "relevance_score": 0.00068250194
   },
   {
     "index": 2,
     "relevance_score": 0.00019716943
   },
   {
     "index": 1,
     "relevance_score": 0.00011591934
   }
 ]
}

III. Sparse Embedding

Alibaba provides a model specifically for sparse embeddings, we will use ops-text-sparse-embedding-001 for our example.

Step 1: Configure the Sparse Embedding Service

PUT _inference/sparse_embedding/ali-sparse-embedding{ "service": "alibabacloud-ai-search", "service_settings": {   "api_key": "xxxxxxxxxxxxxxxxxx",   "service_id": "ops-text-sparse-embedding-001",   "host" : "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",   "workspace" : "default" }}
{
 "inference_id": "ali-sparse-embedding",
 "task_type": "sparse_embedding",
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "service_id": "ops-text-sparse-embedding-001",
   "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace": "default",
   "rate_limit": {
     "requests_per_minute": 1000
   }
 },
 "task_settings": {}
}

Step 2: Issue a Sparse Embedding Query

Sparse has task_settings for:

input_type - either ingest or search

return_token - if true include the token text in the response, else it is a number

POST _inference/sparse_embedding/ali-sparse-embedding{ "input": "Hello world", "task_settings": {   "input_type": "search",   "return_token": true }}
{
 "sparse_embedding": [
   {
     "is_truncated": false,
     "embedding": {
       "hello": 0.27783203,
       "world": 0.28222656
     }
   }
 ]
}

With return_token==false

{
 "sparse_embedding": [
   {
     "is_truncated": false,
     "embedding": {
       "8999": 0.28222656,
       "35378": 0.27783203
     }
   }
 ]
}

IV. Text Embedding

Alibaba also offers text embedding models for different tasks.

Step 1: Configure the Text Embedding Service

Embeddings has one task_setting:

input_type - either ingest or search

PUT _inference/text_embedding/ali-embeddings{ "service": "alibabacloud-ai-search", "service_settings": {   "api_key": "xxxxxxxxxxxxxxxxxx",   "service_id": "ops-text-embedding-001",   "host" : "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",   "workspace" : "default" }}
{
 "inference_id": "ali-embeddings",
 "task_type": "text_embedding",
 "service": "alibabacloud-ai-search",
 "service_settings": {
   "service_id": "ops-text-embedding-001",
   "host": "xxxxx.platform-cn-shanghai.opensearch.aliyuncs.com",
   "workspace": "default",
   "rate_limit": {
     "requests_per_minute": 1000
   },
   "similarity": "dot_product",
   "dimensions": 1536
 },
 "task_settings": {}
}

Step 2: Issue a Text Embedding Request

Send a POST request to generate a text embedding:

POST _inference/text_embedding/ali-embeddings{ "input": "Hello world"}
{
 "text_embedding": [
   {
     "embedding": [
       -0.017036675,
       0.07038724,
       0.044685286,
       0.0064531807,
       0.013290042,
       0.011183944,
       -0.0020014185,
       -0.009508779,
…

AI search with Elastic and Alibaba Cloud

Whether you're using Elasticsearch for implementing hybrid search, semantic reranking, or enhancing RAG use cases with summarization, the connection to Alibaba Cloud's AI Services opens up a new world of possibilities for Elasticsearch developers. Thanks again, Alibaba team, for the contribution!

To dive deep, try this Jupyter notebook with an end-to-end example of using Inference API with the Alibaba Cloud AI Search.

Read Alibaba Cloud's announcement about AI-powered search innovations with Elasticsearch.

Users can start using this with Elasticsearch Serverless environments today and in an upcoming version of Elasticsearch.

Happy searching!

Ready to try this out on your own? Start a free trial.

Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our advanced semantic search webinar to build your next GenAI app!

Community

Elasticsearch Open Inference API Adds Support for Alibaba Cloud AI Search

Using a Base Model in Alibaba Cloud AI Search Platform

Creating an Inference API Endpoint in Elasticsearch

I. Completion

Step 1: Configure the Completion Service

Step 2: Issue a Completion Request

II. Rerank

Step 1: Configure the Rerank Service

Step 2: Issue a Rerank Request

III. Sparse Embedding

Step 1: Configure the Sparse Embedding Service

Step 2: Issue a Sparse Embedding Query

IV. Text Embedding

Step 1: Configure the Text Embedding Service

Step 2: Issue a Text Embedding Request

AI search with Elastic and Alibaba Cloud

Read previous post:

Read next post:

Data Geek

You may also like

Comments

Data Geek

Related Products

Alibaba Cloud Elasticsearch

Database Migration Solution

AI Acceleration Solution

Cloud Migration Solution