Elasticsearch 8.x adds Reciprocal Rank Fusion (RRF), a method for merging and re-ranking multiple result sets. The existing building blocks stay in place: BM25 still ranks by text relevance, and vector similarity still drives semantic recall. RRF combines the two rankings into a single list that is often more accurate than either one alone. This article walks through the integration with a detailed example.
To follow along, you need an Elasticsearch cluster running version 8.8 or later. Alibaba Cloud Elasticsearch offers 8.x clusters that you can purchase right away:
1) Select version 8.8 or 8.9 and configure your nodes.
2) Select the appropriate network configuration.
3) Check out with a single click.
RRF scores each document with the formula below, where 'k' is a constant that defaults to 60, 'R' is the set of ranked result lists (one per query), and 'r(d)' is the rank of document 'd' in result list 'r', starting from one.
Algorithmic formula:
$$\mathrm{RRFscore}(d) = \sum_{r \in R} \frac{1}{k + r(d)}$$
When documents are ranked by both BM25 and a dense embedding, RRF combines the two rankings into one. The example below uses k = 0.
BM25 Rank | Dense Embedding Rank | RRF Result (k=0) |
---|---|---|
A 1 | B 1 | B: 1/2 + 1/1 = 1.5 |
B 2 | C 2 | A: 1/1 + 1/3 = 1.33 |
C 3 | A 3 | C: 1/3 + 1/2 = 0.83 |
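The arithmetic in the table can be checked with a few lines of Python. This is a minimal sketch of the RRF formula, not Elasticsearch's implementation; the function name `rrf_scores` and the variable names are chosen here for illustration.

```python
from collections import defaultdict


def rrf_scores(rankings, k=60):
    """Return {doc: RRF score} for a list of rankings (best document first)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, matching r(d) in the formula.
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return dict(scores)


# The two rankings from the table above.
bm25_ranking = ["A", "B", "C"]
dense_ranking = ["B", "C", "A"]

# k=0 reproduces the table; the default of 60 flattens the score differences.
scores = rrf_scores([bm25_ranking, dense_ranking], k=0)
fused = sorted(scores, key=scores.get, reverse=True)

for doc in fused:
    print(doc, round(scores[doc], 2))
```

With k = 0 the fused order is B, A, C, as in the table. A larger k reduces the advantage of a first-place rank over a second-place rank in any single list.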
Following the method described in ESRE Series (I), we deployed the text_embedding model with Eland, uploaded the initial dataset through Kibana, configured the text-embeddings pipeline, and then reindexed the data to produce an index that contains vectors.
What is Vector Search and Embedding Model?
For the evaluation, we selected one query from the Passage Ranking task of the TREC 2019 Deep Learning Track and compared the results of three techniques: text search, vector search, and RRF fusion. The query is "hydrogen is a liquid below what temperature".
// RRF hybrid query (text + vector)
GET collection-with-embeddings/_search
{
  "size": 10,
  "query": {
    "query_string": {
      "query": "hydrogen is a liquid below what temperature"
    }
  },
  "knn": [
    {
      "field": "text_embedding.predicted_value",
      "k": 10,
      "num_candidates": 100,
      "query_vector_builder": {
        "text_embedding": {
          "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
          "model_text": "hydrogen is a liquid below what temperature"
        }
      }
    }
  ],
  "_source": [
    "id"
  ],
  "rank": {
    "rrf": {
      "window_size": 10,
      "rank_constant": 1
    }
  }
}
// Vector search
GET collection-with-embeddings/_search
{
  "size": 10,
  "knn": [
    {
      "field": "text_embedding.predicted_value",
      "k": 10,
      "num_candidates": 100,
      "query_vector_builder": {
        "text_embedding": {
          "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
          "model_text": "hydrogen is a liquid below what temperature"
        }
      }
    }
  ],
  "_source": [
    "id"
  ]
}
// Text search
GET collection-with-embeddings/_search
{
  "size": 10,
  "query": {
    "query_string": {
      "query": "hydrogen is a liquid below what temperature"
    }
  },
  "_source": [
    "id"
  ]
}
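If you issue these requests from a script rather than from the Kibana console, it can help to build the body in one place so that the text clause and the vector clause always carry the same query string. The sketch below only assembles the JSON body shown above as a Python dict; the helper name `build_rrf_search_body` is hypothetical, and sending the body to the cluster is left to whichever HTTP client you use.

```python
import json


def build_rrf_search_body(query_text, model_id, vector_field,
                          size=10, k=10, num_candidates=100,
                          window_size=10, rank_constant=1):
    """Assemble the hybrid (text + kNN + RRF) request body as a plain dict.

    Hypothetical helper: the field names mirror the console request in this
    article, nothing here talks to Elasticsearch.
    """
    return {
        "size": size,
        "query": {"query_string": {"query": query_text}},
        "knn": [
            {
                "field": vector_field,
                "k": k,
                "num_candidates": num_candidates,
                "query_vector_builder": {
                    "text_embedding": {
                        "model_id": model_id,
                        # Same text as the lexical clause, by construction.
                        "model_text": query_text,
                    }
                },
            }
        ],
        "_source": ["id"],
        "rank": {
            "rrf": {"window_size": window_size, "rank_constant": rank_constant}
        },
    }


body = build_rrf_search_body(
    query_text="hydrogen is a liquid below what temperature",
    model_id="sentence-transformers__msmarco-minilm-l-12-v3",
    vector_field="text_embedding.predicted_value",
)
print(json.dumps(body, indent=2))
```

Dropping the "rank" and "knn" keys from the returned dict gives the text-only request, and dropping "rank" and "query" gives the vector-only request.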
The three query types returned results of differing quality. Each result is graded for relevance from 0 (not relevant) to 3 (completely relevant). The rankings show that, by fusing the vector and text results, RRF brings relevant documents such as "7911557", which is absent from the vector results, near the top. RRF also keeps relevant documents such as "6080460", which the text query missed, so the fused list covers relevant documents that either single method overlooks.
RRF hybrid query: passage ID | Relevance (0-3) | Vector search: passage ID | Relevance (0-3) | Text search: passage ID | Relevance (0-3) |
---|---|---|---|---|---|
8588222 | 0 | 8588222 | 0 | 7911557 | 3 |
8588219 | 3 | 8588219 | 3 | 8588219 | 3 |
7911557 | 3 | 6080460 | 3 | 8588222 | 0 |
128984 | 3 | 128984 | 3 | 2697752 | 2 |
6080460 | 3 | 4254815 | 1 | 128984 | 3 |
2697752 | 2 | 6343521 | 1 | 1721142 | 0 |
4254815 | 1 | 1020793 | 0 | 8588227 | 0 |
1721142 | 0 | 4254811 | 3 | 302210 | 1 |
6343521 | 1 | 1959030 | 0 | 2697746 | 2 |
8588227 | 0 | 4254813 | 1 | 7350325 | 0 |
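As a sanity check, the RRF column can be recomputed offline from the other two columns. The sketch below applies the RRF formula with a rank constant of 1 (the value used in the hybrid request) to the text and vector ID lists copied from the table. It is an illustration in plain Python, not Elasticsearch's code, and the function name `rrf_fuse` is made up for this example.

```python
def rrf_fuse(rankings, rank_constant=1, size=10):
    """Fuse ranked ID lists with RRF and return the top `size` IDs."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # sorted() is stable, so tied documents keep first-seen order.
    # That tie-break is a choice of this sketch, not a documented
    # Elasticsearch behaviour.
    return sorted(scores, key=scores.get, reverse=True)[:size]


# Top-10 IDs copied from the "Text search" and "Vector search" columns.
text_ids = ["7911557", "8588219", "8588222", "2697752", "128984",
            "1721142", "8588227", "302210", "2697746", "7350325"]
vector_ids = ["8588222", "8588219", "6080460", "128984", "4254815",
              "6343521", "1020793", "4254811", "1959030", "4254813"]

fused = rrf_fuse([text_ids, vector_ids], rank_constant=1)
for position, doc_id in enumerate(fused, start=1):
    print(position, doc_id)
```

With these inputs the sketch returns the same ten IDs in the same order as the RRF column. Two pairs of documents tie (at 1/7 and at 1/8); they are resolved here by input order, with the text list passed first.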
By fusing text and vector rankings, RRF in Elasticsearch gives users more accurate, better-ordered results when searching large volumes of data. You can try it on Alibaba Cloud Elasticsearch's public cloud service.
Search and Analytics Service Elasticsearch Version: Alibaba Cloud Elasticsearch is a fully managed Elasticsearch cloud service built on the open-source Elasticsearch, supporting out-of-the-box functionality and pay-as-you-go while being 100% compatible with open-source features. Not only does it provide the cloud-ready components of the Elastic Stack, including Elasticsearch, Logstash, Kibana, and Beats, but it also partners with Elastic to offer the free X-Pack (Platinum level advanced features) commercial plugin. This integration includes advanced features such as security, SQL, machine learning, alerting, and monitoring, and is widely used in scenarios such as real-time log analysis, information retrieval, and multi-dimensional data querying and statistical analysis.
For more information about Elasticsearch, please visit https://www.alibabacloud.com/en/product/elasticsearch.