Elasticsearch is a robust, open-source, distributed search and data analytics engine, widely used for its powerful full-text search capabilities. Although its core is written in Java, its RESTful API lets developers work with it from many programming languages, including Python. This article shows how to combine Elasticsearch with Python in our projects, using Alibaba Cloud Elasticsearch as the environment.
Before diving into the technicalities, ensure you have a running Elasticsearch 8.X instance and a Kibana instance.
For deployment details, I recommend reading Chapter 3 of "Elasticsearch Deep Dive," focusing on Elasticsearch cluster deployment.
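The code samples in this article read their connection settings from `./conf/config.ini`, using an `[elasticsearch]` section with the keys `ES_HOST`, `ES_USER`, and `ES_PASSWORD`. A minimal sketch of that file layout follows; the endpoint and credentials are placeholders, and `load_es_settings` is a hypothetical helper used only to show how the file is parsed.

```python
import configparser

# Placeholder values (assumptions): replace with your own endpoint and credentials.
# With configparser's default interpolation, a literal % in a value must be written as %%.
SAMPLE_CONFIG = """
[elasticsearch]
ES_HOST = https://localhost:9200
ES_USER = elastic
ES_PASSWORD = changeme
"""


def load_es_settings(config_text):
    """Parse INI text and return the three Elasticsearch settings as a dict."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    section = "elasticsearch"
    return {
        "host": config.get(section, "ES_HOST"),
        "user": config.get(section, "ES_USER"),
        "password": config.get(section, "ES_PASSWORD"),
    }


settings = load_es_settings(SAMPLE_CONFIG)
print(settings["host"])
```

Keeping credentials in a separate file like this means they stay out of the source code; the file itself should still be excluded from version control.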
Several Python libraries are available for interacting with Elasticsearch:
| Client | Use Cases | Pros | Cons |
|---|---|---|---|
| elasticsearch-py | Direct low-level operations with Elasticsearch | Full access to Elasticsearch APIs, high flexibility | Complex code, steep learning curve |
| elasticsearch-dsl | Complex search queries | Simplifies query construction, Pythonic interface, less syntax error risk | Steeper learning curve |
| django-elasticsearch-dsl | Using Elasticsearch in Django projects | Seamless integration with Django, auto-syncing of models and documents | Limited to Django, more abstraction layers |
The official low-level Python client, elasticsearch-py, supports all basic and advanced Elasticsearch operations. Let's start with some basic examples:
from elasticsearch import Elasticsearch
import configparser
import warnings

# Silences all warnings, including the TLS warning triggered by verify_certs=False
warnings.filterwarnings("ignore")

def init_es_client(config_path='./conf/config.ini'):
    # Initialize parser
    config = configparser.ConfigParser()
    # Read config
    config.read(config_path)
    # Retrieve Elasticsearch configuration
    es_host = config.get('elasticsearch', 'ES_HOST')
    es_user = config.get('elasticsearch', 'ES_USER')
    es_password = config.get('elasticsearch', 'ES_PASSWORD')
    es = Elasticsearch(
        hosts=[es_host],
        basic_auth=(es_user, es_password),
        # Certificate verification is disabled here for testing only;
        # set verify_certs=True so the CA file below is actually enforced.
        verify_certs=False,
        ca_certs='conf/http_ca.crt'
    )
    return es
def create_index(es, index_name="test-index"):
    # Create the index only if it does not already exist
    if not es.indices.exists(index=index_name):
        es.indices.create(index=index_name)
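def define_mapping(es, index_name="test-index"):
    mapping = {
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "age": {"type": "integer"},
                "email": {"type": "keyword"}
            }
        }
    }
    # Create the index with this mapping; a 400 response (index already exists) is ignored.
    # In the 8.x client, per-request options go through .options() rather than ignore=400.
    es.options(ignore_status=400).indices.create(index=index_name, mappings=mapping["mappings"])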
def insert_document(es, index_name="test-index", doc_id=None, document=None):
    # Index (create or overwrite) a document under the given ID
    es.index(index=index_name, id=doc_id, document=document)
def update_document(es, index_name="test-index", doc_id=None, updated_doc=None):
    # Partial update: only the fields present in updated_doc are changed
    es.update(index=index_name, id=doc_id, doc=updated_doc)
def delete_document(es, index_name="test-index", doc_id=None):
    es.delete(index=index_name, id=doc_id)
def search_documents(es, index_name="test-index", query=None):
    # query is a full request body, e.g. {"query": {"match": {...}}}
    return es.search(index=index_name, body=query)
def main():
    # Initialization
    es = init_es_client()
    # Index creation and mapping: define_mapping runs first so that the index
    # is created with the mapping; create_index is then a no-op safety check
    define_mapping(es)
    create_index(es)
    # Document operations
    doc = {"name": "John Doe", "age": 30, "email": "john.doe@example.com"}
    insert_document(es, "test-index", doc_id="1", document=doc)
    # Update document
    update_document(es, "test-index", "1", {"age": 31})
    # Make the changes visible to search immediately
    es.indices.refresh(index="test-index")
    # Search document
    query = {"query": {"match": {"name": "John Doe"}}}
    results = search_documents(es, "test-index", query)
    print(results)
    # Delete document
    delete_document(es, "test-index", "1")

if __name__ == "__main__":
    main()
This code demonstrates how to interact with Elasticsearch using the elasticsearch-py library. It covers the fundamental operations: creating an index, defining a mapping, performing CRUD operations on documents, and running a simple search.
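The `main()` function above prints the raw search response. As a rough sketch of how that response could be unpacked, the helper below pulls out the total hit count and each hit's ID, score, and source document. `extract_hits` is a hypothetical name, and `sample_response` is a hand-written dictionary shaped like a search response, not real cluster output. It assumes the default response format, where `hits.total` is an object with a `value` field.

```python
def extract_hits(response):
    """Return (total, [(id, score, source), ...]) from a search response."""
    hits_section = response["hits"]
    total = hits_section["total"]["value"]
    rows = [
        (hit["_id"], hit["_score"], hit["_source"])
        for hit in hits_section["hits"]
    ]
    return total, rows


# Illustrative sample only: the values are made up for this sketch.
sample_response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 0.57,
        "hits": [
            {
                "_index": "test-index",
                "_id": "1",
                "_score": 0.57,
                "_source": {
                    "name": "John Doe",
                    "age": 31,
                    "email": "john.doe@example.com",
                },
            }
        ],
    },
}

total, rows = extract_hits(sample_response)
for doc_id, score, source in rows:
    print(doc_id, score, source["name"])
```

The object returned by the 8.x client supports the same dictionary-style access, so the helper should work on the value returned by `search_documents` as well.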
While elasticsearch-py provides comprehensive access to Elasticsearch capabilities, elasticsearch-dsl simplifies complex search query construction, showcasing a more Pythonic approach. The following snippets illustrate how to utilize elasticsearch-dsl effectively:
import configparser
from elasticsearch_dsl import Document, Text, Integer, connections, Search

def init_es_client_dsl(config_path='./conf/config.ini'):
    config = configparser.ConfigParser()
    config.read(config_path)
    es_host = config.get('elasticsearch', 'ES_HOST')
    es_user = config.get('elasticsearch', 'ES_USER')
    es_password = config.get('elasticsearch', 'ES_PASSWORD')
    # Registers the default connection used by Document and Search
    connections.create_connection(
        hosts=[es_host],
        # basic_auth is the 8.x parameter name; http_auth is deprecated
        basic_auth=(es_user, es_password),
        # Certificate verification is disabled for testing only
        verify_certs=False
    )
class MyDocument(Document):
    name = Text()
    age = Integer()
    email = Text()

    class Index:
        # Name of the index that stores MyDocument instances
        name = 'test-index-dsl'
def create_index_dsl():
    # Create the index and its mapping from the MyDocument definition
    MyDocument.init()
def insert_document_dsl(document):
    doc = MyDocument(meta={'id': document.get('id', None)}, **document)
    doc.save()
def update_document_dsl(doc_id, updated_doc):
    # Load the document, overwrite the given fields, and save it back
    doc = MyDocument.get(id=doc_id)
    for key, value in updated_doc.items():
        setattr(doc, key, value)
    doc.save()
def delete_document_dsl(doc_id):
    doc = MyDocument.get(id=doc_id)
    doc.delete()
def search_documents_dsl(query):
    # Match query on the name field
    s = Search(index="test-index-dsl").query("match", name=query)
    response = s.execute()
    return response
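For comparison, here is a rough sketch of the kind of raw query body that the elasticsearch-py `search_documents` helper from the earlier example expects, built by hand as a plain dictionary. `build_name_query` and its `min_age` and `size` parameters are hypothetical, introduced only for illustration. Writing nested dictionaries like this by hand is the part that elasticsearch-dsl is designed to simplify.

```python
def build_name_query(name, min_age=None, size=10):
    """Build a search body that matches on name, optionally filtered by age."""
    match_clause = {"match": {"name": name}}
    if min_age is None:
        query = match_clause
    else:
        # bool query: the match clause is scored, the range filter is not
        query = {
            "bool": {
                "must": [match_clause],
                "filter": [{"range": {"age": {"gte": min_age}}}],
            }
        }
    return {"query": query, "size": size}


simple_body = build_name_query("John Doe")
filtered_body = build_name_query("John Doe", min_age=30, size=5)
print(simple_body)
print(filtered_body)
```

Either dictionary could be passed as the `query` argument of `search_documents`.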
Whether you choose elasticsearch-py for its direct API access and flexibility or prefer the more abstracted and Pythonic elasticsearch-dsl for simplified query construction, both clients offer powerful ways to interact with Elasticsearch in Python projects. Leveraging these in conjunction with Alibaba Cloud Elasticsearch can significantly enhance your search and data analytics capabilities.