Practical Tips: Mastering Elasticsearch with Python

Elasticsearch is a robust, open-source, distributed search and data analytics engine that has become a popular choice for its powerful full-text search capabilities. Although its core is written in Java, its RESTful API lets developers work with it from many programming languages, including Python. This article shows how to combine Elasticsearch with Python in our projects, using Alibaba Cloud Elasticsearch as the environment.

Getting Started

Before diving into the technicalities, ensure you have a running Elasticsearch 8.X instance and a Kibana instance.

For deployment details, I recommend reading Chapter 3 of "Elasticsearch Deep Dive," focusing on Elasticsearch cluster deployment.

Introducing Elasticsearch Python Clients

Several Python libraries are available for interacting with Elasticsearch:

  • elasticsearch-py: The official low-level client offers direct and flexible access.
  • elasticsearch-dsl: A higher-level wrapper around elasticsearch-py that simplifies operations, recommended for daily use.
  • django-elasticsearch-dsl: Designed for Django users, integrates deeply with the Django framework using elasticsearch-dsl.

Use Cases and Pros/Cons of Various Elasticsearch Python Clients

| Client | Use Cases | Pros | Cons |
|---|---|---|---|
| elasticsearch-py | Direct low-level operations with Elasticsearch | Full access to Elasticsearch APIs, high flexibility | Complex code, steep learning curve |
| elasticsearch-dsl | Complex search queries | Simplifies query construction, Pythonic interface, less syntax error risk | Steeper learning curve |
| django-elasticsearch-dsl | Using Elasticsearch in Django projects | Seamless integration with Django, auto-syncing of models and documents | Limited to Django, more abstraction layers |

Basic CRUD Operations with elasticsearch-py

The official low-level Python client elasticsearch-py allows performing all basic and advanced Elasticsearch operations. Let's start with some basic operation examples:

Import Dependencies and Initialize Client

from elasticsearch import Elasticsearch
import configparser
import warnings

warnings.filterwarnings("ignore")

def init_es_client(config_path='./conf/config.ini'):
    # Initialize parser
    config = configparser.ConfigParser()
    # Read config
    config.read(config_path)
    # Retrieve Elasticsearch configuration
    es_host = config.get('elasticsearch', 'ES_HOST')
    es_user = config.get('elasticsearch', 'ES_USER')
    es_password = config.get('elasticsearch', 'ES_PASSWORD')
  
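    # verify_certs=False skips TLS certificate verification; set it to True
    # in production so the server certificate is checked against ca_certs.
    es = Elasticsearch(
        hosts=[es_host],
        basic_auth=(es_user, es_password),
        verify_certs=False,
        ca_certs='conf/http_ca.crt'
    )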
    return es
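
The function above reads three keys from an `[elasticsearch]` section, but the article does not show the file itself. As a rough sketch, assuming a layout like the one below (the host and credentials are placeholders, and `load_es_settings` is a hypothetical helper, not part of elasticsearch-py), the settings could be validated before the client is built:

```python
import configparser

# Hypothetical contents of ./conf/config.ini; all values are placeholders.
SAMPLE_CONFIG = """
[elasticsearch]
ES_HOST = https://localhost:9200
ES_USER = elastic
ES_PASSWORD = changeme
"""

REQUIRED_KEYS = ("ES_HOST", "ES_USER", "ES_PASSWORD")

def load_es_settings(text):
    """Parse INI text and return the three connection settings as a dict."""
    config = configparser.ConfigParser()
    config.read_string(text)
    if not config.has_section("elasticsearch"):
        raise ValueError("missing [elasticsearch] section")
    # Option names are case-insensitive in configparser by default
    missing = [k for k in REQUIRED_KEYS if not config.has_option("elasticsearch", k)]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return {k: config.get("elasticsearch", k) for k in REQUIRED_KEYS}

settings = load_es_settings(SAMPLE_CONFIG)
print(settings["ES_HOST"])
```

Failing early with a clear message tends to be easier to debug than a `NoOptionError` raised in the middle of client setup.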

Create Index

def create_index(es, index_name="test-index"):
    if not es.indices.exists(index=index_name):
        es.indices.create(index=index_name)

Define Mapping

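def define_mapping(es, index_name="test-index"):
    properties = {
        "name": {"type": "text"},
        "age": {"type": "integer"},
        "email": {"type": "keyword"},
    }
    if es.indices.exists(index=index_name):
        # The index already exists (for example, after create_index), so add the fields to it
        es.indices.put_mapping(index=index_name, properties=properties)
    else:
        # Otherwise create the index with the mapping in one call
        es.indices.create(index=index_name, mappings={"properties": properties})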
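
Index creation and mapping definition can also be folded into one idempotent step. The sketch below is only an illustration: `ensure_index` is a hypothetical helper name, and the shard and replica counts are assumed defaults rather than values from this article. It relies on the `settings` and `mappings` keyword arguments that `indices.create` accepts in the 8.x client.

```python
def ensure_index(es, index_name, properties, shards=1, replicas=1):
    """Create index_name with settings and mappings if it does not exist yet.

    Returns True when the index was created, False when it was already there.
    """
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(
        index=index_name,
        # Assumed values; tune them for the size of your cluster
        settings={"number_of_shards": shards, "number_of_replicas": replicas},
        mappings={"properties": properties},
    )
    return True
```

With a client from `init_es_client()`, a call such as `ensure_index(es, "test-index", {"name": {"type": "text"}})` would create the index on the first run and do nothing on later runs.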

Insert, Update, and Delete Documents

def insert_document(es, index_name="test-index", doc_id=None, document=None):
    es.index(index=index_name, id=doc_id, document=document)

def update_document(es, index_name="test-index", doc_id=None, updated_doc=None):
    # Partial update: only the fields in updated_doc are changed
    es.update(index=index_name, id=doc_id, doc=updated_doc)

def delete_document(es, index_name="test-index", doc_id=None):
    es.delete(index=index_name, id=doc_id)
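
Inserting documents one at a time costs one request per document. For larger batches, the client ships a `helpers.bulk` function that accepts an iterable of action dicts. The following is a minimal sketch, assuming each input dict may carry an optional `id` key; `build_bulk_actions` and `bulk_insert` are hypothetical names introduced here for illustration.

```python
def build_bulk_actions(documents, index_name="test-index"):
    """Yield one bulk "index" action per document.

    An optional "id" key becomes the document _id; the remaining keys
    become the document body.
    """
    for document in documents:
        source = dict(document)  # copy so the caller's dict is left untouched
        doc_id = source.pop("id", None)
        action = {"_index": index_name, "_source": source}
        if doc_id is not None:
            action["_id"] = doc_id
        yield action

def bulk_insert(es, documents, index_name="test-index"):
    """Send all documents in a single bulk request."""
    # Imported lazily so build_bulk_actions can be used without the client installed
    from elasticsearch import helpers
    return helpers.bulk(es, build_bulk_actions(documents, index_name))

# Sample data for illustration only
docs = [{"id": "1", "name": "John Doe", "age": 30}, {"name": "Jane Roe", "age": 28}]
for action in build_bulk_actions(docs):
    print(action)
```

Keeping the action builder separate from the network call makes it easy to inspect or unit test the payload before anything is sent to the cluster.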

Search Documents

def search_documents(es, index_name="test-index", query=None):
    # Unpack the request dict into keyword arguments (e.g. query=...), as the 8.x client prefers
    return es.search(index=index_name, **(query or {}))
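
`search_documents` expects a ready-made request dict. As an illustrative sketch (the `build_query` helper and its parameters are assumptions, not part of the client), the fields from the mapping above can be combined into a `bool` query: `match` for the analyzed `name` text field, `term` for the exact-value `email` keyword field, and `range` for `age`.

```python
def build_query(name=None, email=None, min_age=None, size=10):
    """Return a search request dict combining the given criteria."""
    must, filters = [], []
    if name is not None:
        # Full-text match: the input is analyzed like the text field itself
        must.append({"match": {"name": name}})
    if email is not None:
        # Exact match on a keyword field; no analysis, no scoring
        filters.append({"term": {"email": email}})
    if min_age is not None:
        filters.append({"range": {"age": {"gte": min_age}}})

    bool_clauses = {}
    if must:
        bool_clauses["must"] = must
    if filters:
        bool_clauses["filter"] = filters
    # With no criteria at all, fall back to matching every document
    query = {"bool": bool_clauses} if bool_clauses else {"match_all": {}}
    return {"query": query, "size": size}

print(build_query(name="John Doe", min_age=30, size=5))
```

The resulting dict can be passed as the `query` argument of `search_documents`. Putting exact-value conditions under `filter` rather than `must` keeps them out of relevance scoring.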

Full Example in the main Function

def main():
    # Initialization
    es = init_es_client()
    # Index creation and mapping
    create_index(es)
    define_mapping(es)
    # Document operations
    doc = {"name": "John Doe", "age": 30, "email": "john.doe@example.com"}
    insert_document(es, "test-index", doc_id="1", document=doc)
    # Update document
    update_document(es, "test-index", "1", {"age": 31})
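    # Refresh so the newly indexed document is visible to search
    es.indices.refresh(index="test-index")
    # Search document
    query = {"query": {"match": {"name": "John Doe"}}}
    results = search_documents(es, "test-index", query)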
    print(results)
    # Delete document
    delete_document(es, "test-index", "1")
    
if __name__ == "__main__":
    main()

This code demonstrates the process of interacting with Elasticsearch using the elasticsearch-py library. From creating an index and defining its mapping to performing CRUD operations on documents and executing a simple search, it covers the fundamental Elasticsearch operations.
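
The `main` function prints the raw search response, which is a nested structure. A small helper can flatten it into plain documents. This is a sketch under the assumption that the response follows the standard search layout (`hits.total.value` and `hits.hits[]._source`); `extract_hits` is a hypothetical name, and the sample response is hand-written rather than real cluster output.

```python
def extract_hits(response):
    """Return (total, documents) from a search response.

    Each document is the stored _source plus its id and relevance score.
    """
    hits = response["hits"]
    total = hits["total"]["value"]
    documents = [
        {"id": hit["_id"], "score": hit["_score"], **hit["_source"]}
        for hit in hits["hits"]
    ]
    return total, documents

# Hand-written sample shaped like a search response (not real cluster output)
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_id": "1", "_score": 0.57, "_source": {"name": "John Doe", "age": 31}},
        ],
    }
}
total, documents = extract_hits(sample_response)
print(total, documents)
```

The response object returned by the 8.x client supports the same key-based access, so `extract_hits(results)` should work on the value returned by `search_documents`.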

Transitioning to elasticsearch-dsl for Simplified Operations

While elasticsearch-py provides comprehensive access to Elasticsearch capabilities, elasticsearch-dsl simplifies the construction of complex search queries and offers a more Pythonic interface. The following snippets illustrate how to use elasticsearch-dsl:

Initialize Elasticsearch DSL Client

import configparser

from elasticsearch_dsl import Document, Text, Integer, connections, Search

def init_es_client_dsl(config_path='./conf/config.ini'):
    config = configparser.ConfigParser()
    config.read(config_path)
    es_host = config.get('elasticsearch', 'ES_HOST')
    es_user = config.get('elasticsearch', 'ES_USER')
    es_password = config.get('elasticsearch', 'ES_PASSWORD')
  
    connections.create_connection(
        hosts=[es_host],
        basic_auth=(es_user, es_password),  # 8.x name for the deprecated http_auth
        verify_certs=False
    )

Define a Document and Create an Index

class MyDocument(Document):
    name = Text()
    age = Integer()
    email = Text()
    
    class Index:
        name = 'test-index-dsl'
        
def create_index_dsl():
    MyDocument.init()

Insert and Update Documents

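def insert_document_dsl(document):
    # Copy the input and pull out 'id' so it is used as the document ID
    # rather than also being stored as a regular field
    fields = dict(document)
    doc_id = fields.pop('id', None)
    doc = MyDocument(meta={'id': doc_id}, **fields)
    doc.save()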

def update_document_dsl(doc_id, updated_doc):
    doc = MyDocument.get(id=doc_id)
    for key, value in updated_doc.items():
        setattr(doc, key, value)
    doc.save()

Delete and Search Documents

def delete_document_dsl(doc_id):
    doc = MyDocument.get(id=doc_id)
    doc.delete()

def search_documents_dsl(query):
    s = Search(index="test-index-dsl").query("match", name=query)
    response = s.execute()
    return response
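
The response returned by `search_documents_dsl` can be iterated directly: each hit exposes document fields as attributes and metadata such as the ID and score under `hit.meta`. The helper below is a minimal sketch (`summarize_hits` is a hypothetical name) that converts the hits into plain dicts for the fields defined on `MyDocument`.

```python
def summarize_hits(response):
    """Turn an iterable of elasticsearch-dsl hits into a list of plain dicts."""
    return [
        {
            "id": hit.meta.id,
            "score": hit.meta.score,
            # getattr with a default tolerates documents that lack a field
            "name": getattr(hit, "name", None),
            "age": getattr(hit, "age", None),
        }
        for hit in response
    ]
```

For example, `summarize_hits(search_documents_dsl("John Doe"))` would return one dict per matching document, which is convenient for logging or JSON serialization.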

Conclusion

Whether you choose elasticsearch-py for its direct API access and flexibility or prefer the more abstracted and Pythonic elasticsearch-dsl for simplified query construction, both clients offer powerful ways to interact with Elasticsearch in Python projects. Leveraging these in conjunction with Alibaba Cloud Elasticsearch can significantly enhance your search and data analytics capabilities.
