Seamless Integration of Third-party NLP Models into Alibaba Cloud Elasticsearch with Elastic Eland

This guide walks you through how to use Elastic Eland to upload cutting-edge Hugging Face models to Alibaba Cloud Elasticsearch, enabling AI and NLP advancements with ease.

Elasticsearch Machine Learning can work with third-party trained models once they are uploaded to the cluster.

Search Meets Python Data Science with Elastic Eland

For those eager to enhance their Elasticsearch data analysis and machine learning tasks, Elastic Eland serves as a bridge, combining big data processing with the Python data science ecosystem. Take advantage of Elastic Eland to convert Hugging Face's pre-trained models into TorchScript format, making your AI applications ready for environments without a Python interpreter.

Explore the capabilities of Elastic Eland in the official documentation.

Ensure Version Compatibility

Elastic Eland works with Alibaba Cloud Elasticsearch clusters only when the versions are compatible. Elasticsearch 7.11 and later support Elastic Eland, and later versions add more features. For the best results, use Elastic Eland with an Elasticsearch cluster of version 8.3 or later. Make sure that the major version of your Elasticsearch cluster matches the major version of Elastic Eland (for example, Elasticsearch 8.x with Eland 8.x). Supported Python versions are 3.8, 3.9, and 3.10, and Pandas 1.5.3 is supported.
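The version rules above can be sketched as a small pre-flight check. This is a minimal sketch, not part of Eland: `parse_version` and `check_compatibility` are hypothetical helper names, and the thresholds are only the ones stated in this section.

```python
import sys

SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10)}  # Python versions listed in this guide
MIN_ES = (7, 11)                               # earliest Elasticsearch version listed in this guide


def parse_version(text):
    """Turn '8.9.0' or 'V8.9' into a tuple of ints such as (8, 9, 0)."""
    cleaned = text.strip().lstrip("vV")
    return tuple(int(part) for part in cleaned.split("."))


def check_compatibility(es_version, eland_version, python_version=None):
    """Return a list of problems; an empty list means no issue was found."""
    es = parse_version(es_version)
    eland = parse_version(eland_version)
    # Fall back to the interpreter that is running this script.
    py = tuple(python_version or sys.version_info[:2])

    problems = []
    if es[:2] < MIN_ES:
        problems.append(f"Elasticsearch {es_version} is older than 7.11")
    if es[0] != eland[0]:
        problems.append(
            f"major versions differ: Elasticsearch {es[0]}.x vs Eland {eland[0]}.x"
        )
    if py not in SUPPORTED_PYTHON:
        problems.append(f"Python {py[0]}.{py[1]} is outside 3.8-3.10")
    return problems
```

Calling `check_compatibility("8.9.0", "8.7.0", (3, 10))` with the versions used in this example reports no problems.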

Preparing Your Cloud Environment

Before you start, prepare an Elastic Compute Service (ECS) instance with a Python environment and an Alibaba Cloud Elasticsearch cluster. The ECS instance must be able to access Hugging Face, and the cluster must support model upload, which Alibaba Cloud Elasticsearch provides through its Platinum subscription.

Click here to start your 30-day free trial.

Create an Elastic Compute Service (ECS) instance and configure a Python environment on it. In this example, Python 3.10.12 is used. For information about how to create an ECS instance, see Get started with Linux instances.

1) When you create an ECS instance, you can select an Ubuntu 22 image, which provides Python 3.10. If you select another image, you must manually download and configure a Python environment. For more information, see the official Python documentation.

2) Make sure that the ECS instance can access Hugging Face. Elastic Eland is designed to upload Hugging Face models. Models from other libraries might not upload successfully.

Create an Alibaba Cloud Elasticsearch cluster. In this example, an Alibaba Cloud Elasticsearch V8.9 cluster is used. For information about how to create a cluster, see Create an Alibaba Cloud Elasticsearch cluster.

1) Add the public or private IP address of the ECS instance to the IP address whitelist of the Elasticsearch cluster so that the ECS instance can access the cluster. For more information, see Configure a public or private IP address whitelist for an Elasticsearch cluster.

2) Uploading models with Elastic Eland is a feature of the Platinum and Enterprise editions of open source Elasticsearch. Alibaba Cloud Elasticsearch includes a Platinum subscription, so you can upload models to it directly.

3) The usage of Elasticsearch Machine Learning may differ between major versions of Elasticsearch. For clusters of versions other than V8.9, see Machine Learning.

Step-by-Step Guide to Model Upload and Deployment

The following steps upload a model to Alibaba Cloud Elasticsearch, deploy it, and test it:

Step 1: Install Compatible Versions of Elastic Eland and PyTorch

Run the following commands to install compatible versions of Elastic Eland and PyTorch:

pip3 install eland==8.7.0
pip3 install torch==1.11.0 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
pip3 install 'eland[pytorch]'

If you want to import an NLP model with PyTorch 1.13.1 or earlier, install the PyTorch version that you need instead, for example by running pip install torch==1.13.1.

Step 2: Verify and Upgrade Pandas

Run the following command to check whether the version of Pandas is 1.5.3:

pip show pandas

If the version of Pandas is not 1.5.3, run the following command to install Pandas 1.5.3:

pip install pandas==1.5.3
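The same check can be scripted for every package pinned in Steps 1 and 2. This is a minimal sketch that uses only the standard library's `importlib.metadata`. `PINNED`, `base_version`, `installed_version`, and `find_mismatches` are hypothetical names, and the sketch assumes that a CPU build of PyTorch may report a version with a local suffix such as `1.11.0+cpu`.

```python
from importlib import metadata

# Pins taken from the install commands in Steps 1 and 2 of this guide.
PINNED = {"eland": "8.7.0", "torch": "1.11.0", "pandas": "1.5.3"}


def base_version(version):
    """Drop a local build suffix such as '+cpu' ('1.11.0+cpu' -> '1.11.0')."""
    return version.split("+", 1)[0]


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package that differs from its pin to a (found, wanted) pair."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found is None or base_version(found) != wanted:
            mismatches[package] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    for package, (found, wanted) in find_mismatches(PINNED).items():
        print(f"{package}: found {found}, expected {wanted}")
```

The script prints nothing when all three packages match their pins.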

Step 3: Uploading Hugging Face Models to Alibaba Cloud Elasticsearch

You can upload a model directly from the Hugging Face Hub (online mode), or download the model first and then upload the local copy (offline mode).

Method 1: Upload a model in online mode

Run the following command to upload a model in a Hugging Face library to Alibaba Cloud Elasticsearch:

eland_import_hub_model \
      --url 'http://es-cn-w*****.ES.aliyuncs.com:9200' \
      --hub-model-id 'madhurjindal/autonlp-Gibberish-Detector-492513457' \
      --task-type text_classification \
      --es-username yourusername \
      --es-password yourpassword \
      --es-model-id your-es-model-id-name
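If you prefer to drive the upload from a Python script, the command above can be assembled as an argument list. This is a minimal sketch: `build_import_command` is a hypothetical helper, the flags are only the ones shown in the command above, and the values are the placeholders from this guide.

```python
import shlex


def build_import_command(url, hub_model_id, task_type, username, password, es_model_id):
    """Return the eland_import_hub_model call as an argument list for subprocess.run."""
    return [
        "eland_import_hub_model",
        "--url", url,
        "--hub-model-id", hub_model_id,
        "--task-type", task_type,
        "--es-username", username,
        "--es-password", password,
        "--es-model-id", es_model_id,
    ]


command = build_import_command(
    url="http://es-cn-w*****.ES.aliyuncs.com:9200",  # masked endpoint from this guide
    hub_model_id="madhurjindal/autonlp-Gibberish-Detector-492513457",
    task_type="text_classification",
    username="yourusername",   # placeholder
    password="yourpassword",   # placeholder
    es_model_id="your-es-model-id-name",
)

# Print a shell-quoted form for inspection. To run the upload,
# pass the list to subprocess.run(command, check=True).
print(shlex.join(command))
```

An argument list avoids the shell quoting and line-continuation mistakes that are easy to make when the command is typed by hand.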

Method 2: Upload a model in offline mode

Use the huggingface_hub library to download a model from the Hugging Face Hub to the ECS instance, and then upload the local copy to Alibaba Cloud Elasticsearch.

1) Run the following commands in the ECS instance to install the huggingface_hub library and download the model:

# Install the huggingface_hub library. huggingface_hub is a library provided by Hugging Face for interacting with the Hugging Face Hub. It can download, upload, and list models and other resources.
pip install huggingface_hub
# Start the Python interpreter.
python3

# Import the snapshot_download function.
from huggingface_hub import snapshot_download
# Download a snapshot of the madhurjindal/autonlp-Gibberish-Detector-492513457 repository.
snapshot_download(repo_id="madhurjindal/autonlp-Gibberish-Detector-492513457")

2) Run the following command in the ECS instance to upload the model files to Alibaba Cloud Elasticsearch:

eland_import_hub_model \
      --url 'http://es-cn-w*****.ES.aliyuncs.com:9200' \
      --hub-model-id '/model/.cache/huggingface/hub/models--madhurjindal--autonlp-Gibberish-Detector-492513457/snapshots/c068f552cdee957e45d8773db9f7158d43902244' \
      --task-type text_classification \
      --es-username yourusername \
      --es-password yourpassword \
      --es-model-id madhurjindal-autonlp-gibberish-detector-492513457-offline

The following list describes some parameters in the preceding commands.

Parameter: Description
repo_id: The ID of the model in the Hugging Face Hub. Example: madhurjindal/autonlp-Gibberish-Detector-492513457.
url: The URL of the Elasticsearch cluster. Example: http://es-cn-w*.ES.aliyuncs.com:9200.
hub-model-id: For an online upload, the ID of the model in the Hugging Face Hub. Example: madhurjindal/autonlp-Gibberish-Detector-492513457. For an offline upload, the path of the model files in the ECS instance. Example: /model/.cache/huggingface/hub/models--madhurjindal--autonlp-Gibberish-Detector-492513457/snapshots/c068f552cdee957e45d8773db9f7158d43902244.
task-type: The task type used by the model. Different models support different task types. Hugging Face models support the following task types: question_answering, zero_shot_classification, text_classification, fill_mask, text_embedding, text_expansion, text_similarity, and ner.
es-username: The username of the Elasticsearch cluster.
es-password: The password of the Elasticsearch cluster.
es-model-id: The model ID that is used after the model is uploaded to Elasticsearch. You can specify a model ID based on your business requirements. Note: The model ID cannot contain uppercase letters. Example: madhurjindal-autonlp-gibberish-detector-492513457-offline.
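Two of the rules above (the supported task types and the lowercase-only model ID) are easy to check before you run an upload. This is a minimal sketch: `suggest_es_model_id` and `validate_arguments` are hypothetical helpers, and the naming scheme simply mirrors the example ID used in this guide.

```python
# Task types listed in the parameter descriptions above.
SUPPORTED_TASK_TYPES = {
    "question_answering", "zero_shot_classification", "text_classification",
    "fill_mask", "text_embedding", "text_expansion", "text_similarity", "ner",
}


def suggest_es_model_id(hub_model_id, suffix=""):
    """Derive a lowercase model ID from a Hub ID (hypothetical naming scheme)."""
    return hub_model_id.replace("/", "-").lower() + suffix


def validate_arguments(task_type, es_model_id):
    """Return a list of problems with the two arguments; empty means none found."""
    problems = []
    if task_type not in SUPPORTED_TASK_TYPES:
        problems.append(f"unsupported task type: {task_type}")
    if es_model_id != es_model_id.lower():
        problems.append("es-model-id must not contain uppercase letters")
    return problems
```

For example, `suggest_es_model_id("madhurjindal/autonlp-Gibberish-Detector-492513457", "-offline")` reproduces the model ID used in the offline upload command.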

Step 4: Deploying Your Model in Kibana

Start the uploaded model from the Model Management page of the Kibana console. You can also synchronize your jobs and trained models there.

1) Log on to the Kibana console of the Elasticsearch cluster. For more information, see Log on to the Kibana console.

2) Click the icon in the upper-left corner of the Kibana console. In the left-side navigation pane, choose Analytics > Machine Learning.

3) In the left-side navigation pane of the page that appears, choose Model Management > Trained Models.

4) Optional. In the upper part of the Trained Models page, click Synchronize your jobs and trained models. In the panel that appears, click Synchronize.

5) On the Trained Models page, find the uploaded model and click the start icon in the Actions column to start the model.

6) In the dialog box that appears, configure the model and click Start. If a message indicating that the model is started is displayed in the lower-right corner of the page, the model is deployed.

Note: If the model cannot be started, the memory of the Elasticsearch cluster may be insufficient. You can start the model again after you upgrade the configuration of the Elasticsearch cluster. In the dialog box that reports the failure, you can click View complete error message to view the cause.
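The Kibana steps above can also be scripted. This is a minimal sketch, assuming the official Elasticsearch Python client 8.x, whose `ml` namespace provides `start_trained_model_deployment` and `get_trained_models_stats`. The `client` argument is expected to be an `Elasticsearch` instance that you create yourself, and `start_model` and `deployment_state` are hypothetical helper names.

```python
def start_model(client, model_id):
    """Ask the cluster to start a deployment and wait until it reports 'started'."""
    return client.ml.start_trained_model_deployment(
        model_id=model_id,
        wait_for="started",
    )


def deployment_state(client, model_id):
    """Return the deployment state string, or None if the model is not deployed."""
    stats = client.ml.get_trained_models_stats(model_id=model_id)
    for entry in stats["trained_model_stats"]:
        deployment = entry.get("deployment_stats")
        if deployment:
            return deployment.get("state")
    return None
```

If the start call fails, the error returned by the cluster should contain the same failure cause that Kibana shows, for example insufficient memory.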

Step 5: Testing and Verifying the Model

1) On the Trained Models page, find the deployed model, click the icon in the Actions column, and then click Test model.

2) In the Test trained model panel, test the model and check whether the output meets your expectations.
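The same test can be run outside Kibana. This is a minimal sketch, assuming the Elasticsearch Python client 8.x and its `ml.infer_trained_model` method. It also assumes that the model reads its input from a field named `text_field`, which is the usual default for imported NLP models. Adjust `field` if your model is configured differently. `classify` is a hypothetical helper name.

```python
def classify(client, model_id, text, field="text_field"):
    """Send one document to the deployed model and return the first inference result."""
    response = client.ml.infer_trained_model(
        model_id=model_id,
        docs=[{field: text}],
    )
    return response["inference_results"][0]
```

Pass an `Elasticsearch` client connected to your cluster, the `es-model-id` you chose during upload, and a sample sentence, then inspect the returned dictionary.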

30-Day Free Trial: Help You Implement Elasticsearch on Cloud

Alibaba Cloud Elasticsearch is a fully managed cloud service built on open source Elasticsearch. It works out of the box, supports pay-as-you-go billing, and is 100% compatible with open source features. It provides the cloud-ready components of the Elastic Stack, including Elasticsearch, Logstash, Kibana, and Beats, and through a partnership with Elastic it offers the X-Pack commercial plugin (Platinum-level advanced features) for free. X-Pack adds advanced features such as security, SQL, machine learning, alerting, and monitoring. The service is widely used for real-time log analysis, information retrieval, and multi-dimensional data querying and statistical analysis.

For more information about Elasticsearch, please visit https://www.alibabacloud.com/en/product/elasticsearch.

Data Geek
