Elasticsearch Machine Learning can run third-party trained models. This guide walks you through using Elastic Eland to upload Hugging Face models to Alibaba Cloud Elasticsearch so that you can run natural language processing (NLP) tasks in your cluster.
For those eager to enhance their Elasticsearch data analysis and machine learning tasks, Elastic Eland serves as a bridge, combining big data processing with the Python data science ecosystem. Take advantage of Elastic Eland to convert Hugging Face's pre-trained models into TorchScript format, making your AI applications ready for environments without a Python interpreter.
Explore the capabilities of Elastic Eland in the official documentation.
Elastic Eland works only with certain Elasticsearch versions. Elasticsearch clusters of V7.11 or later support Elastic Eland, and later versions offer more features. For best results, use Elastic Eland with Elasticsearch clusters of V8.3 or later. Make sure that the major version of your Elasticsearch cluster matches that of Elastic Eland (for example, Elasticsearch V8.X with Eland 8.X). Supported Python versions are 3.8, 3.9, and 3.10, and Pandas 1.5.3 is supported.
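The version rules above can be sketched as a small pre-flight check. This is only an illustration: the function names and the returned messages are hypothetical, and the thresholds are the ones stated in this guide (V7.11 minimum, V8.3 recommended, matching major versions, Python 3.8 to 3.10).

```python
import sys

# Values taken from the compatibility notes in this guide.
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10)}
MIN_ES_VERSION = (7, 11)
RECOMMENDED_ES_VERSION = (8, 3)


def parse_version(text):
    """Turn a plain dotted version such as '8.9.0' into (8, 9, 0)."""
    return tuple(int(part) for part in text.split("."))


def check_compatibility(es_version, eland_version, python_version=None):
    """Return a list of findings; an empty list means no issue was found."""
    es = parse_version(es_version)
    eland = parse_version(eland_version)
    # Default to the interpreter that is running this check.
    py = tuple(python_version or sys.version_info[:2])[:2]

    findings = []
    if es[:2] < MIN_ES_VERSION:
        findings.append(f"Elasticsearch {es_version} is older than V7.11")
    elif es[:2] < RECOMMENDED_ES_VERSION:
        findings.append(f"Elasticsearch {es_version} works, but V8.3 or later is recommended")
    if es[0] != eland[0]:
        findings.append(f"major versions differ: Elasticsearch {es[0]}.x vs Eland {eland[0]}.x")
    if py not in SUPPORTED_PYTHON:
        findings.append(f"Python {py[0]}.{py[1]} is not in the supported list")
    return findings
```

For the setup used in this guide, `check_compatibility("8.9.0", "8.7.0", (3, 10))` returns an empty list.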
Before starting, set up an Elastic Compute Service (ECS) instance with a Python environment. This example uses Python 3.10.12. The ECS instance must be able to access Hugging Face. The model upload capability is provided by the Elasticsearch Platinum edition, which Alibaba Cloud subscribes to.
An Elastic Compute Service (ECS) instance is created, and a Python environment is configured for the ECS instance. In this example, Python 3.10.12 is used. For information about how to create an ECS instance, see Get started with Linux instances.
1)When you create an ECS instance, you can select an Ubuntu 22 image, which provides Python 3.10. If you select another image, you must manually download and configure a Python environment. For more information, see Python official documentation.
2)You must make sure that the ECS instance can access Hugging Face. Elastic Eland uploads Hugging Face models. Models from other libraries may not be uploadable by using Elastic Eland.
An Alibaba Cloud Elasticsearch cluster is created. In this example, an Alibaba Cloud Elasticsearch V8.9 cluster is used. For information about how to create an Alibaba Cloud Elasticsearch cluster, see Create an Alibaba Cloud Elasticsearch cluster.
1)You must add the public or private IP address of the ECS instance to the IP address whitelist of the Elasticsearch cluster to allow the ECS instance to access the Elasticsearch cluster. For more information, see Configure a public or private IP address whitelist for an Elasticsearch cluster.
2)The feature of using Elastic Eland to upload models is provided by the open source Elasticsearch Platinum edition and the open source Elasticsearch Enterprise edition. Alibaba Cloud subscribes to the Platinum edition, so you can upload models directly to Alibaba Cloud Elasticsearch.
3)The usage of Elasticsearch Machine Learning may differ between Elasticsearch clusters of different major versions. For information about the usage of Elasticsearch Machine Learning in Elasticsearch clusters of versions other than V8.9, see Machine Learning.
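Before you install anything, it can help to confirm from the ECS instance that the whitelist is in effect and which Elasticsearch version the cluster reports. The sketch below uses only the Python standard library and reads the version from the cluster's root endpoint. The function names are hypothetical, and basic authentication over the cluster URL is an assumption about your setup.

```python
import base64
import json
import urllib.request


def build_root_request(url, username, password):
    """Build a GET request for the cluster root endpoint with basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def parse_es_version(body):
    """Extract the version string from the JSON returned by the root endpoint."""
    return json.loads(body)["version"]["number"]


def fetch_es_version(url, username, password, timeout=10):
    """Contact the cluster and return its version; raises URLError if it is unreachable."""
    request = build_root_request(url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_es_version(response.read())


# Example call, using the placeholder values from this guide:
# fetch_es_version("http://es-cn-w*****.ES.aliyuncs.com:9200", "yourusername", "yourpassword")
```

If the call times out or the connection is refused, the IP address whitelist of the cluster is a likely place to look first.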
Follow these detailed instructions to efficiently upload and deploy your AI models to Alibaba Cloud Elasticsearch, ensuring a smooth transition into a more intelligent search and analysis environment:
Run the following commands to install Elastic Eland and PyTorch at compatible versions:
pip3 install eland==8.7.0
pip3 install torch==1.11.0 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
pip install eland[pytorch]
If you want to use PyTorch 1.13.1 or earlier to import an NLP model, install the PyTorch version that you need instead, for example by running pip install torch==1.13.1.
Run the following command to check whether the version of Pandas is 1.5.3:
pip show pandas
If the version of Pandas is not 1.5.3, run the following command to install Pandas 1.5.3:
pip install pandas==1.5.3
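The same Pandas check can be done from Python, which is convenient in a setup script. This is a minimal sketch that relies on the standard importlib.metadata module. The helper names are hypothetical, and the function only prints the pip command that this guide recommends rather than running it.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pip_pin_command(package, wanted):
    """Return the pip command needed to reach the wanted version, or None if already there."""
    if installed_version(package) == wanted:
        return None
    return f"pip install {package}=={wanted}"


# Prints the command to run if Pandas is missing or at a different version.
command = pip_pin_command("pandas", "1.5.3")
if command is not None:
    print(command)
```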
You can upload a model in online mode, directly from the Hugging Face Hub, or in offline mode, by first downloading the model and then uploading it from the ECS instance.
Run the following command to upload a model in a Hugging Face library to Alibaba Cloud Elasticsearch:
eland_import_hub_model \
--url 'http://es-cn-w*****.ES.aliyuncs.com:9200' \
--hub-model-id 'madhurjindal/autonlp-Gibberish-Detector-492513457' \
--task-type text_classification \
--es-username yourusername \
--es-password yourpassword \
--es-model-id your-es-model-id-name
Download a model from the Hugging Face Hub to the ECS instance and then upload the model to Alibaba Cloud Elasticsearch.
1)Run the following commands in the ECS instance to install the huggingface_hub library and download the model:
# Install huggingface_hub, a library provided by Hugging Face for interacting with the Hugging Face Hub.
# It can download, upload, and list models and other resources.
pip install huggingface_hub
python3
# Import the snapshot_download function to the Python interpreter.
from huggingface_hub import snapshot_download
# Use the snapshot_download function to download a snapshot of the madhurjindal/autonlp-Gibberish-Detector-492513457 repository.
snapshot_download(repo_id="madhurjindal/autonlp-Gibberish-Detector-492513457")
2)Run the following command in the ECS instance to upload the model file to Alibaba Cloud Elasticsearch:
eland_import_hub_model \
--url 'http://es-cn-w*****.ES.aliyuncs.com:9200' \
--hub-model-id '/model/.cache/huggingface/hub/models--madhurjindal--autonlp-Gibberish-Detector-492513457/snapshots/c068f552cdee957e45d8773db9f7158d43902244' \
--task-type text_classification \
--es-username yourusername \
--es-password yourpassword \
--es-model-id madhurjindal-autonlp-gibberish-detector-492513457-offline
The following table describes some parameters in the preceding commands.
Parameter | Description |
---|---|
repo_id | The ID of the model in the Hugging Face Hub. Example: madhurjindal/autonlp-Gibberish-Detector-492513457. |
url | The URL of the Elasticsearch cluster. Example: http://es-cn-w*.ES.aliyuncs.com:9200. |
hub-model-id | Online upload: The ID of the model in the Hugging Face Hub. Example: madhurjindal/autonlp-Gibberish-Detector-492513457. Offline upload: The path of the model file in ECS. Example: /model/.cache/huggingface/hub/models--madhurjindal--autonlp-Gibberish-Detector-492513457/snapshots/c068f552cdee957e45d8773db9f7158d43902244. |
task-type | The task type used by the model. Different models support different task types. Hugging Face models support the following task types: question_answering, zero_shot_classification, text_classification, fill_mask, text_embedding, text_expansion, text_similarity, ner. |
es-username | The username of the Elasticsearch cluster. |
es-password | The password of the Elasticsearch cluster. |
es-model-id | The model ID that is used after the model is uploaded to Elasticsearch. You can specify a model ID based on your business requirements. Note: The model ID that you specify cannot contain uppercase letters. Example: madhurjindal-autonlp-gibberish-detector-492513457-offline. |
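If you script the upload, the parameters in the table can be validated before the command runs. The following is a minimal sketch, not part of Elastic Eland itself: build_import_command is a hypothetical helper, it uses only the flags shown in this guide, and the checks reflect the task types and the lowercase rule from the table.

```python
# Task types listed in the parameter table of this guide.
ALLOWED_TASK_TYPES = {
    "question_answering", "zero_shot_classification", "text_classification",
    "fill_mask", "text_embedding", "text_expansion", "text_similarity", "ner",
}


def build_import_command(url, hub_model_id, task_type, es_model_id, username, password):
    """Return the eland_import_hub_model command as an argument list.

    hub_model_id is a Hugging Face Hub ID for an online upload,
    or a local snapshot path for an offline upload.
    """
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type}")
    if es_model_id != es_model_id.lower():
        raise ValueError("es-model-id cannot contain uppercase letters")
    return [
        "eland_import_hub_model",
        "--url", url,
        "--hub-model-id", hub_model_id,
        "--task-type", task_type,
        "--es-username", username,
        "--es-password", password,
        "--es-model-id", es_model_id,
    ]


# To run the command (ES_PASSWORD is a hypothetical environment variable):
# import os, subprocess
# cmd = build_import_command(
#     "http://es-cn-w*****.ES.aliyuncs.com:9200",
#     "madhurjindal/autonlp-Gibberish-Detector-492513457",
#     "text_classification",
#     "your-es-model-id-name",
#     "yourusername",
#     os.environ["ES_PASSWORD"],
# )
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and reading the password from the environment keeps it out of the script file.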
Use the Model Management page in the Kibana console to start the uploaded model. You can optionally synchronize your jobs and trained models first.
1)Log on to the Kibana console of the Elasticsearch cluster. For more information, see Log on to the Kibana console.
2)Click the icon in the upper-left corner of the Kibana console. In the left-side navigation pane, choose Analytics > Machine Learning.
3)In the left-side navigation pane of the page that appears, choose Model Management > Trained Models.
4)Optional. In the upper part of the Trained Models page, click Synchronize your jobs and trained models. In the panel that appears, click Synchronize.
5)On the Trained Models page, find the uploaded model and click the icon in the Actions column to start the model.
6)In the dialog box that appears, configure the model and click Start. If a message indicating that the model is started is displayed in the lower-right corner of the page, the model is deployed.
Note: If the model cannot be started, the memory of the Elasticsearch cluster may be insufficient. You can start the model again after you upgrade the configuration of the Elasticsearch cluster. In the dialog box that prompts you about the failure, you can click View complete error message to view the failure cause.
After the model is deployed, test it:
1)On the Trained Models page, find the deployed model, click the icon in the Actions column, and then click Test model.
2)In the Test trained model panel, test the model and check whether the output result meets your expectations.
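The Kibana steps above can also be approximated from the ECS instance through the Elasticsearch REST API. The sketch below is an alternative to the console, not something this guide's original procedure uses. It assumes the trained model endpoints of Elasticsearch 8.x (deployment/_start and _infer), basic authentication, and that the uploaded model reads its input from a field named text_field. The helper names are hypothetical.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def _post_request(base_url, path, username, password, payload=None):
    """Build an authenticated POST request, with an optional JSON body."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{path}", data=data, headers=headers, method="POST"
    )


def start_deployment_request(base_url, model_id, username, password):
    """Request that starts the deployment of an uploaded model."""
    path = f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start"
    return _post_request(base_url, path, username, password)


def infer_request(base_url, model_id, text, username, password):
    """Request that sends one text document to the deployed model."""
    path = f"/_ml/trained_models/{quote(model_id, safe='')}/_infer"
    payload = {"docs": [{"text_field": text}]}  # text_field is an assumed input field name
    return _post_request(base_url, path, username, password, payload)


# To send a request and read the JSON response:
# request = infer_request("http://es-cn-w*****.ES.aliyuncs.com:9200",
#                         "your-es-model-id-name", "some text to classify",
#                         "yourusername", "yourpassword")
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

On clusters of other major versions the endpoints may differ, so check the Machine Learning documentation for your version before relying on these paths.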
Search and Analytics Service Elasticsearch Version: Alibaba Cloud Elasticsearch is a fully managed Elasticsearch cloud service built on the open-source Elasticsearch, supporting out-of-the-box functionality and pay-as-you-go while being 100% compatible with open-source features. Not only does it provide the cloud-ready components of the Elastic Stack, including Elasticsearch, Logstash, Kibana, and Beats, but it also partners with Elastic to offer the free X-Pack (Platinum level advanced features) commercial plugin. This integration includes advanced features such as security, SQL, machine learning, alerting, and monitoring, and is widely used in scenarios such as real-time log analysis, information retrieval, and multi-dimensional data querying and statistical analysis.
For more information about Elasticsearch, please visit https://www.alibabacloud.com/en/product/elasticsearch.