OpenSearch:Experience center

Last Updated:Nov 26, 2024

In the experience center of the Search Development Workbench console, you can try services such as document parsing, image text extraction, and document splitting through a visual interface. This helps you quickly evaluate whether the services meet your business requirements.

Features

The following table describes the services that are provided by the experience center.

Category

Description

Document Content Parsing

Document Content Parsing Service(ops-document-analyze-001) provides a general-purpose document parsing service. You can use this service to extract logical structures, such as titles and paragraphs, and content, such as text, tables, and images, from unstructured documents and output the result as structured data.

Image Content Parsing

  • Image Content Recognition Service 001(ops-image-analyze-vlm-001) allows you to parse the content of images based on multimodal large language models (LLMs). You can also use the service to parse the text in images and use the parsed text for image retrieval and conversational search.

  • Image Text Recognition Service 001(ops-image-analyze-ocr-001) allows you to use the optical character recognition (OCR) feature to recognize the text in images and use the recognized text for image retrieval and conversational search.

Document Slice

Common Document Slicing Service(ops-document-split-001) provides a general-purpose text splitting service. You can use this service to split structured data in the HTML, MARKDOWN, and TXT formats based on paragraphs, semantics, and specific rules. You can also extract code, images, and tables from rich text.

Text vectorization

  • OpenSearch text vectorization service -001(ops-text-embedding-001) provides a text vectorization service that supports more than 40 languages. The input text can be up to 300 tokens in length, and the dimension of the generated vectors is 1,536.

  • OpenSearch Universal Text Vectorization Service -002(ops-text-embedding-002) provides a text vectorization service that supports more than 100 languages. The input text can be up to 8,192 tokens in length, and the dimension of the generated vectors is 1,024.

  • OpenSearch text vectorization service-Chinese -001(ops-text-embedding-zh-001) provides a text vectorization service for Chinese text. The input text can be up to 1,024 tokens in length, and the dimension of the generated vectors is 768.

  • OpenSearch text vectorization service-English -001(ops-text-embedding-en-001) provides a text vectorization service for English text. The input text can be up to 512 tokens in length, and the dimension of the generated vectors is 768.

Text sparse vectorization

This service converts text data into sparse vectors that occupy less storage space. You can use sparse vectors to express keywords and the information about frequently used terms. You can perform a hybrid search by using sparse and dense vectors to improve the retrieval performance.

OpenSearch text sparse vectorization service-generic(ops-text-sparse-embedding-001) provides a text sparse vectorization service that supports more than 100 languages. The input text can be up to 8,192 tokens in length.

Query Analysis

This service uses LLMs and natural language processing (NLP) capabilities to analyze queries: it identifies user intent, expands a query into similar questions, and converts natural language questions into SQL statements. This improves the performance of conversational search in retrieval-augmented generation (RAG) scenarios.

Query Analysis Service 001(ops-query-analyze-001) provides general-purpose, LLM-based query analysis that identifies user intent and expands a query into similar questions.

Sorting Service

  • BGE reranker model(ops-bge-reranker-larger) provides a general-purpose document scoring service. This service scores documents based on the relevance between the query and the document content, sorts the documents in descending order of score, and returns the scores. The service supports Chinese and English. The combined length of the query and a document can be up to 512 tokens.

  • OpenSearch Text Reranking - 001(ops-text-reranker-001) is developed by the OpenSearch team and trained on datasets from multiple industries to provide a high-quality reranking service. This service sorts documents in descending order of relevance between the query and the document content. The service supports Chinese and English. The combined length of the query and a document can be up to 512 tokens.

Large model

  • OpenSearch-Tongyi Qianwen-Turbo(ops-qwen-turbo) is a model developed based on the qwen-turbo model. This model is enhanced through supervised fine-tuning, which reduces the probability that the model returns biased content.

  • Tongyi Qianwen-Turbo(qwen-turbo) is a Qwen ultra-large language model that supports multiple input languages such as Chinese and English. For more information, see Model overview.

  • Tongyi Qianwen-Plus(qwen-plus) is an enhanced version of the qwen-turbo model that supports multiple input languages such as Chinese and English. For more information, see Model overview.

  • Tongyi Qianwen-Max(qwen-max) is a Qwen ultra-large language model that has hundreds of billions of parameters and supports multiple input languages such as Chinese and English. For more information, see Model overview.
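The token limits and vector dimensions that the table lists for the text vectorization services can be captured in a small lookup structure, for example to check on the client side which services can accept a text of a given length. The following is a minimal sketch: the service IDs, limits, and dimensions are taken from the table, while the EmbeddingService class and the services_for helper are hypothetical names introduced for illustration. The token count is assumed to come from your own estimate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EmbeddingService:
    service_id: str
    languages: str   # language coverage as described in the table
    max_tokens: int  # maximum input length in tokens
    dimension: int   # dimension of the generated dense vectors


# Values copied from the feature table; the structure itself is illustrative.
EMBEDDING_SERVICES = [
    EmbeddingService("ops-text-embedding-001", "40+ languages", 300, 1536),
    EmbeddingService("ops-text-embedding-002", "100+ languages", 8192, 1024),
    EmbeddingService("ops-text-embedding-zh-001", "Chinese", 1024, 768),
    EmbeddingService("ops-text-embedding-en-001", "English", 512, 768),
]


def services_for(token_count: int) -> List[str]:
    """Return the IDs of the services whose token limit fits the input."""
    return [s.service_id for s in EMBEDDING_SERVICES if token_count <= s.max_tokens]


if __name__ == "__main__":
    # A 600-token text is too long for the 300-token and 512-token services.
    print(services_for(600))
```

A lookup like this only filters by length. Language coverage and vector dimension still need to match your index configuration.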

Procedure

Use the document parsing service

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select a region. In the upper-left corner, select Search Development Workbench.

  3. In the left-side navigation pane, click Experience Center.

  4. On the Experience Center page, select Document Content Parsing(document-analyze) from the Service Category drop-down list and select a service from the Experience Services drop-down list.

  5. Set the Experience Data parameter to Sample data or My data. If you set the Experience Data parameter to My data, you can click Manage data to upload your data. Supported file types include TXT, PDF, HTML, DOC, DOCX, PPT, and PPTX. Each file can be up to 20 MB in size.

    • Upload local files: Uploaded files are automatically cleared after seven days. The console does not permanently store your data.

    • Provide the URLs of files and specify the file type: You can provide multiple file URLs. Each file URL occupies a separate line.

      Note

      If you select an incorrect file type, the file cannot be parsed. To prevent parsing failures, select the file type that matches the actual content of the file.

      Important

      When you import files by using URLs, make sure that your operations comply with laws, regulations, and the management norms of the relevant platform and do not infringe on the legitimate rights or interests of right holders. You shall be solely responsible for any violations of the preceding requirements. As a tool provider, Search Development Workbench does not assume any responsibility for any data parsing or data download that you perform.

  6. If you want to use your data, select an uploaded file or a URL from the drop-down list.

  7. Click Get Results. The system calls the document parsing service to parse the file. You can view the results on the following tabs:

    • Results: displays the parsing progress and results.

    • Result source code: displays the result response code. You can click Copy Code to copy the code or click Download File to download the code to your computer.

    • Sample code: allows you to view and download the sample code for calling the document parsing service.
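The file type and size limits in step 5 can be checked locally before you upload a file. The following is a minimal sketch, not part of the console or its sample code: the check_upload function is a hypothetical helper, and it assumes that "20 MB" means 20 × 1024 × 1024 bytes and that the file type can be inferred from the file name extension.

```python
from pathlib import Path

# File types as stated in step 5 of the procedure.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".html", ".doc", ".docx", ".ppt", ".pptx"}
# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024


def check_upload(file_name: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(file_name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file exceeds 20 MB: {size_bytes} bytes")
    return problems


if __name__ == "__main__":
    print(check_upload("report.PDF", 1024))
    print(check_upload("data.csv", 10))
```

An extension check cannot detect a file whose content does not match its name, so selecting the correct file type in the console is still required.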

Use the document splitting service

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select a region. In the upper-left corner, select Search Development Workbench.

  3. In the left-side navigation pane, click Experience Center.

  4. On the Experience Center page, select Document Slice(document-split) from the Service Category drop-down list and select a service from the Experience Services drop-down list.

  5. Set the Experience Data parameter to Sample data or My data. If you set the Experience Data parameter to My data, you can enter your data in the editor. Supported data formats include TXT, HTML, and MARKDOWN.

    Note

    If you select an incorrect data format, the data that you enter cannot be processed. To prevent failures, select the data format that matches your data.

  6. Configure the Maximum Slice Length parameter. Unit: tokens. Default value: 300. Maximum value: 1024.

  7. Click Get Results. The system calls the document splitting service to split the data. You can view the results on the following tabs:

    • Results: displays the splitting progress and results.

    • Result source code: displays the result response code. You can click Copy Code to copy the code or click Download File to download the code to your computer.

    • Sample code: allows you to view and download the sample code for calling the document splitting service.
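To build intuition for how the Maximum Slice Length parameter affects the output, the following sketch packs paragraphs into slices that do not exceed a token limit. It is an illustration only and does not reproduce the algorithm of the document splitting service: the split_text function is hypothetical, and it assumes that one whitespace-separated word counts as one token, which differs from a real tokenizer.

```python
DEFAULT_MAX_SLICE_LENGTH = 300  # default value stated in step 6
MAX_SLICE_LENGTH_LIMIT = 1024   # maximum value stated in step 6


def split_text(text: str, max_slice_length: int = DEFAULT_MAX_SLICE_LENGTH) -> list:
    """Pack paragraphs into slices of at most max_slice_length tokens.

    Assumption: one whitespace-separated word counts as one token.
    """
    if not 1 <= max_slice_length <= MAX_SLICE_LENGTH_LIMIT:
        raise ValueError("max_slice_length must be between 1 and 1024")

    slices, current = [], []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # Break a paragraph that is longer than the limit into limit-sized pieces.
        pieces = [words[i:i + max_slice_length]
                  for i in range(0, len(words), max_slice_length)]
        for piece in pieces:
            # Start a new slice when the piece does not fit into the current one.
            if current and len(current) + len(piece) > max_slice_length:
                slices.append(" ".join(current))
                current = []
            current.extend(piece)
    if current:
        slices.append(" ".join(current))
    return slices


if __name__ == "__main__":
    sample = "a b c\n\nd e\n\nf g h i"
    print(split_text(sample, max_slice_length=4))
    print(split_text(sample, max_slice_length=5))
```

A larger limit lets adjacent paragraphs share a slice, and a smaller limit produces more, shorter slices.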

Use the text vectorization or text sparse vectorization service

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select a region. In the upper-left corner, select Search Development Workbench.

  3. In the left-side navigation pane, click Experience Center.

  4. On the Experience Center page, select Text vectorization(text-embedding) or Text sparse vectorization(text-sparse-embedding) from the Service Category drop-down list and select a service from the Experience Services drop-down list.

  5. Set the Content Type parameter to document or query.

  6. Add text groups or directly enter JSON-formatted data.

  7. Click Get Results. The system calls the text vectorization or text sparse vectorization service to vectorize the text. You can view the results on the following tabs:

    • Results: displays the vectorization progress and results.

    • Result source code: displays the result response code. You can click Copy Code to copy the code or click Download File to download the code to your computer.

    • Sample code: allows you to view and download the sample code for calling the text vectorization or text sparse vectorization service.
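Steps 5 and 6 amount to choosing a content type and supplying a group of texts as JSON. The following sketch shows one way to assemble such JSON-formatted data. The key names "input" and "input_type" are assumptions made for this example and are not confirmed by this page. Use the Sample code tab in the console for the exact request format.

```python
import json

VALID_CONTENT_TYPES = {"document", "query"}  # the two Content Type options in step 5


def build_embedding_input(texts, content_type="document"):
    """Serialize a group of texts as a JSON string.

    The keys "input" and "input_type" are assumptions made for this sketch.
    """
    if content_type not in VALID_CONTENT_TYPES:
        raise ValueError(f"content_type must be one of {sorted(VALID_CONTENT_TYPES)}")
    # Drop empty entries and surrounding whitespace before serializing.
    cleaned = [t.strip() for t in texts if t.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty text is required")
    return json.dumps({"input": cleaned, "input_type": content_type}, ensure_ascii=False)


if __name__ == "__main__":
    print(build_embedding_input(["What is a sparse vector?"], content_type="query"))
```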

Use the sorting service

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select a region. In the upper-left corner, select Search Development Workbench.

  3. In the left-side navigation pane, click Experience Center.

  4. On the Experience Center page, select Sorting Service(ranker) from the Service Category drop-down list and select a service from the Experience Services drop-down list.

  5. Set the Experience Data parameter to Sample data or My data.

  6. Enter information in the Query field.

  7. Click Get Results. The system calls the sorting service to sort the documents based on the relevance between the query and document content, and returns the scoring results. You can view the results on the following tabs:

    • Results: displays the sorting and scoring results.

    • Result source code: displays the result response code. You can click Copy Code to copy the code or click Download File to download the code to your computer.

    • Sample code: allows you to view and download the sample code for calling the sorting service.
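The sorting behavior described in step 7, ordering documents by relevance score from highest to lowest, can be sketched as follows. The documents and scores below are made-up values for illustration and were not produced by the sorting service, and rank_documents is a hypothetical helper.

```python
def rank_documents(documents, scores):
    """Pair each document with its score and sort by score, highest first."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical relevance scores for the query "how to reset a password".
docs = ["Billing FAQ", "Password reset guide", "Account security overview"]
scores = [0.08, 0.93, 0.41]
ranked = rank_documents(docs, scores)

if __name__ == "__main__":
    for title, score in ranked:
        print(f"{score:.2f}  {title}")
```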

Use an LLM

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select a region. In the upper-left corner, select Search Development Workbench.

  3. In the left-side navigation pane, click Experience Center.

  4. On the Experience Center page, select Large model(text-generation) from the Service Category drop-down list and select a service from the Experience Services drop-down list.

  5. Enter your question and click the Submit icon. The LLM analyzes the question and provides an answer.

    Important

    All answers are generated by AI models. The content generated by the AI models may be inaccurate or incomplete and does not reflect the attitudes or opinions of Alibaba Cloud.

    The Results tab displays the number of input and output tokens that are consumed in this conversation. You can click Clear Conversation to delete this conversation.

Use the image content parsing service

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select a region. In the upper-left corner, select Search Development Workbench.

  3. In the left-side navigation pane, click Experience Center.

  4. On the Experience Center page, select Image Content Parsing(image-analyze) from the Service Category drop-down list and select a service from the Experience Services drop-down list.

  5. Set the Experience Data parameter to Sample data or My data.

  6. Click Get Results. The system calls the image content parsing service to interpret the image content or to recognize and extract the key information in the image. You can view the results on the following tabs:

    • Results: displays the content parsing results.

    • Result source code: displays the result response code. You can click Copy Code to copy the code or click Download File to download the code to your computer.

    • Sample code: allows you to view and download the sample code for calling the image content parsing service.

Use the query analysis service

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select a region. In the upper-left corner, select Search Development Workbench.

  3. In the left-side navigation pane, click Experience Center.

  4. On the Experience Center page, select Query Analysis(query-analyze) from the Service Category drop-down list and select a service from the Experience Services drop-down list.

  5. Directly enter your question in the Query field. Alternatively, construct a multi-round conversation in the Historical Message section and then enter your question in the Query field. The model combines the multi-round conversation and question to identify your query intent.

    To convert a natural language question into SQL statements, turn on Show NL2SQL and select a created service configuration from the Service Configuration drop-down list. The NL2SQL service then converts the question that you enter into SQL statements.

  6. Click Get Results. You can view the results on the following tabs:

    • Results: displays the query analysis results.

    • Result source code: displays the result response code. You can click Copy Code to copy the code or click Download File to download the code to your computer.

    • Sample code: allows you to view and download the sample code for calling the query analysis service.
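Step 5 combines earlier conversation turns with the current question so that the model can resolve references such as "it". The following sketch shows one way to represent that combination as data. The build_query_request function and the key names "query", "history", "role", and "content" are assumptions made for this example. The Sample code tab in the console shows the actual request format.

```python
import json


def build_query_request(query, history=None):
    """Combine earlier conversation turns with the current question.

    The keys "query", "history", "role", and "content" are assumptions.
    """
    messages = []
    for role, content in history or []:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role}")
        messages.append({"role": role, "content": content})
    return {"query": query, "history": messages}


# Without the history, "it" in the current question would be ambiguous.
request = build_query_request(
    "How much does it cost?",
    history=[
        ("user", "What is OpenSearch?"),
        ("assistant", "OpenSearch is a search service on Alibaba Cloud."),
    ],
)

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```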