OpenSearch: Experience Center

Last Updated: Aug 29, 2024

Explore a range of services including document parsing, image content extraction, and document slicing at the Experience Center, using a visual interface to quickly determine if these services align with your business requirements.

Function introduction

The Experience Center offers the following services:

Service Category

Service Description

Document Content Parsing

The Document Content Parsing Service (ops-document-analyze-001) extracts logical structures such as titles and segments from unstructured documents (text, tables, images, etc.), converting them into a structured format.

Image Content Parsing

  • The Image Content Understanding Service (ops-image-analyze-vlm-001) uses a multimodal large model to parse and understand image content and to recognize text in images, for use in image retrieval and Q&A scenarios.

  • The Image Text Recognition Service (ops-image-analyze-ocr-001) employs OCR technology to detect text within images for use in image retrieval and Q&A scenarios.

Document Slicing

The Document Slicing Service (ops-document-split-001) slices text: it segments structured data in HTML, Markdown, and TXT formats by paragraph, by semantics, or by predefined rules. It also extracts code, images, and tables from rich text documents.

Text Vectorization

  • The OpenSearch Text Vectorization Service-001 (ops-text-embedding-001) offers multilingual text vectorization in over 40 languages, accepting a maximum text length of 300 characters and producing vectors with 1536 dimensions.

  • The OpenSearch General Text Vectorization Service-002 (ops-text-embedding-002) provides text vectorization in over 100 languages, with a maximum input text length of 8192 characters and vectors of 1024 dimensions.

  • The OpenSearch Text Vectorization Service-Chinese-001 (ops-text-embedding-zh-001) specializes in Chinese text vectorization, handling up to 1024 characters and generating vectors with 768 dimensions.

  • The OpenSearch Text Vectorization Service-English-001 (ops-text-embedding-en-001) focuses on English text vectorization, with a maximum input of 512 characters and output vectors of 768 dimensions.

Text Sparse Vectorization

The service transforms text data into sparse vector forms, which are more storage-efficient and typically used to represent keywords and word frequency information. When combined with dense vectors, they enhance retrieval performance.

The OpenSearch Text Sparse Vectorization Service (ops-text-sparse-embedding-001) offers multilingual text vectorization in over 100 languages, with a maximum input text length of 8192 characters.

Query Analysis

The service analyzes query content, employing large language models and NLP to preprocess and recognize user input, supporting alternate query expansion to enhance retrieval and Q&A outcomes in RAG scenarios.

The Query Analysis Service 001 (ops-query-analyze-001) is a general query analysis service that uses large language models to understand user queries and suggest alternate queries.

Sorting Service

The BGE reranking model (ops-bge-reranker-larger) is a general document scoring service. It scores each document by how relevant its content is to the query, ranks the documents by that score, and returns the scoring results.

Large Model

  • OpenSearch-Qwen-Turbo (ops-qwen-turbo) is built on the qwen-turbo large language model with supervised fine-tuning, which strengthens retrieval and reduces harmful content.

  • Qwen-Turbo is a large-scale language model that supports multiple languages, including Chinese and English. For more information, see Qwen Large Language Model Introduction.

  • Qwen-Plus is an enhanced version of the Qwen large-scale language model, supporting languages such as Chinese and English. For more information, see Qwen Large Language Model Introduction.

  • Qwen-Max is a large-scale language model with hundreds of billions of parameters, supporting languages such as Chinese and English. For more information, see Qwen Large Language Model Introduction.

  • Qwen-MAX-LongContext is a variant of the Qwen-Max model that supports a 30k-token context, with an API limit of 28k tokens for user input. For more information, see Qwen Large Language Model Introduction.
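The input limits listed for the four text vectorization services can be compared in code before you pick one to try. The following is a minimal sketch: the service IDs, length limits, and dimensions are copied from the table above, while the `EmbeddingService` class and the `services_that_fit` helper are hypothetical names introduced only for illustration. It measures length with `len()`, which assumes the limits count characters as the table states.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EmbeddingService:
    service_id: str
    languages: str   # language coverage as described in the table
    max_chars: int   # maximum input text length
    dimensions: int  # size of the output vector


# Values copied from the service table; the data structure is illustrative.
SERVICES = [
    EmbeddingService("ops-text-embedding-001", "40+ languages", 300, 1536),
    EmbeddingService("ops-text-embedding-002", "100+ languages", 8192, 1024),
    EmbeddingService("ops-text-embedding-zh-001", "Chinese", 1024, 768),
    EmbeddingService("ops-text-embedding-en-001", "English", 512, 768),
]


def services_that_fit(text: str) -> List[str]:
    """Return the IDs of the services whose input limit can hold the text."""
    return [s.service_id for s in SERVICES if len(text) <= s.max_chars]
```

For example, `services_that_fit("a" * 2000)` leaves only `ops-text-embedding-002`, because every other service has a limit below 2000 characters.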

Steps

Document parsing

  1. Access the search development console.

  2. Choose the desired region and navigate to the search development console.

  3. From the left navigation bar, select experience center.

  4. Under service category, choose document content parsing and select the specific experience service.

  5. Use the sample data provided by the system, or upload your own data via manage data. Supported file types are TXT, PDF, HTML, DOC, DOCX, PPT, and PPTX, with a maximum file size of 20 MB.

    • Upload a local file: Files will be automatically purged after 7 days. The platform does not store your data long-term.

    • Provide a file URL and corresponding file type: Multiple URLs can be uploaded, with each URL on a separate line.

      Note

      Selecting an incorrect data format may result in failed document parsing. Please choose the appropriate file type based on your data.

      Important

      Ensure that the web link import function is used within legal and regulatory boundaries, adhering to the target platform's management specifications and safeguarding the rights of the rights holders. You are solely responsible for your parsing or downloading actions. The OpenSearch search development console, as a tool provider, assumes no responsibility for your actions.

  6. If using your own data, select the pre-uploaded file or URL from the drop-down list.

  7. Click get results, and the system will activate the service to parse the document.

    • Results: The parsing progress and results will be displayed.

    • Result Source Code: Access the result response code, and use copy code or download file to save the code locally.

    • Sample Code: View and download the sample code for using the document content parsing service.
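Before uploading your own data, it can help to check files against the limits stated in step 5. The sketch below is not part of the console or its sample code: `check_upload` and `parse_url_list` are hypothetical helper names, and it assumes the 20 MB limit means 20 × 1024 × 1024 bytes and that the file type can be inferred from the file extension.

```python
import os
from typing import List

# File types listed in step 5 of the document parsing procedure.
ALLOWED_EXTENSIONS = {".txt", ".pdf", ".html", ".doc", ".docx", ".ppt", ".pptx"}
# Assumption: the 20 MB limit is interpreted as 20 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 20 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 20 MB")
    return problems


def parse_url_list(text: str) -> List[str]:
    """Split pasted text into URLs, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A call such as `check_upload("report.PDF", os.path.getsize("report.PDF"))` returns an empty list when the file is within both limits.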

Document slicing

  1. Access the search development console.

  2. Choose the desired region and navigate to the search development console.

  3. From the left navigation bar, select experience center.

  4. Under service category, choose document slicing and select the specific experience service.

  5. Use the sample data provided by the system, or select my data, enter your own data, and choose the matching data format: TXT, HTML, or Markdown.

    Note

    Selecting an incorrect data format may cause document slicing to fail. Choose the format that matches the data you entered.

  6. Set the maximum slice length in tokens. The default is 300 and the maximum is 1024.

  7. Click get results, and the system will activate the service to slice the document.

    • Results: The slicing progress and results will be displayed.

    • Result Source Code: Access the result response code, and use copy code or download file to save the code locally.

    • Sample Code: View and download the sample code for using the document slicing service.
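To build intuition for how the maximum slice length affects the output, the following sketch packs paragraphs into slices locally. It is not the algorithm used by ops-document-split-001: it approximates tokens with whitespace-separated words, splits paragraphs on blank lines only, and uses the hypothetical names `slice_text`, `DEFAULT_MAX_LEN`, and `HARD_MAX_LEN`. Only the default of 300 and the upper bound of 1024 come from step 6.

```python
from typing import List

DEFAULT_MAX_LEN = 300  # default maximum slice length from step 6
HARD_MAX_LEN = 1024    # largest value the console accepts


def slice_text(text: str, max_len: int = DEFAULT_MAX_LEN) -> List[str]:
    """Pack paragraphs into slices of at most max_len words (a stand-in for tokens)."""
    if not 1 <= max_len <= HARD_MAX_LEN:
        raise ValueError(f"max_len must be between 1 and {HARD_MAX_LEN}")
    slices: List[str] = []
    current: List[str] = []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # Break a paragraph longer than max_len into max_len-sized pieces.
        pieces = [words[i:i + max_len] for i in range(0, len(words), max_len)]
        for piece in pieces:
            # Start a new slice when adding this piece would exceed the limit.
            if current and len(current) + len(piece) > max_len:
                slices.append(" ".join(current))
                current = []
            current.extend(piece)
    if current:
        slices.append(" ".join(current))
    return slices
```

Short paragraphs are merged until the limit is reached, so a smaller `max_len` yields more, shorter slices, while a larger one keeps more context in each slice.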

Text vectorization/sparse vectorization

  1. Access the search development console.

  2. Choose the desired region and navigate to the search development console.

  3. From the left navigation bar, select experience center.

  4. Under service category, choose text vectorization or text sparse vectorization and select the specific experience service.

  5. Select the content type to vectorize. Both document and query inputs are supported.

  6. Enter the text, either in groups or directly as JSON text.

  7. Click get results, and the system will activate the service to vectorize the text.

    • Results: The vectorization progress and outcomes will be displayed.

    • Result Source Code: Access the result response code, and use copy code or download file to save the code locally.

    • Sample Code: View and download the sample code for using the text vectorization service.
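The function introduction notes that sparse vectors improve retrieval when combined with dense vectors. One common way to combine them, shown here as a sketch and not as the method OpenSearch itself uses, is a weighted sum of a dense cosine similarity and a sparse dot product. The `{term_id: weight}` dictionary layout for sparse vectors, the `alpha` weight, and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product of two sparse vectors stored as {term_id: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[term] for term, weight in a.items() if term in b)


def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha: float = 0.5) -> float:
    """Weighted sum: alpha * dense similarity + (1 - alpha) * sparse overlap."""
    return alpha * cosine(dense_q, dense_d) + (1 - alpha) * sparse_dot(sparse_q, sparse_d)
```

Only the terms present in both sparse vectors contribute to `sparse_dot`, which is why the sparse part behaves like keyword matching while the dense part captures semantic similarity.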

Sorting service

  1. Access the search development console.

  2. Choose the desired region and navigate to the search development console.

  3. From the left navigation bar, select experience center.

  4. Under service category, choose sorting service and select the specific experience service.

  5. You may use the sample data provided by the system or input your own data.

  6. Enter text in the query field.

  7. Click get results. The system calls the sorting service, which ranks the documents by the relevance of their content to the query and returns the scores.

    • Results: The sorting scores will be displayed.

    • Result Source Code: Access the result response code, and use copy code or download file to save the code locally.

    • Sample Code: View and download the sample code for using the sorting service.
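Once you have the relevance scores, ordering the documents is a small post-processing step. The sketch below assumes the scores arrive as one number per document, in the same order as the submitted documents; this layout is an assumption, so check the result source code in the console for the actual response format. `rank_by_score` is a hypothetical helper name.

```python
from typing import List, Optional, Sequence, Tuple


def rank_by_score(
    documents: Sequence[str],
    scores: Sequence[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Pair each document with its score and sort from most to least relevant."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

Passing `top_k` keeps only the highest-scoring documents, which is a typical way to trim the context handed to a large model in a RAG pipeline.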

Large model service

  1. Access the search development console.

  2. Choose the desired region and navigate to the search development console.

  3. From the left navigation bar, select experience center.

  4. Under service category, choose large model and select the specific experience service.

  5. Submit your question, and the large model will process the input and provide an answer.

    Important

    Please note that all content generated is produced by the artificial intelligence model. The accuracy and completeness of the generated content cannot be guaranteed and do not represent our views.

    The large model answer page displays the token count for input and output in the current Q&A session. Click delete dialog to remove the current conversation.

Image content parsing

  1. Access the search development console.

  2. Choose the desired region and navigate to the search development console.

  3. From the left navigation bar, select experience center.

  4. Under service category, choose image content parsing and select either image content understanding or image text recognition in the experience service.

  5. You may use the sample images provided by the system or upload your own images.

  6. Click get results, and the system will activate the image content parsing service to analyze and output the image content, or to recognize and output key information from the image.

    • Results: The recognition results will be displayed.

    • Result Source Code: Access the result response code, and use copy code or download file to save the code locally.

    • Sample Code: View and download the sample code for using the image content parsing service.

Query analysis

  1. Access the search development console.

  2. Choose the desired region and navigate to the search development console.

  3. From the left navigation bar, select experience center.

  4. Under service category, choose query analysis and proceed.

  5. Enter your query directly to identify its intent, or construct a multi-round conversation in the history message area and input the query. The model will analyze the conversation context and the query to determine the intent.

  6. Click get results to evaluate the model's performance.

    • Results: The recognition outcomes will be displayed.

    • Result Source Code: Access the result response code, and use copy code or download file to save the code locally.

    • Sample Code: View and download the sample code for using the query analysis service.