OpenSearch:Index schema

Last Updated:Aug 27, 2024

Overview

Each document contains multiple fields, and each field contains a set of words. An index is used to speed up data retrieval. Indexes are divided into different types based on what they map. The following concepts are related to indexes:

  • Field: The name and the type of a field can be used to define an index table.

  • Inverted index: The index stores a mapping from words to locations in documents. Example: Word A: (Doc1,Doc2,...,DocN). An inverted index is used for data retrieval and can help you locate the document that contains the keyword you search for.

  • Forward index: The index stores a mapping from documents to fields. Example: DocID: (term1,term2,...,termN). Forward indexes are divided into single-value indexes and multi-value indexes. A single-value index stores the values of a single-value attribute. Except for values of the STRING type, the values are fixed-length, which makes data queries efficient and allows the index data to be updated. A multi-value index stores the values of a multi-value attribute, which is a field that contains a variable number of values. This negatively affects query performance, and the index data cannot be updated.

    A forward index is used to obtain the attribute of data based on document IDs. The attribute can be used for statistics, sorting, and filtering. OpenSearch Retrieval Engine Edition supports the following data types of fields in forward indexes:

    INT8 (8-bit signed integer type)

    UINT8 (8-bit unsigned integer type)

    INT16 (16-bit signed integer type)

    UINT16 (16-bit unsigned integer type)

    INTEGER (32-bit signed integer type)

    UINT32 (32-bit unsigned integer type)

    INT64 (64-bit signed integer type)

    UINT64 (64-bit unsigned integer type)

    FLOAT (32-bit floating-point number)

    DOUBLE (64-bit floating-point number)

    STRING (string type)

  • Summary index: The index stores data in a way similar to a forward index. However, a summary index groups the fields that belong to a document, so you can quickly locate the content based on the document ID. A summary index is used to display search results. Because a summary index contains a large amount of data, a query does not read much of it: only the documents in the search results are fetched from the summary index. OpenSearch Retrieval Engine Edition provides a compression mechanism for summary indexes. If you enable compression for a summary index in the schema, OpenSearch Retrieval Engine Edition uses zlib to compress the summary index before storing it. When data is read from the summary index, the search engine decompresses the data and then returns the retrieved results to the user.
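As a rough illustration of the inverted and forward indexes described above, the following Python sketch builds both structures over a few in-memory documents. The document contents, the field names (`title`, `price`), and the helper `build_indexes` are hypothetical and are not part of OpenSearch Retrieval Engine Edition. The real engine stores these structures in its own format.

```python
from collections import defaultdict

# Hypothetical documents: doc ID -> fields.
# "title" is the searched text, "price" is an attribute.
DOCS = {
    1: {"title": "red apple", "price": 3},
    2: {"title": "green apple", "price": 5},
    3: {"title": "red grape", "price": 7},
}


def build_indexes(docs):
    """Return (inverted, forward) for the given documents."""
    inverted = defaultdict(list)  # word -> ascending list of doc IDs
    forward = {}                  # doc ID -> attribute value
    for doc_id in sorted(docs):
        fields = docs[doc_id]
        for word in set(fields["title"].split()):
            inverted[word].append(doc_id)
        forward[doc_id] = fields["price"]
    return dict(inverted), forward


inverted, forward = build_indexes(DOCS)

# Retrieval: the inverted index maps a keyword to the documents that contain it.
hits = inverted.get("red", [])

# Filtering and sorting: the forward index maps each hit to its attribute value.
cheap_first = sorted((d for d in hits if forward[d] < 10), key=lambda d: forward[d])
print(hits, cheap_first)
```

The lookup by keyword touches only the inverted index, while filtering and sorting touch only the forward index, which is why the two structures are kept separate.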

Note

For more information about how to configure an index table, see Configure an index table.
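The summary compression described above can be sketched with the standard `zlib` module of Python. This only illustrates the idea of compressing on write and decompressing on read. The record layout (JSON-encoded fields keyed by document ID) and the helper names are assumptions, not the actual storage format of the engine.

```python
import json
import zlib

summary_store = {}  # doc ID -> compressed summary bytes (hypothetical layout)


def write_summary(doc_id, fields):
    """Serialize the display fields of a document and store them compressed."""
    raw = json.dumps(fields, sort_keys=True).encode("utf-8")
    summary_store[doc_id] = zlib.compress(raw)


def read_summary(doc_id):
    """Decompress a stored summary and return the original fields."""
    raw = zlib.decompress(summary_store[doc_id])
    return json.loads(raw.decode("utf-8"))


write_summary(1, {"id": 1, "fb_text": "index schema " * 50})
restored = read_summary(1)
print(len(summary_store[1]), len(restored["fb_text"]))
```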

Sample index schema:

{
  "file_compress": [
    {
      "name": "file_compressor",
      "type": "zstd"
    },
    {
      "name": "no_compressor",
      "type": ""
    }
  ],
  "table_name": "test",
  "summarys": {
    "summary_fields": [
      "id",
      "fb_boolean",
      "fb_datetime",
      "fb_string",
      "fb_decimal",
      "fb_bigint",
      "fb_text"
    ],
    "parameter": {
      "file_compressor": "zstd"
    }
  },
  "indexs": [
    {
      "index_name": "id",
      "index_type": "PRIMARYKEY64",
      "index_fields": "id",
      "has_primary_key_attribute": true,
      "is_primary_key_sorted": false
    },
    {
      "index_name": "fb_boolean",
      "index_type": "STRING",
      "index_fields": "fb_boolean",
      "file_compress": "file_compressor",
      "format_version_id": 1
    },
    {
      "index_name": "fb_datetime",
      "index_type": "STRING",
      "index_fields": "fb_datetime",
      "file_compress": "file_compressor",
      "format_version_id": 1
    },
    {
      "index_name": "fb_string",
      "index_type": "STRING",
      "index_fields": "fb_string"
    },
    {
      "index_name": "fb_text",
      "index_type": "TEXT",
      "index_fields": "fb_text"
    }
  ],
  "attributes": [
    {
      "field_name": "id",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_boolean",
      "file_compress": "file_compressor"
    },
    {
      "field_name": "fb_datetime",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_string",
      "file_compress": "file_compressor"
    },
    {
      "field_name": "fb_decimal",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_bigint",
      "file_compress": "no_compressor"
    }
  ],
  "fields": [
    {
      "user_defined_param": {},
      "field_name": "id",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "field_name": "fb_boolean",
      "field_type": "STRING",
      "compress_type": "uniq"
    },
    {
      "field_name": "fb_datetime",
      "field_type": "STRING",
      "compress_type": "uniq"
    },
    {
      "user_defined_param": {
        "multi_value_sep": ","
      },
      "field_name": "fb_string",
      "field_type": "STRING",
      "compress_type": "equal",
      "multi_value": true
    },
    {
      "field_name": "fb_decimal",
      "field_type": "DOUBLE"
    },
    {
      "field_name": "fb_bigint",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "field_name": "fb_text",
      "field_type": "TEXT",
      "analyzer": "chn_standard"
    }
  ]
}
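
A schema like the sample above can be checked for internal consistency before it is published. The following sketch is one possible check, not an official validator. It assumes that every name in `summary_fields`, `index_fields`, and `attributes` should refer to an entry in `fields`, and that every `file_compress` value should refer to an entry in the top-level `file_compress` list. The function name `check_schema` and the trimmed schema are illustrative.

```python
import json

# A trimmed version of the sample schema, kept small for illustration.
SCHEMA = json.loads("""
{
  "file_compress": [{"name": "file_compressor", "type": "zstd"},
                    {"name": "no_compressor", "type": ""}],
  "table_name": "test",
  "summarys": {"summary_fields": ["id", "fb_text"]},
  "indexs": [
    {"index_name": "id", "index_type": "PRIMARYKEY64", "index_fields": "id"},
    {"index_name": "fb_text", "index_type": "TEXT", "index_fields": "fb_text"}
  ],
  "attributes": [{"field_name": "id", "file_compress": "no_compressor"}],
  "fields": [
    {"field_name": "id", "field_type": "INT64"},
    {"field_name": "fb_text", "field_type": "TEXT", "analyzer": "chn_standard"}
  ]
}
""")


def check_schema(schema):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    field_names = {f["field_name"] for f in schema["fields"]}
    compressors = {c["name"] for c in schema.get("file_compress", [])}

    for name in schema["summarys"]["summary_fields"]:
        if name not in field_names:
            problems.append(f"summary field {name!r} is not defined in fields")

    for index in schema["indexs"]:
        # Assumes index_fields is a plain field name, as in the sample.
        if index["index_fields"] not in field_names:
            problems.append(f"index {index['index_name']!r} uses an undefined field")

    for attr in schema["attributes"]:
        if attr["field_name"] not in field_names:
            problems.append(f"attribute {attr['field_name']!r} is not defined in fields")

    for entry in schema["indexs"] + schema["attributes"]:
        compressor = entry.get("file_compress")
        if compressor is not None and compressor not in compressors:
            problems.append(f"unknown compressor {compressor!r}")

    primary_keys = [i for i in schema["indexs"] if i["index_type"].startswith("PRIMARYKEY")]
    if len(primary_keys) > 1:
        problems.append("more than one primary key index is defined")
    return problems


print(check_schema(SCHEMA))
```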

Add an index table

  1. On the instance details page, choose Configuration Center > Index Schema in the left-side navigation pane. On the page that appears, click Create Index Table.

  2. Configure the Index Table, Data Source, and Data Shards parameters.

  3. Configure fields.

Separate multi-value fields with delimiters:

The default delimiter in OpenSearch Retrieval Engine Edition is ^]. You can customize a delimiter based on your business requirements.

Specify whether to compress attribute fields and field data:

  • Attribute fields: By default, attribute fields are not compressed. If file_compressor is selected for an attribute field, the attribute field is compressed.

  • Field data: By default, field data is not compressed. For multi-value fields or fields of the STRING type, uniq is selected by default. For single-value fields, equal is selected by default.

Note

If you compress attribute fields, we recommend that you modify the index loading method to reduce the impact on performance. To modify the index loading method, perform the following operations: On the instance details page of an instance, click Deployment Management in the left-side navigation pane. On the page that appears, click the Searcher worker that you want to manage. In the Searcher Worker Configurations panel, click the Online Table Configurations tab.

  4. Configure indexes.

Specify whether to compress index fields:

  • By default, index fields are not compressed. If file_compressor is selected for an index field, the index field is compressed.

Note
  • The primary key index cannot be compressed.

  • If you compress index fields, we recommend that you modify the index loading method to reduce the impact on performance. To modify the index loading method, perform the following operations: On the instance details page of an instance, click Deployment Management in the left-side navigation pane. On the page that appears, click the Searcher worker that you want to manage. In the Searcher Worker Configurations panel, click the Online Table Configurations tab.

  5. After the configuration is complete, click Save Version. In the dialog box that appears, enter the description and click Publish. The description is optional.

  6. After the index table is added, you can choose O&M Center > Deployment Management in the left-side navigation pane and view the topology on the page that appears.

  7. To make the new index table take effect in the cluster, perform the following operations: In the left-side navigation pane, choose O&M Center > O&M Management. On the page that appears, click Update Configurations. In the Instance Configuration Update panel, set the Trigger Reindexing parameter to Push Configurations and Trigger Reindexing.

  8. During reindexing, you can choose O&M Center > Change History in the left-side navigation pane and click the Data Source Changes tab to view the progress of the reindexing task.

After the reindexing task is complete, you can query data from the new index table.
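
The multi-value delimiter mentioned in the field configuration step can be illustrated with a short sketch that splits a raw multi-value field into its values. It assumes that `^]` is the caret notation for the ASCII group separator control character (0x1D), which is how that character is commonly displayed. Confirm this against your data source before you rely on it. The function `split_multi_value` is hypothetical, and its check reflects the rule that a custom delimiter must be a single character.

```python
DEFAULT_DELIMITER = "\x1d"  # assumed to be the character displayed as ^]


def split_multi_value(raw, delimiter=DEFAULT_DELIMITER):
    """Split a raw multi-value field into a list of values."""
    if len(delimiter) != 1:
        raise ValueError("a custom delimiter must be a single character")
    return raw.split(delimiter) if raw else []


print(split_multi_value("red\x1dgreen\x1dblue"))
print(split_multi_value("red,green,blue", delimiter=","))
```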

Important
  • You can specify only one primary key field.

  • In the field settings, you must select Search Result Display for at least one field.

  • For fields of the TEXT type, you must specify an analyzer. TEXT fields cannot be multi-value fields.

  • You can specify only one primary key index.

  • Apart from the default delimiter, a delimiter that is used to separate multi-value fields must be a single character. Full-width characters are not supported.

  • If the number of replicas in the cluster is 2, set the Data Shards parameter to 2. When you purchase an instance, make sure that the number of Searcher workers is greater than the number of replicas multiplied by the number of data shards. Otherwise, the index table that you added cannot be used.

  • A single shard can contain no more than 600 million pieces of data, and all data shards can contain a maximum of 2.1 billion pieces in total. The index size of a single shard cannot exceed 300 GB. If data needs to be updated in real time, the total transactions per second (TPS) of data updates in a single shard cannot exceed 4,000. If you run the add command to add documents, the update TPS can reach 10,000.
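
The shard limits above can be turned into a quick sizing check. The following sketch is a back-of-the-envelope helper, not an official sizing tool. The function names and the choice to take the larger of the two shard requirements are assumptions, and the calculation assumes that data is distributed evenly across shards. The numeric limits come from the list above.

```python
import math

MAX_DOCS_PER_SHARD = 600_000_000   # 0.6 billion pieces of data per shard
MAX_DOCS_TOTAL = 2_100_000_000     # 2.1 billion pieces across all shards
MAX_INDEX_GB_PER_SHARD = 300


def min_shards(total_docs, total_index_gb):
    """Smallest shard count that stays within the per-shard limits."""
    if total_docs > MAX_DOCS_TOTAL:
        raise ValueError("total data volume exceeds the 2.1 billion limit")
    by_docs = math.ceil(total_docs / MAX_DOCS_PER_SHARD)
    by_size = math.ceil(total_index_gb / MAX_INDEX_GB_PER_SHARD)
    return max(1, by_docs, by_size)


def enough_searcher_workers(workers, replicas, shards):
    """Check that workers exceed replicas multiplied by data shards."""
    return workers > replicas * shards


shards = min_shards(total_docs=1_000_000_000, total_index_gb=450)
print(shards, enough_searcher_workers(workers=5, replicas=2, shards=shards))
```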

Modify an index table

Index table versions:

By default, two versions are available for new index tables.

  • index_config_v1: the index table version that you configure for the first time. After you push the configuration and rebuild the indexes, the version is in the In Use state. Until you push the configuration and rebuild the indexes, the version is in the Unused state.

  • index_config_edit: the index table that is being modified. The index table is in the Modifying state.

If the index table versions are published in a consecutive manner, the version numbers are incremental. For example, the second version is named index_config_v2 and the third version is named index_config_v3. To distinguish index table versions, you must enter the description of each version.
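
The naming pattern can be sketched as a small helper. The function `next_version_name` is hypothetical. It only mirrors the index_config_v1, index_config_v2, index_config_v3 sequence described above and is not an API of the product.

```python
import re

VERSION_PATTERN = re.compile(r"^index_config_v(\d+)$")


def next_version_name(existing_names):
    """Return the name that the next published version would get."""
    numbers = [
        int(m.group(1))
        for name in existing_names
        if (m := VERSION_PATTERN.match(name))
    ]
    # Names such as index_config_edit do not match and are ignored.
    return f"index_config_v{max(numbers, default=0) + 1}"


print(next_version_name(["index_config_v1", "index_config_edit"]))
```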

Modify and publish a new index table version:

  1. Find the version that is in the Modifying state and click Modify.

Note

cluster.json configuration:

OpenSearch Retrieval Engine Edition instances allow you to use index merging to configure the customized_merge_config and segment_customize_metrics_updater keys. Only new instances support the segment_customize_metrics_updater key.

  2. After you modify the configuration, click Save Version.

You can also switch to developer mode to manually modify the schema.

  3. Find the version that is in the Modifying state, click Publish, enter the description, and then click OK.

In this step, the system generates a new index table version for the index table. The index table version is in the Unused state.

  4. To make the new index table version take effect in the cluster, perform the following operations: In the left-side navigation pane, choose O&M Center > O&M Management. On the page that appears, click Update Configurations. In the Instance Configuration Update panel, set the Trigger Reindexing parameter to Push Configurations and Trigger Reindexing.

Delete an index table version:

You can delete an index table version that is in the Unused state.

View an index table version:

Click View to go to the configuration page of the index table version. The page is read-only and can be displayed in the following modes:

  • Administrator mode

  • Developer mode

Delete an index table

You can delete an index table that does not contain a version in the In Use state.

If an index table contains a version in the In Use state, you can delete the index table only by performing the following steps:

  1. In the left-side navigation pane, choose O&M Center > Deployment Management. On the page that appears, click the index table that you want to delete. In the panel that appears, click Cancel Subscription on the Effective Online tab.

  2. Then, choose Configuration Center > Index Schema in the left-side navigation pane. On the page that appears, find the index table that you want to delete and then click Delete in the Actions column.

Warning

After you cancel the subscription to an index table on the Deployment Management page, you must also delete the index table on the Index Schema page. Otherwise, the query performance of the online clusters may be affected.

Usage notes

  • When you add an index table, you must specify a data source. If no data source exists, you must add a data source before you add an index table.

  • After an index table is created, you cannot change the index table name.

  • If an index table contains a version in the In Use state, you cannot delete the index table until you cancel its subscription on the Deployment Management page.

  • Each index table can contain only one version in the Modifying state.
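
The version states and deletion rules in this topic can be summarized as a few predicates. The functions below are purely illustrative. Their names are hypothetical and do not correspond to any OpenSearch API. They only restate the rules above in executable form.

```python
def can_delete_version(state):
    """A version can be deleted only while it is in the Unused state."""
    return state == "Unused"


def can_delete_table(version_states):
    """A table can be deleted directly only if no version is In Use."""
    return "In Use" not in version_states


def has_single_modifying_version(version_states):
    """Each index table can contain only one version in the Modifying state."""
    return list(version_states).count("Modifying") <= 1


states = ["In Use", "Unused", "Modifying"]
print(can_delete_table(states))                 # the In Use version blocks deletion
print([can_delete_version(s) for s in states])  # only the Unused version qualifies
print(has_single_modifying_version(states))
```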