OpenSearch: Configure a table loading policy

Last Updated: Jun 03, 2025

Overview

An index table loading policy consists of a list of loading policies, each of which describes how to load a set of index files. When the system loads an index table, it loads each index file by using the first policy in the list that matches the file.
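The first-match rule can be illustrated with a short Python sketch. It is not the engine's code: the policy list is simplified, and the sketch assumes that a pattern matches when the regular expression is found anywhere in the file path relative to the segment directory (Python's re.search). That assumption is consistent with a pattern such as "/summary/" matching the file /summary/data.

```python
import re
from typing import List, Optional

# Simplified policy list for illustration. Entries are checked in order.
POLICIES = [
    {"file_patterns": ["/index/title/.*"], "load_strategy": "mmap",
     "load_strategy_param": {"lock": True}},
    {"file_patterns": ["/summary/"], "load_strategy": "cache",
     "load_strategy_param": {"cache_size": 4096}},
    {"file_patterns": [".*"], "load_strategy": "mmap",
     "load_strategy_param": {"lock": False}},
]


def resolve_policy(relative_path: str, policies: List[dict]) -> Optional[dict]:
    """Return the first policy that has a pattern matching the file path.

    Assumption: a pattern matches if re.search finds it anywhere in the
    path relative to the segment directory.
    """
    for policy in policies:
        for pattern in policy["file_patterns"]:
            if re.search(pattern, relative_path):
                return policy
    return None  # Behavior for unmatched files is not modeled here.


for path in ["/index/title/posting", "/summary/data", "/attribute/price/data"]:
    print(path, "->", resolve_policy(path, POLICIES)["load_strategy"])
```

Because the catch-all ".*" entry is listed last, it applies only to files that no earlier entry matches. If it were listed first, every file would use it.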

Sample configurations

{
    "load_config":[
        {
            "file_patterns":[
                "_ATTRIBUTE_",
                "/index/title/.*",
                "/index/body/dictionary"
            ],
            "load_strategy":"mmap",
            "lifecycle":"hot",
            "load_strategy_param":{
                "lock":true,
                "partial_lock":true,
                "advise_random":false,
                "slice":4194304,
                "interval":2
            },
            "remote" : false,
            "deploy" : true,
            "warmup_strategy":"sequential"
        },
        {
            "file_patterns":[
                "_SUMMARY_"
            ],
            "load_strategy":"cache",
            "load_strategy_param":{
                "global_cache":false,
                "direct_io":true,
                "cache_size":4096
            },
            "remote" : true,
            "deploy" : false
        },
        {
            "warmup_strategy":"none",
            "file_patterns":[
                ".*"
            ],
            "load_strategy":"mmap",
            "load_strategy_param":{
                "lock":false
            }
        }
    ]
}

Parameters

  • file_patterns: The patterns that are used to match index files to this loading policy. Each pattern is a regular expression that is matched against file paths relative to the segment directory. For more information about the directory structure of index tables, see Index file directory structure. For example, to configure a separate loading policy for the inverted index named title, use the pattern "/index/title/.*": the files of the title index are stored in the title directory under the index directory, and .* matches all files in that directory. The system provides the following built-in macros to simplify pattern configuration:

    • _ATTRIBUTE_: equivalent to "/attribute/.*", which indicates all forward indexes.

    • _INDEX_: equivalent to "/index/.*", which indicates all inverted indexes.

    • _SUMMARY_: equivalent to "/summary/", which indicates all summary indexes.

  • load_strategy: The loading policy. Valid values: mmap and cache.

  • load_strategy_param: The parameters used to configure the loading policy, including:

    • Mmap loading policy parameters

      • lock: Specifies whether to enable the lock mode for the mmap loading policy. Default value: false. If the lock mode is enabled, indexes are locked in memory and are not swapped out. This ensures query performance but increases memory overhead.

      • partial_lock: Specifies whether to enable the partial lock mode for inverted indexes. Default value: false. If the partial lock mode is enabled, only the first-level dictionary of inverted indexes is locked in memory, and the second-level dictionary is not. This saves memory.

      • advise_random: Specifies whether to reduce the number of read-ahead requests to the disk. Default value: false. If indexes are too large to be fully loaded into memory, disk I/O may become a query performance bottleneck. In this case, setting this parameter to true can significantly reduce the number of read-ahead requests to the disk and improve query performance.

      • slice and interval: Specify the speed at which indexes are prefetched and loaded. The system reads the amount of data specified by slice at a time, at the interval specified by interval. Unit of slice: bytes. Unit of interval: milliseconds. The two parameters must be used together. Default value of slice: 4194304 (4 MB). Default value of interval: 0, which disables throttling.

    • Cache loading policy parameters

      • direct_io: Specifies whether to read files in Direct I/O mode. Default value: false. If data is read from SSDs in Direct I/O mode, query performance is improved.

      • global_cache: Specifies whether to enable the global block cache. Default value: false. Whether the global block cache is used is controlled by environment variables, so this parameter is not available for configuration. We recommend that you set this parameter to false.

      • cache_size: The size of the block cache. This parameter takes effect only if global_cache is set to false. Default value: 1. Unit: MB.

      • block_size: The block size. Default value: 4096. Unit: B.

  • remote: Specifies whether to read the index files that match file_patterns from the remote distributed storage system. Valid values: true and false. This parameter takes effect only if need_read_remote_index is set to true. If need_read_remote_index is set to false, this parameter is always treated as false.

  • deploy: Specifies whether to distribute the index files that match file_patterns to local disks. Valid values: true and false. This parameter takes effect only if need_deploy_index is set to true. If need_deploy_index is set to false, this parameter is always treated as false.

  • warmup_strategy: The warm-up policy. This parameter takes effect only for the mmap loading policy. Default value: none, which indicates that the system does not prefetch data. To warm up data, set this parameter to sequential, which indicates that the system prefetches data sequentially.
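The rules in the parameter list can be summarized as a function that computes the effective settings of one load_config entry: macro expansion, documented defaults, the need_read_remote_index and need_deploy_index switches, and the mmap-only warm-up. The following Python sketch is an illustration under stated assumptions, not the engine's validation logic. In particular, when a switch is on and remote or deploy is omitted, the sketch leaves the key unset, because this section does not document its default.

```python
import copy

# Built-in macros, as listed under file_patterns.
MACROS = {
    "_ATTRIBUTE_": "/attribute/.*",
    "_INDEX_": "/index/.*",
    "_SUMMARY_": "/summary/",
}

# Documented default values of load_strategy_param for each strategy.
DEFAULT_PARAMS = {
    "mmap": {"lock": False, "partial_lock": False, "advise_random": False,
             "slice": 4194304, "interval": 0},
    "cache": {"direct_io": False, "global_cache": False,
              "cache_size": 1, "block_size": 4096},
}


def effective_entry(entry: dict, need_read_remote_index: bool,
                    need_deploy_index: bool) -> dict:
    """Return a copy of one load_config entry with the documented rules applied."""
    strategy = entry["load_strategy"]
    if strategy not in DEFAULT_PARAMS:
        raise ValueError(f"unsupported load_strategy: {strategy}")
    result = copy.deepcopy(entry)
    # Expand built-in macros; other patterns stay as regular expressions.
    result["file_patterns"] = [MACROS.get(p, p) for p in entry["file_patterns"]]
    # Unset parameters fall back to their documented defaults.
    result["load_strategy_param"] = {**DEFAULT_PARAMS[strategy],
                                     **entry.get("load_strategy_param", {})}
    # remote/deploy are forced to false when the matching switch is off.
    # Assumption: if a switch is on and the key is omitted, leave it unset.
    for key, switch in (("remote", need_read_remote_index),
                        ("deploy", need_deploy_index)):
        if not switch:
            result[key] = False
        elif key in entry:
            result[key] = bool(entry[key])
    # warmup_strategy only takes effect for mmap; its default is "none".
    if strategy == "mmap":
        result["warmup_strategy"] = entry.get("warmup_strategy", "none")
    else:
        result["warmup_strategy"] = "none"
    return result


summary_entry = {"file_patterns": ["_SUMMARY_"], "load_strategy": "cache",
                 "load_strategy_param": {"direct_io": True, "cache_size": 4096},
                 "remote": True, "deploy": False}
print(effective_entry(summary_entry, need_read_remote_index=False,
                      need_deploy_index=True))
```

In this usage example, remote is reported as false even though the entry sets it to true, because need_read_remote_index is off.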

Examples

Sample mmap loading policy

{
    "load_config":[
        {
            "file_patterns":[
                "/attribute/price/.*",  # The attribute field named price.
                "/index/title/.*", # The inverted index named title.
                "/index/body/dictionary", # The dictionary of the inverted index named body.
                "/index/vector/aitheta.*" # The vector index named vector
            ],
            "load_strategy":"mmap",
            "load_strategy_param":{
                "lock":true,  # The lock mode is enabled for the mmap loading policy.
                "partial_lock":true, # The partial lock mode is enabled. Only the first-level dictionary of inverted indexes is locked.
                "slice":4194304, # During the prefetching, the system reads 4 MB of data at a time at an interval of 2 ms.
                "interval":2
            },
            "remote" : false, # The system does not read the index files that match the value of the file_patterns parameter from the remote distributed storage system.
            "deploy" : true, # The system distributes the indexes to local disks.
            "warmup_strategy":"sequential" # The system prefetches data in sequence.
        },
        {
            "file_patterns":[
                "/attribute/tags", # The attribute field named tags.
                "/index/description/.*"  # The inverted index named description.
            ],
            "load_strategy":"mmap",
            "load_strategy_param":{
                "lock":false
            },
            "remote" : false,
            "deploy" : true,
            "warmup_strategy":"none"
        }
    ]
}
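The slice and interval settings in the first entry act as a rate limit on prefetching. The following Python sketch models throttled sequential prefetching as "read slice bytes, then wait interval milliseconds". This is a simplified model for illustration, not the engine's prefetch implementation. With the sample values (4194304 bytes every 2 ms), the throttle allows at most 2000 MiB/s, because the sketch ignores the time spent on the reads themselves.

```python
import io
import time
from typing import BinaryIO


def throttled_prefetch(stream: BinaryIO, slice_bytes: int = 4194304,
                       interval_ms: int = 0) -> int:
    """Read a stream sequentially, slice_bytes at a time, pausing interval_ms
    milliseconds after each slice. Returns the number of bytes read."""
    total = 0
    while True:
        chunk = stream.read(slice_bytes)
        if not chunk:
            break
        total += len(chunk)
        if interval_ms > 0:  # An interval of 0 disables throttling.
            time.sleep(interval_ms / 1000.0)
    return total


def max_prefetch_rate_mib_per_s(slice_bytes: int, interval_ms: int) -> float:
    """Upper bound on the throttled prefetch rate, ignoring read time."""
    return slice_bytes * 1000 / (interval_ms * 1024 * 1024)


# Sample settings: slice = 4194304 bytes (4 MB), interval = 2 ms.
print(max_prefetch_rate_mib_per_s(4194304, 2))  # 2000.0
data = io.BytesIO(b"\0" * 10_000_000)
print(throttled_prefetch(data, slice_bytes=4194304, interval_ms=2))  # 10000000
```

A larger interval or a smaller slice lowers this ceiling, which reduces the disk I/O that prefetching competes for with queries.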

Sample cache loading policy

{
    "load_config":[
        {
            "file_patterns":[
                "_ATTRIBUTE_" # All attribute fields.
            ],
            "load_strategy":"cache",
            "load_strategy_param":{
                "global_cache":false, # Global block cache is disabled.
                "direct_io":true, # The system reads files in Direct I/O mode.
                "cache_size":20480 # The cache size is 20 GB.
            },
            "remote" : false, # The system does not read the index files that match the value of the file_patterns parameter from the remote distributed storage system.
            "deploy" : true # The system distributes the indexes to local disks.
        },
        {
            "file_patterns":[
                "/summary/data"  # The data files of a summary index.
            ],
            "load_strategy":"cache",
            "load_strategy_param":{
                "global_cache":false,
                "direct_io":true,
                "cache_size":4096
            },
            "remote" : false,
            "deploy" : true
        },
        {
            "warmup_strategy":"none",
            "file_patterns":[
                ".*"
            ],
            "load_strategy":"mmap",
            "load_strategy_param":{
                "lock":false
            }
        }
    ]
}
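The cache loading policy keeps index data in a block cache whose total size is set by cache_size (MB) and whose block size is set by block_size (bytes). The following Python sketch shows how these two values bound the number of cached blocks. The least-recently-used (LRU) eviction order is an assumption made for illustration, and the sketch does not model direct_io or global_cache. With the sample cache_size of 20480 and the default block_size of 4096, the cache holds 5242880 blocks.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Callback that reads `length` bytes of `path` starting at `offset`.
ReadFn = Callable[[str, int, int], bytes]


class BlockCache:
    """Illustrative LRU block cache sized by cache_size (MB) and block_size (bytes)."""

    def __init__(self, cache_size_mb: int = 1, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.capacity = (cache_size_mb * 1024 * 1024) // block_size
        self._blocks: "OrderedDict[Tuple[str, int], bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read_block(self, path: str, offset: int, read_fn: ReadFn) -> bytes:
        """Return the block that contains `offset`, loading it on a miss."""
        key = (path, offset // self.block_size)
        if key in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(key)  # Mark as most recently used.
            return self._blocks[key]
        self.misses += 1
        data = read_fn(path, key[1] * self.block_size, self.block_size)
        self._blocks[key] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # Evict the least recently used block.
        return data


# Sample setting cache_size = 20480 (20 GB) with the default block_size.
print(BlockCache(cache_size_mb=20480).capacity)  # 5242880

# Usage with an in-memory stand-in for a summary data file.
FILE = bytes(range(256)) * 64  # 16384 bytes


def read_from_memory(path: str, offset: int, length: int) -> bytes:
    return FILE[offset:offset + length]


cache = BlockCache(cache_size_mb=1, block_size=4096)
cache.read_block("/summary/data", 10, read_from_memory)    # miss
cache.read_block("/summary/data", 4000, read_from_memory)  # hit, same block
print(cache.hits, cache.misses)  # 1 1
```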

Sample loading policy for storage-computing separation

To enable storage-computing separation, set the need_read_remote_index parameter to true.
{
    "load_config":[
        {
            "file_patterns":[
                "/index/title/.*" # The inverted index named title.
            ],
            "load_strategy":"mmap",
            "load_strategy_param":{
                "lock":true,  # The lock mode is enabled for the mmap loading policy.
                "partial_lock":true, # The partial lock mode is enabled. Only the first-level dictionary of inverted indexes is locked.
                "slice":4194304, # During the prefetching, the system reads 4 MB of data at a time at an interval of 2 ms.
                "interval":2
            },
            "remote" : false, # The system does not read the index files that match the value of the file_patterns parameter from the remote distributed storage system.
            "deploy" : true, # The system distributes the indexes to local disks.
            "warmup_strategy":"sequential" # The system prefetches data in sequence.
        },
        {
            "file_patterns":[
                "_ATTRIBUTE_" # All attribute fields.
            ],
            "load_strategy":"cache",
            "load_strategy_param":{
                "global_cache":false, # Global block cache is disabled.
                "direct_io":true, # The system reads files in Direct I/O mode.
                "cache_size":20480 # The cache size is 20 GB.
            },
            "remote" : true, # The system reads the index files that match the value of the file_patterns parameter from the remote distributed storage system.
            "deploy" : false # The system does not distribute the indexes to local disks.
        },
        {
            "file_patterns":[
                "/summary/data"  # The data files of a summary index.
            ],
            "load_strategy":"cache",
            "load_strategy_param":{
                "global_cache":false,
                "direct_io":true,
                "cache_size":4096
            },
            "remote" : true, # The system reads the index files that match the value of the file_patterns parameter from the remote distributed storage system.
            "deploy" : false # The system does not distribute the indexes to local disks.
        },
        {
            "warmup_strategy":"none",
            "file_patterns":[
                ".*"
            ],
            "load_strategy":"mmap",
            "load_strategy_param":{
                "lock":false
            }
        }
    ]
}
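The samples on this page annotate settings with # comments. Standard JSON does not allow comments, so a strict parser such as Python's json module rejects the annotated text as-is. If you want to check an annotated sample with such a parser, the following sketch strips # comments that appear outside string literals before parsing. It is a helper for checking the samples, not an OpenSearch tool, and json.loads still rejects other deviations such as trailing commas.

```python
import json


def strip_hash_comments(text: str) -> str:
    """Remove '#' comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "#":
            # Skip to the end of the line but keep the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


SAMPLE = """{
    "load_config":[
        {
            "file_patterns":[
                "_ATTRIBUTE_" # All attribute fields.
            ],
            "load_strategy":"cache",
            "remote" : true, # Read from the remote storage system.
            "deploy" : false # Do not distribute to local disks.
        }
    ]
}"""

config = json.loads(strip_hash_comments(SAMPLE))
print(config["load_config"][0]["remote"])  # True
```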

Index file directory structure

  |-- generation_0
      |-- partition_0_65535
          |-- index_format_version
          |-- index_partition_meta
          |-- schema.json
          |-- segment_0
              |-- attribute
                  `-- attribute_name
                      `-- data
              |-- deletionmap
              |-- deploy_index
              |-- index
                  |-- index_name
                      |-- bitmap_dictionary
                      |-- bitmap_posting
                      |-- dictionary
                      `-- posting
                  `-- vector_index_name
                      |-- aitheta.index
                      `-- aitheta.index.addr
              |-- summary
                  |-- data
                  `-- offset
              `-- segment_info
          |-- adaptive_bitmap__meta
              |-- deploy_index
              `-- dictionary_name
          |-- truncate_meta
              |-- deploy_index
              `-- truncate_meta_file
          `-- version.0
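The file_patterns parameter matches paths relative to the segment directory, such as /index/title/dictionary for the dictionary file of an inverted index named title in segment_0. The following Python sketch derives that relative path from a path inside the generation directory. It assumes that segment directories are named segment_<N>, as in the tree above; files outside a segment, such as version.0 or schema.json, return None.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumption: segment directories are named segment_<N>, as in the tree.
SEGMENT_DIR = re.compile(r"segment_\d+")


def segment_relative_path(path: str) -> Optional[str]:
    """Return the path relative to its segment directory, with a leading '/'.

    Returns None for files that are not inside a segment directory.
    """
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if SEGMENT_DIR.fullmatch(part):
            rest = parts[i + 1:]
            return "/" + "/".join(rest) if rest else None
    return None


print(segment_relative_path(
    "generation_0/partition_0_65535/segment_0/index/title/dictionary"))
print(segment_relative_path("generation_0/partition_0_65535/version.0"))
```

The first call returns /index/title/dictionary, which the pattern "/index/title/.*" matches; the second returns None.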

The following list describes the items in the directory structure:

  • generation: The identifier that is used by OpenSearch Vector Search Edition to distinguish the versions of full indexes.

  • partition: The basic unit for a Searcher worker to load indexes. If a partition contains an excessive amount of data, the performance of the Searcher worker decreases. You can split online data into multiple partitions to ensure the retrieval efficiency of each Searcher worker.

  • segment: The basic unit of an index. A segment stores data for inverted and forward indexes. The builder generates a segment each time it dumps an index. Multiple segments can be merged based on a merge policy. The segments available in a partition are specified in the version file.

  • index: The basic unit of an inverted index.

  • attribute: The basic unit of a forward index.

  • deletionmap: Information about deleted documents.

  • index_format_version: The index version information, which is used to check whether the index files meet the binary requirements.

  • index_partition_meta: The global sorting information of the index, including the sort fields and sort orders, such as ascending or descending.

  • schema.json: The index configuration file, which contains information about fields, indexes, attributes, and summaries. OpenSearch Retrieval Engine Edition uses this file to load indexes.

  • version.0: The version file, which lists the segments that OpenSearch Retrieval Engine Edition needs to load in the current partition and records the timestamp of the most recent document in the partition. When OpenSearch Retrieval Engine Edition builds indexes for real-time data, the system filters out outdated original documents based on the timestamps in the incremental index.

  • segment_info: A summary of segment information, including the number of documents in the segment, whether the segment has been merged, the locator information, and the timestamp of the most recent document.

  • dictionary: The dictionary of an inverted index.

  • posting: The posting lists of an inverted index.

  • bitmap_dictionary: The dictionary of high-frequency words. This file exists if you create a bitmap index for high-frequency words.

  • bitmap_posting: The posting lists of high-frequency words. This file exists if you create a bitmap index for high-frequency words.

  • aitheta.index: The vector index file.

  • aitheta.index.addr: The metadata of vector indexes.