Instance management
Term | Description |
instance | An instance is a set of data configurations, such as data source schema, index schema, and data attributes. An instance serves as a search service. |
document | A document is a search unit of structured data. A document can contain one or more fields and must have a primary key field. OpenSearch identifies a unique document based on the value of the primary key field. If a new document has the same primary key value as an existing document, the existing document is overwritten by the new one. |
field | A field is a component of a document. A field consists of a field name and a field value. |
plug-ins | To help you process data during data import, OpenSearch provides various built-in data processing plug-ins. You can choose to use these plug-ins when you define the schema or configure a data source for an application. |
source data | The original data to be pushed to OpenSearch. It contains one or more source fields. |
source field | A source field is the smallest unit of the source data. A source field consists of a field name and a field value. For more information about supported data types, see Application schema and index schema. |
index | An index is a data structure that is used to accelerate retrieval. You can create multiple indexes for one instance. |
composite index | You can create a composite index on multiple fields of the text types such as TEXT or SHORT_TEXT. For example, if you need to create a forum search service that supports both title-based searches and comprehensive searches based on titles and bodies, you can create the title_search index on titles and the default composite index on both titles and bodies. This way, title-based searches are implemented based on the title_search index. Comprehensive searches based on titles and bodies are implemented based on the default composite index. |
index field | Index fields can be used in query clauses. To implement high-performance data retrieval, you must define index fields. |
attribute field | Attribute fields can be used in the FILTER clauses, SORT clauses, AGGREGATE clauses, and DISTINCT clauses of queries to implement features such as filtering and statistics. |
default display field | Default display fields are the fields returned in search results by default. You can use the fetch_fields API parameter to specify the fields to return for each search request. If you set the fetch_fields parameter in your program, the default display field configuration is ignored and only the fields specified by fetch_fields are returned. If you do not set fetch_fields, the default display fields are returned. |
tokenization | This feature splits the sentences in documents into terms. If the data type of the field is TEXT, the system tokenizes the sentences into meaningful terms. If the data type of the field is SHORT_TEXT, the system tokenizes the sentences into single Chinese characters. For example, if the data type is TEXT, "浙江大学" is tokenized into two terms: "浙江" and "大学". If the data type is SHORT_TEXT, "浙江大学" is tokenized into four single Chinese characters: "浙", "江", "大", and "学". |
term | A term is a token or a set of tokens after tokenization. |
index building | After tokenization, indexes are built based on terms. This allows OpenSearch to quickly locate specific documents for a search request. Search engines can build two types of linked lists: inverted indexes and forward indexes. |
inverted index | An inverted index is a linked list that maps terms to their locations in a set of documents. Inverted indexes are used in query clauses. Example: term1->doc1,doc2,doc3 and term2->doc1,doc2. |
forward index | A forward index is a linked list that maps documents to fields. Forward indexes are used in FILTER clauses. Forward indexes are less efficient than inverted indexes. Example: doc1->id,type,create_time. |
retrieval | After documents are pushed to OpenSearch, their field values are tokenized into terms and inverted indexes are built on those terms. When a search request arrives, the query keywords are likewise converted to individual terms, and OpenSearch looks up the inverted indexes to find the matching documents. |
retrieval amount | The number of documents that are retrieved. |
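
The tokenization, inverted index, forward index, and retrieval entries above can be illustrated with a small in-memory sketch. This is not how OpenSearch is implemented. The tiny `DICTIONARY`, the greedy longest-match tokenizer standing in for TEXT tokenization, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical mini-dictionary standing in for a real word segmenter.
DICTIONARY = {"浙江", "大学"}


def tokenize_short_text(value):
    """SHORT_TEXT-style tokenization: one term per character."""
    return [ch for ch in value if not ch.isspace()]


def tokenize_text(value, dictionary=DICTIONARY, max_len=4):
    """TEXT-style tokenization, approximated by greedy longest dictionary match."""
    terms, i = [], 0
    while i < len(value):
        for size in range(min(max_len, len(value) - i), 0, -1):
            piece = value[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                terms.append(piece)
                i += size
                break
    return terms


def build_inverted_index(docs, field, tokenizer):
    """Map each term to the sorted IDs of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenizer(doc[field]):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_forward_index(docs, attribute_fields):
    """Map each document ID to its attribute values, for filtering."""
    return {doc_id: {f: doc[f] for f in attribute_fields}
            for doc_id, doc in docs.items()}


def search(query, inverted, forward, tokenizer, predicate=lambda attrs: True):
    """Intersect the posting lists of the query terms, then filter by attributes."""
    terms = tokenizer(query)
    if not terms:
        return []
    # A term missing from the index contributes an empty set, so nothing matches.
    candidate_sets = [set(inverted.get(term, [])) for term in terms]
    matched = set.intersection(*candidate_sets)
    return sorted(d for d in matched if predicate(forward[d]))


docs = {
    "doc1": {"title": "浙江大学", "type": 1},
    "doc2": {"title": "浙江", "type": 2},
    "doc3": {"title": "大学", "type": 1},
}
inverted = build_inverted_index(docs, "title", tokenize_text)
forward = build_forward_index(docs, ["type"])

print(tokenize_text("浙江大学"))        # ['浙江', '大学']
print(tokenize_short_text("浙江大学"))  # ['浙', '江', '大', '学']
print(inverted["浙江"])                 # ['doc1', 'doc2']
print(search("大学", inverted, forward, tokenize_text,
             lambda attrs: attrs["type"] == 1))  # ['doc1', 'doc3']
```

In this sketch the inverted index answers the query part of a request (which documents contain a term), while the forward index is consulted only for the documents that already matched, which mirrors the split between query clauses and FILTER clauses described above.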
Data synchronization
Term | Description |
data source | The source of data to be pushed. OpenSearch currently supports data synchronization from ApsaraDB for RDS, MaxCompute, and PolarDB. |
reindexing | This feature rebuilds the indexes on existing data. Reindexing is required after you configure or modify the application schema and a data source. |
Quota management
Term | Description |
document capacity | The cumulative size of all documents in the tables of an instance. The cumulative size is calculated based on the field values. Each field value is converted to a string to calculate the cumulative size. |
QPS | The number of queries per second. |
LCU | A logical computing unit (LCU) is the unit that is used to measure the computing power of a search service. One LCU indicates the computing power of 10 millicores in a search cluster. Millicore is the unit of CPU resources. Each millicore is one thousandth of one core. |
scaling | You can quickly upgrade or downgrade the configurations of instances based on your business requirements. Minor specification changes take effect immediately. A change of instance type, such as conversion from a shared instance to an exclusive instance, takes effect only after the change is approved. |
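
The LCU and document capacity definitions above reduce to simple arithmetic, sketched below. The conversion constants come directly from the LCU entry. The `document_size` helper is hypothetical: the table only says that each field value is converted to a string, so measuring that string in UTF-8 bytes is an assumption, not documented OpenSearch behavior.

```python
MILLICORES_PER_LCU = 10      # from the definition: 1 LCU = 10 millicores
MILLICORES_PER_CORE = 1000   # 1 millicore = 1/1000 of a core


def lcu_to_cores(lcu):
    """Convert LCUs to CPU cores."""
    return lcu * MILLICORES_PER_LCU / MILLICORES_PER_CORE


def cores_to_lcu(cores):
    """Convert CPU cores to LCUs."""
    return cores * MILLICORES_PER_CORE / MILLICORES_PER_LCU


def document_size(doc):
    """Assumption: size is the UTF-8 byte length of each field value as a string."""
    return sum(len(str(value).encode("utf-8")) for value in doc.values())


print(lcu_to_cores(100))                           # 1.0
print(cores_to_lcu(4))                             # 400.0
print(document_size({"id": 1, "title": "hello"}))  # 6
```

Under these definitions, 100 LCUs correspond to one full core, which can help when translating a cluster's CPU budget into a quota.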
Search
Term | Description |
sort expression | A sort expression is an expression that you can write to control the sorting of search results. You can use basic mathematical operations, mathematical functions, and built-in functions to write a sort expression. |
rough sort expression | The search results are first sorted by using a rough sort expression. The system calculates the matching scores of the documents based on a rough sort expression and sorts the documents based on the calculated scores. |
fine sort expression | The system selects top N results that are sorted based on a rough sort and calculates the matching scores of the results in a more precise manner by using a fine sort expression. Then, the system sorts the results based on the calculated scores. |
search result summary | Text content is usually long. To help users understand the main content of a document, only part of the document's content is displayed in the search results. |
query analysis | Query analysis helps the system identify the search intent behind a query. Currently, features such as synonyms, spelling correction, stop words, and term weight are supported. |
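
The rough sort and fine sort entries above describe a two-phase ranking, which can be sketched as follows. The function name, the scoring lambdas, and the field names `text_match` and `freshness` are hypothetical. Leaving documents outside the top N in their rough-sort order is also an assumption of this sketch rather than documented behavior.

```python
def two_phase_sort(docs, rough_score, fine_score, top_n):
    """Rank all docs with a cheap score, then re-rank only the top N with a costlier one."""
    # Phase 1: rough sort over every matched document.
    rough_ranked = sorted(docs, key=rough_score, reverse=True)
    head, tail = rough_ranked[:top_n], rough_ranked[top_n:]
    # Phase 2: fine sort over the top N only (assumption: the rest keep rough order).
    head = sorted(head, key=fine_score, reverse=True)
    return head + tail


docs = [
    {"id": "a", "text_match": 0.9, "freshness": 0.1},
    {"id": "b", "text_match": 0.8, "freshness": 0.9},
    {"id": "c", "text_match": 0.2, "freshness": 1.0},
]

ranked = two_phase_sort(
    docs,
    rough_score=lambda d: d["text_match"],
    fine_score=lambda d: 0.5 * d["text_match"] + 0.5 * d["freshness"],
    top_n=2,
)
print([d["id"] for d in ranked])  # ['b', 'a', 'c']
```

Document `c` would outscore `a` under the fine expression, but it never reaches the fine phase because the rough sort placed it outside the top N. This is the trade-off of the design: the expensive expression runs on few documents, so the rough expression must be good enough to keep relevant documents in the top N.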