Data-related terms
Term | Description |
MaxCompute data source | The data source from which full data is obtained. The raw data is stored in MaxCompute by partition. |
API data source | The data source from which incremental data is obtained. Data is updated by calling API operations. |
document | The search unit of structured data. A document can contain one or more fields and must have a primary key field. Retrieval Engine Edition identifies a unique document based on the value of the primary key field. If a new document has the same primary key value as an existing document, the existing document is overwritten by the new document. |
field | The component of a document. A field consists of a field name and a field value. |
multi-value field | The field that contains multiple independent values. |
primary key | The field that uniquely identifies a document. |
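The overwrite rule for documents that share a primary key value can be illustrated with a minimal sketch. The `DocumentStore` class and the field names `id`, `title`, and `tags` are hypothetical and are not part of Retrieval Engine Edition. The sketch only models the behavior described in the table above.

```python
from typing import Dict, List, Optional, Union

# A list value models a multi-value field.
FieldValue = Union[str, int, float, List[str]]
Document = Dict[str, FieldValue]


class DocumentStore:
    """Hypothetical in-memory store that identifies documents by primary key."""

    def __init__(self, primary_key: str) -> None:
        self.primary_key = primary_key
        self._docs: Dict[Union[str, int], Document] = {}

    def add(self, doc: Document) -> None:
        # Every document must contain the primary key field.
        if self.primary_key not in doc:
            raise ValueError(f"missing primary key field: {self.primary_key}")
        # A document with an existing primary key value overwrites the old one.
        self._docs[doc[self.primary_key]] = dict(doc)

    def get(self, key: Union[str, int]) -> Optional[Document]:
        return self._docs.get(key)

    def __len__(self) -> int:
        return len(self._docs)


store = DocumentStore(primary_key="id")
store.add({"id": 1, "title": "first version", "tags": ["a", "b"]})
store.add({"id": 1, "title": "second version", "tags": ["c"]})
```

After both calls, the store holds a single document whose `title` is `second version`, because the second document replaced the first.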
Retrieval Engine Edition
Term | Description |
Query Result Searcher (QRS) worker | The role used in online search. QRS workers parse query requests and merge the results returned by Searcher workers. |
Searcher worker | The role used in online search. Searcher workers load index data and provide search services. |
cluster | A search service that consists of a set of QRS workers and Searcher workers. |
Processor | A role used in offline indexing for parsing users' raw data. |
Builder | A role used in offline indexing for building indexes from raw data. |
Merger | A role used in offline indexing for merging and sorting indexes. |
full indexing | The process of building indexes from the full data in a MaxCompute data source. The indexes that are generated during this process are full indexes, and the index versions are full index versions. |
incremental indexing | The process in which, when data is updated in real time, the offline indexing process generates indexes for the updated data and applies them to online clusters. |
real-time indexing | The process in which data that is pushed by calling API operations takes effect in real time. Real-time indexes are generated in the memory of Searcher workers. |
inverted index | An index that maps each term to the documents in which the term appears. Inverted indexes are used in query clauses to make queries efficient. Example: term1->doc1,doc2,doc3;term2->doc1,doc2. |
forward index | An index that maps each document to the values of its fields. Forward indexes are used in FILTER clauses. Forward indexes are less efficient than inverted indexes for finding documents. Example: doc1->id,type,create_time… |
summary index | A summary index collects and stores the information that you want the system to display in summaries of search results. You can query information that is contained in a search result summary by specifying the primary key or document ID. Retrieval Engine Edition displays the search results by page. |
tokenization | The process of splitting the sentences in documents into terms. If the data type of a field is TEXT, the system tokenizes the sentences in the field into meaningful terms. For example, if the data type is TEXT, "浙江大学" (Zhejiang University) is tokenized into two terms: "浙江" (Zhejiang) and "大学" (university). |
term | A term is a token or a set of tokens after tokenization. |
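The relationship between inverted indexes and forward indexes can be sketched as follows. This is an illustration only and does not reflect how Retrieval Engine Edition stores indexes internally. The whitespace tokenizer, the sample documents, and the function names are assumptions. The sample data is chosen to reproduce the example in the table above (term1->doc1,doc2,doc3;term2->doc1,doc2).

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical documents: "title" is a TEXT field, "type" is used for filtering.
docs: Dict[str, Dict[str, str]] = {
    "doc1": {"title": "term1 term2", "type": "news"},
    "doc2": {"title": "term1 term2", "type": "blog"},
    "doc3": {"title": "term1", "type": "news"},
}


def tokenize(text: str) -> List[str]:
    # Assumption: a whitespace tokenizer stands in for a real analyzer.
    return text.split()


def build_inverted_index(field: str) -> Dict[str, List[str]]:
    """Map each term to the sorted list of documents that contain it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for doc_id in sorted(docs):
        # dict.fromkeys removes repeated terms while keeping their order.
        for term in dict.fromkeys(tokenize(docs[doc_id][field])):
            index[term].append(doc_id)
    return dict(index)


def build_forward_index(field: str) -> Dict[str, str]:
    """Map each document to the value of one of its fields."""
    return {doc_id: fields[field] for doc_id, fields in docs.items()}


inverted = build_inverted_index("title")
forward = build_forward_index("type")


def search(term: str, type_filter: Optional[str] = None) -> List[str]:
    # The inverted index finds candidates; the forward index filters them.
    candidates = inverted.get(term, [])
    if type_filter is None:
        return list(candidates)
    return [doc_id for doc_id in candidates if forward[doc_id] == type_filter]
```

In this sketch, `search("term1")` looks up candidates directly in the inverted index, and `search("term1", type_filter="news")` then checks each candidate against the forward index, which is why filtering is the slower of the two steps.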
Data changes triggered and implemented by FSM
Change type | Whether to allow recurring events | Description |
Service discovery | Yes | Associates the IP address of a Retrieval Engine Edition instance with the domain name so that you can call the service. For the same cluster, all historical changes are terminated before the latest change runs. |
ha3_biz_apend | No | Adds biz. This operation can be performed once on each instance. The system automatically triggers this change. The change may continue to run until the index table is added to the instance and the index is built. |
update_biz_depend_index_fsm | No | Updates the index on which biz depends. This operation can be performed once on each instance. The system automatically triggers this change. The change may continue to run until the index table is added to the instance and the index is built. |
Online deployment | Yes | For the same cluster, all historical changes are terminated before the latest change runs. |
multi_biz_activate | No | Initializes a Retrieval Engine Edition instance. This operation can be performed once on each instance. The change may continue to run until the index table is added to the instance and the index is built. |
Index creation | Yes | For the same index, all historical changes are terminated before the latest change runs. |
Automatically triggered full indexing | Yes | The system automatically triggers this change after new data partitions are identified. The latest change and historical changes can concurrently run. |
Manually triggered full indexing | Yes | The latest change and historical changes can concurrently run. |
Configuration push | Yes | All historical changes are terminated before the latest change runs. |
Online resources | Yes | For the same zone, all historical changes are terminated before the latest change runs. |
Index rollback | Yes | The latest change and historical changes can concurrently run. |
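FSM: the finite-state machine. An FSM is a mathematical model that represents a finite number of states and the transitions between these states.
Whether to allow recurring events: specifies whether a change of this type can be triggered more than once. A change type that does not allow recurring events can be performed only once on each instance.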
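The rules in the table can be summarized in a minimal sketch. This is not the FSM implementation of Retrieval Engine Edition. The `ChangeManager` class, the `State` values, and the scope strings such as `cluster-a` are hypothetical. The sketch assumes only the three behaviors described above: a non-recurring change runs once per instance, some recurring changes terminate earlier changes in the same scope, and others run concurrently with them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    RUNNING = "running"
    TERMINATED = "terminated"


@dataclass(frozen=True)
class ChangeType:
    name: str
    recurring: bool          # whether recurring events are allowed
    terminate_history: bool  # whether earlier changes stop before the new one runs


@dataclass
class Change:
    change_type: ChangeType
    scope: str               # hypothetical: a cluster, index, zone, or instance ID
    state: State = State.RUNNING


# Three rows from the table, encoded with the flags above.
CONFIG_PUSH = ChangeType("Configuration push", True, True)
MANUAL_FULL_INDEXING = ChangeType("Manually triggered full indexing", True, False)
MULTI_BIZ_ACTIVATE = ChangeType("multi_biz_activate", False, False)


class ChangeManager:
    def __init__(self) -> None:
        self.history: List[Change] = []

    def submit(self, change_type: ChangeType, scope: str) -> Change:
        earlier = [c for c in self.history
                   if c.change_type == change_type and c.scope == scope]
        # A non-recurring change can be performed only once per scope.
        if earlier and not change_type.recurring:
            raise RuntimeError(f"{change_type.name} already ran on {scope}")
        # Terminate historical changes before the latest change runs.
        if change_type.terminate_history:
            for change in earlier:
                if change.state is State.RUNNING:
                    change.state = State.TERMINATED
        latest = Change(change_type, scope)
        self.history.append(latest)
        return latest


manager = ChangeManager()
push_1 = manager.submit(CONFIG_PUSH, "cluster-a")
push_2 = manager.submit(CONFIG_PUSH, "cluster-a")            # terminates push_1
full_1 = manager.submit(MANUAL_FULL_INDEXING, "index-a")
full_2 = manager.submit(MANUAL_FULL_INDEXING, "index-a")     # runs alongside full_1
init_1 = manager.submit(MULTI_BIZ_ACTIVATE, "instance-1")    # allowed only once
```

Submitting `MULTI_BIZ_ACTIVATE` a second time for `instance-1` raises `RuntimeError` in this sketch, which mirrors the "No" value in the recurring events column.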