The example below shows the metadata of a document that contains five custom attributes: date (all years in the document), reference (all references in the document), filename (the document name), keywords (keywords in the document), and author (the author information).

Enable Metadata Extraction, then click Meta Information Settings to attach uniform or personalized metadata to all documents in the knowledge base. The following image shows a metadata template as an example:

Description of the metadata template
Value:
Constant: Attaches a fixed attribute to all documents in the knowledge base.
For example, if all documents in the knowledge base share the same author, you can set the author attribute to Constant.
Variable: Attaches a variable attribute to each document in the knowledge base. Valid values:
LLM: Matches the text content of each document in the knowledge base against the Entity Description you specify. The system automatically recognizes and extracts the relevant information from the document, then attaches it to the metadata.
For example, to extract all years that appear in each document as a document attribute, set an LLM field named date and configure the following Entity Description:

Regular: Matches the text content of each document in the knowledge base based on the regular expression you specify. Content that matches the expression is extracted and attached to the metadata.
For example, to extract all references that appear in each document, assuming the references are enclosed in double quotation marks (""), set a regular field named reference and configure the following regular expression:

Search by Keyword: The system searches each document for preset keywords and adds the keywords it finds to the metadata.
For example, set the following keywords:

Used for Retrieval: When enabled, this attribute is attached to all documents and their related chunks, and is used together with the chunks during knowledge base retrieval.
Used for Model Reply: When enabled, this attribute is attached to all documents and their related chunks, and is used together with the chunks during response generation.
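
To make the Constant, Regular, and Search by Keyword options more concrete, the following Python sketch mimics how such attributes could be derived from a document's text. It is only an illustration under stated assumptions, not the product's implementation: the function names, the pattern "([^"]+)", the sample text, the keyword list, and the author value are all hypothetical.

```python
import re

# Hypothetical pattern: any text enclosed in straight double quotation marks.
REFERENCE_PATTERN = re.compile(r'"([^"]+)"')


def extract_references(text):
    """Regular-style field: return every double-quoted span, in order."""
    return REFERENCE_PATTERN.findall(text)


def find_keywords(text, keywords):
    """Keyword-style field: return the preset keywords found in the text.

    Matching is a case-insensitive substring check, which is an assumption
    made for this sketch.
    """
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def build_metadata(filename, text, keywords, author):
    """Combine a constant attribute with variable attributes for one document."""
    return {
        "filename": filename,
        "author": author,  # constant: same value for every document
        "reference": extract_references(text),
        "keywords": find_keywords(text, keywords),
    }


# Hypothetical sample document and settings.
sample = 'See "Deployment Guide" and "API Reference" for vector retrieval details.'
metadata = build_metadata("intro.txt", sample, ["retrieval", "billing"], "Docs Team")
print(metadata)
```

In this sketch only keywords that actually occur in the text end up in the metadata, and a document with no quoted text gets an empty reference list.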