Creates an unstructured knowledge base and imports one or more parsed documents into the knowledge base. You cannot create a structured knowledge base by calling an API operation. Use the console instead.
Operation description
- You must first upload documents to Data Management and obtain the FileId. The documents are the knowledge source of the knowledge base. For more information, see Import Data.
- This operation only initializes a knowledge base creation job. You must also call the SubmitIndexJob operation to complete the job.
- This operation is not idempotent.
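The two-step flow described above can be sketched as follows. This is a minimal illustration, not the official SDK: `create_and_submit`, `fake_create_index`, and `fake_submit_index_job` are hypothetical names, the transport functions are injected so that no real client API is assumed, and reading the knowledge base ID from `Data.Id` is an assumption based on the sample response on this page.

```python
from typing import Any, Callable, Dict, Tuple


def create_and_submit(
    create_index: Callable[[Dict[str, Any]], Dict[str, Any]],
    submit_index_job: Callable[[str], Dict[str, Any]],
    params: Dict[str, Any],
) -> Tuple[str, Dict[str, Any]]:
    """Initialize the creation job, then submit it to complete the job."""
    # CreateIndex is not idempotent, so it is called exactly once here.
    # Retrying blindly after a timeout could create a second knowledge base.
    created = create_index(params)
    index_id = created["Data"]["Id"]  # assumption: Data.Id identifies the new knowledge base
    job = submit_index_job(index_id)
    return index_id, job


# Hypothetical stand-ins for real HTTP or SDK calls, used only to run the sketch.
def fake_create_index(params: Dict[str, Any]) -> Dict[str, Any]:
    return {"Data": {"Id": "jkurxhju6b"}, "Success": True}


def fake_submit_index_job(index_id: str) -> Dict[str, Any]:
    return {"IndexId": index_id, "Submitted": True}


index_id, job = create_and_submit(
    fake_create_index,
    fake_submit_index_job,
    {"Name": "demo", "StructureType": "unstructured"},
)
print(index_id, job)
```

Injecting the two callables keeps the ordering logic testable without network access; in practice they would wrap whatever client you use to call CreateIndex and SubmitIndexJob.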
Authorization information
The following table shows the authorization information for this operation. You can use this information in the Action element of a policy to grant a RAM user or RAM role the permissions to call this operation. The table columns are described as follows:
- Operation: the value that you can use in the Action element to specify the operation on a resource.
- Access level: the access level of each operation. The levels are read, write, and list.
- Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
  - The required resource types are displayed in bold characters.
  - If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
- Condition Key: the condition key that is defined by the cloud service.
- Associated operation: other operations that the RAM user or the RAM role must also have permissions to perform in order to complete this operation.
Operation | Access level | Resource type | Condition key | Associated operation |
---|---|---|---|---|
sfm:CreateIndex | create | *All Resources* | | none |
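As an illustration of how the Action value in the table is used, the following sketch builds a RAM policy document that allows this operation. The helper name `build_create_index_policy` is hypothetical, and `"Resource": "*"` is used on the assumption that it corresponds to All Resources in the table. Adapt the statement to your own permission model before you use it.

```python
import json
from typing import Any, Dict


def build_create_index_policy() -> Dict[str, Any]:
    """Return a RAM policy document that allows the sfm:CreateIndex action."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sfm:CreateIndex"],  # the Operation value from the table
                "Resource": "*",  # assumption: "*" stands for All Resources
            }
        ],
    }


policy = build_create_index_policy()
print(json.dumps(policy, indent=2))
```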
Request syntax
POST /{WorkspaceId}/index/create HTTP/1.1
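The path template above can be filled in as follows. This is a small sketch: `create_index_path` is a hypothetical helper, and the service host is deliberately left out because this page does not state it.

```python
from urllib.parse import quote


def create_index_path(workspace_id: str) -> str:
    """Build the request path for CreateIndex from a workspace ID."""
    if not workspace_id:
        raise ValueError("WorkspaceId is required")
    # Percent-encode the ID so that it is always a single, safe path segment.
    return f"/{quote(workspace_id, safe='')}/index/create"


path = create_index_path("ws_3Nt27MYcoK191ISp")
print("POST", path)
```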
Request parameters
Parameter | Type | Required | Description | Example |
---|---|---|---|---|
WorkspaceId | string | Yes | The ID of the workspace to which the knowledge base belongs. To view the workspace ID, you can click the Workspace Details icon in the upper-left corner on the homepage of the console. | ws_3Nt27MYcoK191ISp |
Name | string | Yes | The name of the knowledge base. The name must be 1 to 20 characters in length and can contain characters that Unicode classifies as letters (such as English letters and Chinese characters), digits, colons (:), underscores (_), periods (.), and hyphens (-). | |
StructureType | string | Yes | The data type of the knowledge base. For more information, see Create a knowledge base. Valid value: unstructured. Note: After a knowledge base is created, its data type cannot be changed. You cannot create a structured knowledge base by calling an API operation. Use the console instead. | unstructured |
EmbeddingModelName | string | No | The name of the embedding model. The embedding model converts the original input prompt and knowledge text into numerical vectors for similarity comparison. The default and only available model is DashScope text-embedding-v2, which supports multiple languages, including Chinese and English, and normalizes the vector results. For more information, see Create a knowledge base. Valid value: text-embedding-v2. If you leave this parameter empty, the text-embedding-v2 model is used. | text-embedding-v2 |
RerankModelName | string | No | The name of the rank model. The rank model is a scoring system outside the knowledge base. It calculates the similarity score between the input question and each text chunk in the knowledge base, ranks the chunks in descending order of score, and returns the top K chunks with the highest scores. For more information, see Create a knowledge base. Valid values: gte-rerank-hybrid and gte-rerank. If you leave this parameter empty, the official gte-rerank-hybrid model is used. Note: If you need only semantic ranking, we recommend that you use gte-rerank. If you need both semantic ranking and text matching to ensure relevance, we recommend that you use gte-rerank-hybrid. | gte-rerank-hybrid |
RerankMinScore | double | No | The similarity threshold, which is the lowest similarity score that a chunk must have to be returned. This parameter is used to filter the text chunks returned by the rank model. For more information, see Create a knowledge base. Valid values: 0.01 to 1.00. Default value: 0.20. | 0.20 |
ChunkSize | integer | No | The estimated chunk length, which is the maximum number of characters in a chunk. Text that exceeds this limit is split. For more information, see Create a knowledge base. Valid values: 1 to 2048. If you leave this parameter empty, the intelligent splitting method is used. Note: If you specify the ChunkSize parameter, you must also specify the OverlapSize and Separator parameters. If you specify none of these three parameters, the system uses the intelligent splitting method. | 128 |
OverlapSize | integer | No | The overlap length. The number of overlapping characters between two consecutive chunks. For more information, see Create a knowledge base. Valid values: 0 to 1024. The default value is empty, which means using the intelligent splitting method. | 16 |
Separator | string | No | The clause identifier. The document is split into chunks based on this identifier. For more information, see Create a knowledge base. You can specify multiple identifiers and do not need to add any other characters to separate them. For example: !,\\n. If you leave this parameter empty, the intelligent splitting method is used. | , |
SourceType | string | Yes | The data type of Data Management. For more information, see Create a knowledge base. Valid values: DATA_CENTER_CATEGORY and DATA_CENTER_FILE. Note: If this parameter is set to DATA_CENTER_CATEGORY, you must specify the CategoryIds parameter. If this parameter is set to DATA_CENTER_FILE, you must specify the DocumentIds parameter. Note: To create an empty knowledge base, set this parameter to DATA_CENTER_CATEGORY and specify the ID of an empty category for the CategoryIds parameter. | DATA_CENTER_FILE |
DocumentIds | array | No | The list of primary key IDs of the documents to be imported into the knowledge base. | |
string | No | The primary key ID of the document. To view the ID, you can click the ID icon next to the file name on the Data Management page. | file_9a65732555b54d5ea10796ca5742ba22_XXXXXXXX | |
CategoryIds | array | No | The list of primary key IDs of the categories to be imported into the knowledge base. | |
string | No | The primary key ID of the category. To view the ID, you can click the ID icon next to the category name on the Data Management page. All documents in the specified categories are imported into the knowledge base. | ca_hiu2383nf934j | |
DataSource | object | No | Note: This parameter is not available. Do not specify this parameter. | |
CredentialId | string | No | Note: This parameter is not available. Do not specify this parameter. | |
CredentialKey | string | No | Note: This parameter is not available. Do not specify this parameter. | |
Database | string | No | Note: This parameter is not available. Do not specify this parameter. | |
Endpoint | string | No | Note: This parameter is not available. Do not specify this parameter. | |
IsPrivateLink | boolean | No | Note: This parameter is not available. Do not specify this parameter. | |
Region | string | No | Note: This parameter is not available. Do not specify this parameter. | |
SubPath | string | No | Note: This parameter is not available. Do not specify this parameter. | |
SubType | string | No | Note: This parameter is not available. Do not specify this parameter. | |
Table | string | No | Note: This parameter is not available. Do not specify this parameter. | |
Type | string | No | Note: This parameter is not available. Do not specify this parameter. | |
SinkType | string | Yes | The vector storage type of the knowledge base. For more information, see Create a knowledge base. Valid values: DEFAULT and ADB. Note: If you have not used AnalyticDB for PostgreSQL in Model Studio before, go to the Create Knowledge Base page, select ADB-PG as Vector Storage Type, and follow the instructions to grant permissions. If you specify ADB, you must also specify the SinkInstanceId and SinkRegion parameters. | DEFAULT |
SinkInstanceId | string | No | The ID of the vector storage instance. This parameter is available only when SinkType is set to ADB. You can view the ID on the Instances page of AnalyticDB for PostgreSQL. | gp-bp321093j84 |
SinkRegion | string | No | The region of the vector storage instance. This parameter is available only when SinkType is set to ADB. You can call the DescribeRegions operation to query the most recent region list. | cn-hangzhou |
Columns | array<object> | No | Note: This parameter is not available. Do not specify this parameter. | |
object | No | |||
Column | string | No | Note: This parameter is not available. Do not specify this parameter. | source_column_name1 |
IsRecall | boolean | No | Note: This parameter is not available. Do not specify this parameter. | true |
IsSearch | boolean | No | Note: This parameter is not available. Do not specify this parameter. | true |
Name | string | No | Note: This parameter is not available. Do not specify this parameter. | index_column_name1 |
Type | string | No | Note: This parameter is not available. Do not specify this parameter. | string |
Description | string | No | The description of the knowledge base. The description must be 0 to 1,000 characters in length. This parameter is empty by default. |
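Several parameters in the table depend on each other. The following sketch checks those rules on the client side before a request is sent. It is an illustration only: `validate_create_index_params` is a hypothetical helper, the checks cover only constraints stated in the table above, and the service may enforce additional rules.

```python
from typing import Any, Dict, List


def validate_create_index_params(p: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors: List[str] = []

    # Required body parameters (WorkspaceId travels in the request path).
    for key in ("Name", "StructureType", "SourceType", "SinkType"):
        if not p.get(key):
            errors.append(f"{key} is required")

    if p.get("Name") and len(p["Name"]) > 20:
        errors.append("Name must be 1 to 20 characters in length")
    if p.get("StructureType") == "structured":
        errors.append("structured knowledge bases can only be created in the console")
    if len(p.get("Description", "")) > 1000:
        errors.append("Description must be 0 to 1,000 characters in length")

    # ChunkSize, OverlapSize, and Separator must be specified together.
    if "ChunkSize" in p:
        missing = [k for k in ("OverlapSize", "Separator") if k not in p]
        if missing:
            errors.append("ChunkSize also requires: " + ", ".join(missing))
        if not 1 <= p["ChunkSize"] <= 2048:
            errors.append("ChunkSize must be between 1 and 2048")
    if "OverlapSize" in p and not 0 <= p["OverlapSize"] <= 1024:
        errors.append("OverlapSize must be between 0 and 1024")
    if "RerankMinScore" in p and not 0.01 <= p["RerankMinScore"] <= 1.00:
        errors.append("RerankMinScore must be between 0.01 and 1.00")

    # SourceType decides which ID list must be present.
    id_list_for = {"DATA_CENTER_CATEGORY": "CategoryIds", "DATA_CENTER_FILE": "DocumentIds"}
    ids_key = id_list_for.get(p.get("SourceType"))
    if ids_key and not p.get(ids_key):
        errors.append(f"SourceType {p['SourceType']} requires {ids_key}")

    # ADB storage needs an instance ID and a region.
    if p.get("SinkType") == "ADB":
        for key in ("SinkInstanceId", "SinkRegion"):
            if not p.get(key):
                errors.append(f"SinkType ADB requires {key}")

    return errors


params = {
    "Name": "demo-kb",
    "StructureType": "unstructured",
    "SourceType": "DATA_CENTER_FILE",
    "DocumentIds": ["file_9a65732555b54d5ea10796ca5742ba22_XXXXXXXX"],
    "SinkType": "DEFAULT",
}
print(validate_create_index_params(params))
print(validate_create_index_params({**params, "ChunkSize": 128}))
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue at once, which matters because the operation is not idempotent and a failed attempt should be fixed before it is retried.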
Response parameters
Examples
Sample success responses
JSON format
{
  "Code": "Forbidden",
  "Data": {
    "Id": "jkurxhju6b"
  },
  "Message": "Invalid input, variable name is missing",
  "RequestId": "17204B98-7734-4F9A-8464-2446A84821CA",
  "Status": "200",
  "Success": "true"
}
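A response of this shape could be handled as in the following sketch. `extract_index_id` is a hypothetical helper. It assumes that `Data.Id` carries the ID of the new knowledge base and that `Success` may arrive either as a boolean or as a string with stray whitespace, so it normalizes the value before comparing. It keys off `Success` only and does not interpret `Code` or `Status`.

```python
import json


def extract_index_id(body: str) -> str:
    """Parse a CreateIndex response body and return Data.Id on success."""
    payload = json.loads(body)
    # Accept True, "true", or "true\n" as success.
    success = str(payload.get("Success", "")).strip().lower() == "true"
    if not success:
        raise RuntimeError(
            f"CreateIndex failed: {payload.get('Code')} - {payload.get('Message')} "
            f"(RequestId: {payload.get('RequestId')})"
        )
    return payload["Data"]["Id"]


sample = (
    '{"Data": {"Id": "jkurxhju6b"}, '
    '"RequestId": "17204B98-7734-4F9A-8464-2446A84821CA", '
    '"Status": "200", "Success": "true"}'
)
print(extract_index_id(sample))
```

Including the RequestId in the error message makes a failed call easier to trace when you contact support.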
Error codes
For a list of error codes, see Service error codes.