Asynchronous Document Upload
Operation description
The server loads and chunks a document based on the file extension, performs vectorization by using the embedding model that is specified when you call the CreateDocumentCollection operation, and then writes the document to the specified document collection. This operation supports multi-modal embedding for various formats of text and images.
Related operations:
- You can call the GetUploadDocumentJob operation to query the progress and result of a document upload job.
- You can call the CancelUploadDocumentJob operation to cancel a document upload job.
Limits:
- After a document upload request is submitted, the request is queued for processing. Each Resource Access Management (RAM) user or Alibaba Cloud account can have up to 20 documents in the Pending or Running state at a time.
- A text document can be split into up to 100,000 chunks.
- If a document collection uses the OnePeace model, each RAM user or Alibaba Cloud account can upload and query up to 10,000 images.
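The submit-then-poll flow described above can be sketched as follows. This is a minimal sketch, not SDK code: `fetch_status` is a hypothetical callable that stands in for a GetUploadDocumentJob call and returns the job state as a string. Treating every state other than Pending and Running as final is an assumption, because this page names only those two states.

```python
import time
from typing import Callable

# States that this page names for jobs that are queued or in progress.
IN_PROGRESS_STATES = {"Pending", "Running"}


def wait_for_upload_job(
    fetch_status: Callable[[str], str],
    job_id: str,
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Poll a job until it leaves the Pending/Running states.

    fetch_status is a hypothetical stand-in for a GetUploadDocumentJob call.
    """
    for attempt in range(max_attempts):
        status = fetch_status(job_id)
        if status not in IN_PROGRESS_STATES:
            # Assumption: any other state is final.
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} still in progress after {max_attempts} checks")
```

Injecting the status lookup keeps the loop independent of any particular SDK and easy to exercise with a fake callable. The interval and attempt limit are illustrative defaults, not documented values.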
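Debugging
You can run this operation directly in OpenAPI Explorer, which saves you from calculating signatures. After the operation runs successfully, OpenAPI Explorer can automatically generate SDK sample code.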
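Authorization information
The following table describes the authorization information for this API operation. You can use it in the Action element of a RAM policy statement to grant a RAM user or RAM role the permission to call this operation. The columns are described as follows:
- Operation: the specific permission.
- Access level: the access level of each operation. Valid values: Write, Read, and List.
- Resource type: the type of resource on which the operation can be authorized. Details:
  - Required resource types are highlighted.
  - Operations that do not support resource-level authorization are marked with All Resources.
- Condition key: a condition key that is defined by the cloud service itself.
- Associated operation: other permissions that are required for the operation to succeed. The caller must also have the permissions for the associated operations.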
Operation | Access level | Resource type | Condition key | Associated operation |
---|---|---|---|---|
gpdb:UploadDocumentAsync | create | | | None |
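To illustrate how the action in the table is used, the following sketch builds a RAM policy document that allows this operation. It assumes the usual RAM policy layout (Version, Statement, Effect, Action, Resource). Granting the action on all resources (`*`) is also an assumption, because the table above does not list a resource type.

```python
import json


def build_upload_policy() -> dict:
    """Return a RAM policy document that allows gpdb:UploadDocumentAsync."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["gpdb:UploadDocumentAsync"],
                # Assumption: all resources; this page lists no resource type.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so that it can be pasted into a policy editor.
    print(json.dumps(build_upload_policy(), indent=2))
```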
Request parameters
Name | Type | Required | Description | Example |
---|---|---|---|---|
DBInstanceId | string | Yes | The ID of an instance that has vector engine optimization enabled. You can call the DescribeDBInstances operation to query the details of all AnalyticDB for PostgreSQL instances in the target region, including their instance IDs. | gp-bp12ga6v69h86**** |
Collection | string | Yes | The name of the document collection. Note: The document collection is created by calling the CreateDocumentCollection operation. You can call the ListDocumentCollections operation to query the document collections that have been created. | document |
Namespace | string | No | The namespace. Default value: public. You can call the CreateNamespace operation to create a namespace and the ListNamespaces operation to query namespaces. | mynamespace |
NamespacePassword | string | Yes | The password of the namespace. Note: This value is specified when you call the CreateNamespace operation. | testpassword |
RegionId | string | Yes | The region ID of the instance. | cn-hangzhou |
FileName | string | Yes | The file name of the document. | mydoc.txt |
FileUrl | string | Yes | The publicly accessible URL of the document. | https://xx/mydoc.txt |
Metadata | object | No | The metadata, as key-value pairs whose values can be of any type. It must be consistent with the Metadata parameter that is specified when you call the CreateDocumentCollection operation. | {"title":"mytitle","page":1} |
ChunkSize | integer | No | The size of each chunk when large data is split into smaller parts. Maximum value: 2048. | 250 |
ChunkOverlap | integer | No | The amount of data that overlaps between consecutive chunks. The value cannot be greater than the value of the ChunkSize parameter. Note: This parameter helps prevent the loss of context that truncation can cause. For example, when you upload a long text, keeping some overlapping text between consecutive chunks preserves the surrounding context. | 50 |
Separators | array of string | No | The separators that are used to split large amounts of data. Each element is one separator, for example, a period (.). | |
DryRun | boolean | No | Specifies whether to perform only document understanding and chunking, without vectorization and storage. Default value: false. Note: You can set this parameter to true to check the chunking result and then tune the chunking parameters if needed. | false |
ZhTitleEnhance | boolean | No | Specifies whether to enable title enhancement. Note: Title enhancement identifies title text, marks it in the metadata, and combines text with its upper-level title to enrich the text. | false |
TextSplitterName | string | No | The name of the splitter. | ChineseRecursiveTextSplitter |
DocumentLoaderName | string | No | The name of the document loader. You do not need to specify this parameter, because a document loader is automatically selected based on the file extension. | PyMuPDFLoader |
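The chunking limits in the table can be checked on the client before a request is sent, and the effect of ChunkOverlap can be previewed locally. The following is a minimal sketch under stated assumptions: `build_upload_params` and `preview_chunks` are hypothetical helpers, not part of any SDK. The lower bounds (ChunkSize of at least 1, non-negative ChunkOverlap) are assumptions. `preview_chunks` uses fixed-size character windows only to illustrate overlap; it does not reproduce the server-side splitter, and the unit of ChunkSize is not stated on this page.

```python
from typing import Any, Dict, List, Optional

MAX_CHUNK_SIZE = 2048  # Maximum ChunkSize stated in the parameter table.


def build_upload_params(
    db_instance_id: str,
    region_id: str,
    collection: str,
    namespace_password: str,
    file_name: str,
    file_url: str,
    namespace: Optional[str] = None,
    chunk_size: Optional[int] = None,
    chunk_overlap: Optional[int] = None,
    dry_run: bool = False,
) -> Dict[str, Any]:
    """Assemble request parameters and check the documented chunking limits."""
    if chunk_size is not None and not 0 < chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(f"ChunkSize must be between 1 and {MAX_CHUNK_SIZE}")
    if chunk_overlap is not None and chunk_overlap < 0:
        raise ValueError("ChunkOverlap must not be negative")  # Assumed bound.
    if chunk_size is not None and chunk_overlap is not None and chunk_overlap > chunk_size:
        raise ValueError("ChunkOverlap cannot be greater than ChunkSize")
    params: Dict[str, Any] = {
        "DBInstanceId": db_instance_id,
        "RegionId": region_id,
        "Collection": collection,
        "NamespacePassword": namespace_password,
        "FileName": file_name,
        "FileUrl": file_url,
        "DryRun": dry_run,
    }
    # Optional parameters are sent only when the caller sets them.
    if namespace is not None:
        params["Namespace"] = namespace
    if chunk_size is not None:
        params["ChunkSize"] = chunk_size
    if chunk_overlap is not None:
        params["ChunkOverlap"] = chunk_overlap
    return params


def preview_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Local illustration only: fixed-size character windows with overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reaches the end of the text.
        start += step
    return chunks
```

For example, `preview_chunks("abcdefghij", 4, 2)` yields windows that each repeat the last two characters of the previous one, which is the kind of context carry-over that the ChunkOverlap note describes. Setting DryRun to true on a real request remains the way to inspect the actual server-side chunking.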
Response parameters
Examples
Sample success response
JSON format
{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "Message": "success",
  "Status": "success",
  "JobId": "231460f8-75dc-405e-a669-0c5204887e91"
}
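A caller typically needs the JobId from this response in order to query or cancel the job later. The following sketch parses a response body shaped like the sample above. `extract_job_id` is a hypothetical helper, and treating any Status other than "success" as a failure is an assumption, because this page shows only a success sample.

```python
import json


def extract_job_id(response_body: str) -> str:
    """Return the JobId from a response body shaped like the sample above."""
    data = json.loads(response_body)
    # Assumption: any Status other than "success" means the request failed.
    if data.get("Status") != "success":
        raise RuntimeError(f"upload request failed: {data.get('Message')}")
    job_id = data.get("JobId")
    if not job_id:
        raise RuntimeError("response did not contain a JobId")
    return job_id
```

The returned value can then be passed to GetUploadDocumentJob or CancelUploadDocumentJob.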
Error codes
For more error codes, visit the Error Center.