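You can upload documents from your private domain to a Bailian knowledge base so that your LLM applications can answer questions about that domain. Bailian supports uploading documents through either the console or the API. This topic describes how to upload documents to Bailian by using the API.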
Prerequisites: you have activated the service and installed the Bailian SDK. For details, see API Overview.
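Structured data cannot be uploaded through the API. Use the console instead. You can associate a knowledge base with an ApsaraDB RDS database so that a structured knowledge base is updated automatically. For details, see Create a knowledge base.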
Procedure
To upload an unstructured document to Bailian through the API, complete the following four steps:
Call ApplyFileUploadLease to request a file upload lease
Call the ApplyFileUploadLease operation to obtain two things:
the URL used to upload the document (the lease), and
the other parameters that the upload requires.
The Md5 request parameter is the MD5 hash of the document and is used to verify that the document is intact. You can generate it with Java's MessageDigest class or Python's hashlib module.
In the response, the next step uses the following values to upload the document to Bailian's temporary storage:
Data.Param.Method,
Data.Param.Url, and
the X-bailian-extra and Content-Type fields in Data.Param.Headers.
The value of Data.Param.Url (the lease) is valid only for a matter of minutes. Upload the document promptly; otherwise the link expires and the upload fails.
A sample response from a successful ApplyFileUploadLease call:
{
  "RequestId": "778C0B3B-59C2-5FC1-A947-36EDD1xxxxxx",
  "Success": true,
  "Message": "",
  "Code": "success",
  "Status": "200",
  "Data": {
    "FileUploadLeaseId": "1e6a159107384782be5e45ac4759b247.1719325231035",
    "Type": "HTTP",
    "Param": {
      "Method": "PUT",
      "Url": "https://bailian-datahub-data-origin-prod.oss-cn-hangzhou.aliyuncs.com/1005426495169178/10024405/68abd1dea7b6404d8f7d7b9f7fbd332d.1716698936847.pdf?Expires=1716699536&OSSAccessKeyId=TestID&Signature=HfwPUZo4pR6DatSDym0zFKVh9Wg%3D",
      "Headers": " \"X-bailian-extra\": \"MTAwNTQyNjQ5NTE2OTE3OA==\",\n \"Content-Type\": \"application/pdf\""
    }
  }
}
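The following Python sketch covers two parts of this step that the text describes but does not show. The first is computing the Md5 request parameter with hashlib. The second is extracting Method, Url, and Headers from the response.

The sketch rests on these assumptions:
- The response has already been converted to a dict.
- Data.Param.Headers is a string of JSON members without the enclosing braces. This is based only on the sample response above.
- Md5 expects a hex-encoded digest. Check the ApplyFileUploadLease API reference to confirm.
- file_md5 and parse_upload_params are hypothetical helper names.

```python
import hashlib
import json


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, reading it in chunks.

    Assumption: the Md5 request parameter expects a hex-encoded digest.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large documents are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def parse_upload_params(response):
    """Extract (Method, Url, Headers dict) from an ApplyFileUploadLease response dict.

    Hypothetical helper. Assumes the response JSON was already parsed into a dict.
    """
    param = response["Data"]["Param"]
    headers = param["Headers"]
    # In the sample response, Headers is a string of JSON members without braces,
    # so wrap it in braces before parsing. If it is already a dict, use it as-is.
    if isinstance(headers, str):
        headers = json.loads("{" + headers + "}")
    return param["Method"], param["Url"], headers
```

The returned headers dict can then replace the placeholder X-bailian-extra and Content-Type values in the upload code of the next step.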
Upload the document to Bailian's temporary storage
Use the lease and the related parameters returned in the previous step to upload the document to Bailian's temporary storage. Sample code:
Python
# Sample code for reference only. Do not use it directly in a production environment.
import requests


def upload_file(pre_signed_url, file_path):
    try:
        # Set the request headers
        headers = {
            "X-bailian-extra": "Replace with the X-bailian-extra value in Data.Param.Headers returned by ApplyFileUploadLease in the previous step",
            "Content-Type": "Replace with the Content-Type value in Data.Param.Headers returned by ApplyFileUploadLease in the previous step"
        }

        # Read the document and upload it
        with open(file_path, 'rb') as file:
            # The request method must match Data.Param.Method returned by ApplyFileUploadLease in the previous step
            response = requests.put(pre_signed_url, data=file, headers=headers)

        # Check the response status code
        if response.status_code == 200:
            print("File uploaded successfully.")
        else:
            print(f"Failed to upload the file. ResponseCode: {response.status_code}")

    except Exception as e:
        print(f"An error occurred: {str(e)}")


def upload_file_link(pre_signed_url, source_url_string):
    try:
        # Set the request headers
        headers = {
            "X-bailian-extra": "Replace with the X-bailian-extra value in Data.Param.Headers returned by ApplyFileUploadLease in the previous step",
            "Content-Type": "Replace with the Content-Type value in Data.Param.Headers returned by ApplyFileUploadLease in the previous step"
        }

        # Download the source document from OSS with a GET request
        source_response = requests.get(source_url_string)
        if source_response.status_code != 200:
            raise RuntimeError("Failed to get source file.")

        # The request method must match Data.Param.Method returned by ApplyFileUploadLease in the previous step
        response = requests.put(pre_signed_url, data=source_response.content, headers=headers)

        # Check the response status code
        if response.status_code == 200:
            print("File uploaded successfully.")
        else:
            print(f"Failed to upload the file. ResponseCode: {response.status_code}")

    except Exception as e:
        print(f"An error occurred: {str(e)}")


if __name__ == "__main__":
    pre_signed_url_or_http_url = "Replace with the Data.Param.Url value returned by ApplyFileUploadLease in the previous step"

    # The document can be a local file: upload the local document to Bailian's temporary storage
    file_path = "Replace with the actual local path of the document to upload"
    upload_file(pre_signed_url_or_http_url, file_path)

    # The document can also be stored in OSS
    # file_path = "Replace with the publicly accessible OSS URL of the document to upload"
    # upload_file_link(pre_signed_url_or_http_url, file_path)
Java
// Sample code for reference only. Do not use it directly in a production environment.
import java.io.BufferedInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class UploadFile {

    public static void uploadFile(String preSignedUrl, String filePath) {
        HttpURLConnection connection = null;
        try {
            // Create the URL object
            URL url = new URL(preSignedUrl);
            connection = (HttpURLConnection) url.openConnection();

            // The request method must match Data.Param.Method returned by ApplyFileUploadLease in the previous step
            connection.setRequestMethod("PUT");

            // Enable output on the connection, because it is used to upload the document
            connection.setDoOutput(true);

            connection.setRequestProperty("X-bailian-extra", "Replace with the X-bailian-extra value in Data.Param.Headers returned by ApplyFileUploadLease in the previous step");
            connection.setRequestProperty("Content-Type", "Replace with the Content-Type value in Data.Param.Headers returned by ApplyFileUploadLease in the previous step");

            // Read the document and upload it through the connection
            try (DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
                 FileInputStream fileInputStream = new FileInputStream(filePath)) {
                byte[] buffer = new byte[4096];
                int bytesRead;
                while ((bytesRead = fileInputStream.read(buffer)) != -1) {
                    outStream.write(buffer, 0, bytesRead);
                }
                outStream.flush();
            }

            // Check the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // The document was uploaded successfully
                System.out.println("File uploaded successfully.");
            } else {
                // The upload failed
                System.out.println("Failed to upload the file. ResponseCode: " + responseCode);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void uploadFileLink(String preSignedUrl, String sourceUrlString) {
        HttpURLConnection connection = null;
        HttpURLConnection sourceConnection = null;
        try {
            // Create the URL object
            URL url = new URL(preSignedUrl);
            connection = (HttpURLConnection) url.openConnection();

            // The request method must match Data.Param.Method returned by ApplyFileUploadLease in the previous step
            connection.setRequestMethod("PUT");

            // Enable output on the connection, because it is used to upload the document
            connection.setDoOutput(true);

            connection.setRequestProperty("X-bailian-extra", "Replace with the X-bailian-extra value in Data.Param.Headers returned by ApplyFileUploadLease in the previous step");
            connection.setRequestProperty("Content-Type", "Replace with the Content-Type value in Data.Param.Headers returned by ApplyFileUploadLease in the previous step");

            URL sourceUrl = new URL(sourceUrlString);
            sourceConnection = (HttpURLConnection) sourceUrl.openConnection();
            // Download the source document from OSS with a GET request
            sourceConnection.setRequestMethod("GET");
            // A response code of 200 means the request succeeded
            int sourceFileResponseCode = sourceConnection.getResponseCode();
            if (sourceFileResponseCode != HttpURLConnection.HTTP_OK) {
                throw new RuntimeException("Failed to get source file.");
            }

            // Read the document from OSS and upload it through the connection
            try (DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
                 InputStream in = new BufferedInputStream(sourceConnection.getInputStream())) {
                byte[] buffer = new byte[4096];
                int bytesRead;
                while ((bytesRead = in.read(buffer)) != -1) {
                    outStream.write(buffer, 0, bytesRead);
                }
                outStream.flush();
            }

            // Check the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // The document was uploaded successfully
                System.out.println("File uploaded successfully.");
            } else {
                // The upload failed
                System.out.println("Failed to upload the file. ResponseCode: " + responseCode);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Release both connections
            if (connection != null) {
                connection.disconnect();
            }
            if (sourceConnection != null) {
                sourceConnection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String preSignedUrlOrHttpUrl = "Replace with the Data.Param.Url value returned by ApplyFileUploadLease in the previous step";

        // The document can be a local file: upload the local document to Bailian's temporary storage
        String filePath = "Replace with the actual local path of the document to upload";
        uploadFile(preSignedUrlOrHttpUrl, filePath);

        // The document can also be stored in OSS
        // filePath = "Replace with the publicly accessible OSS URL of the document to upload";
        // uploadFileLink(preSignedUrlOrHttpUrl, filePath);
    }
}
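The lease in Data.Param.Url stays valid only for minutes, so it can help to check the remaining time before starting a large upload. The sketch below rests on these assumptions:
- The URL carries an OSS-style Expires query parameter holding a Unix timestamp in seconds, as the sample URL in the ApplyFileUploadLease response does.
- A URL without that parameter is treated as "unknown" rather than expired.
- lease_seconds_left and ensure_lease_valid are hypothetical helper names.
- The 30-second margin is an arbitrary choice.

```python
import time
from urllib.parse import urlparse, parse_qs


def lease_seconds_left(pre_signed_url, now=None):
    """Return seconds until the URL's Expires timestamp, or None if it has none.

    Assumption: Expires is a Unix timestamp in seconds, as in the sample lease URL.
    """
    query = parse_qs(urlparse(pre_signed_url).query)
    expires = query.get("Expires")
    if not expires:
        return None  # No Expires parameter, so the remaining time is unknown
    current = time.time() if now is None else now
    return int(expires[0]) - current


def ensure_lease_valid(pre_signed_url, margin_seconds=30, now=None):
    """Raise RuntimeError if the lease expires within margin_seconds (hypothetical helper)."""
    left = lease_seconds_left(pre_signed_url, now)
    if left is not None and left < margin_seconds:
        raise RuntimeError("Upload lease is about to expire; call ApplyFileUploadLease again.")
```

Calling ensure_lease_valid(pre_signed_url_or_http_url) just before upload_file lets the client request a fresh lease instead of sending a PUT that is likely to be rejected.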
Call AddFile to add the document to Bailian's Data Management
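After the previous step succeeds, the document is kept in Bailian's temporary storage for 12 hours. Call AddFile within that window to complete the final upload, which adds the document to Bailian's Data Management.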
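A client that uploads documents and adds them to Data Management at different times could record when each upload succeeded, then check the 12-hour window before calling AddFile. The following sketch only does that time arithmetic and does not call AddFile itself. It rests on these assumptions:
- The window starts when the upload to temporary storage succeeds.
- add_file_deadline and can_still_add_file are hypothetical names.
- The 5-minute safety margin is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Retention period of Bailian's temporary storage, as stated in the text
TEMP_STORAGE_TTL = timedelta(hours=12)


def add_file_deadline(uploaded_at):
    """Latest time to call AddFile, assuming the window starts at upload success."""
    return uploaded_at + TEMP_STORAGE_TTL


def can_still_add_file(uploaded_at, now=None, margin=timedelta(minutes=5)):
    """True if AddFile can still be called with at least `margin` to spare (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return now + margin < add_file_deadline(uploaded_at)
```

Storing uploaded_at as a timezone-aware UTC datetime keeps the comparison with datetime.now(timezone.utc) valid.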
Call DescribeFile to poll the parsing status of the added document
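After AddFile succeeds, Bailian starts uploading and parsing the document, which takes some time. You can check the latest status of the document in Bailian's Data Management or by calling DescribeFile. When processing is complete, the Data.Status field in the DescribeFile response is PARSE_SUCCESS.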
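A polling loop for this step might look like the following sketch. It rests on these assumptions:
- describe_file is a hypothetical callable that you would implement around your actual DescribeFile call so that it returns the response as a dict.
- The text documents only PARSE_SUCCESS, so treating PARSE_FAILED as a terminal failure status is an assumption. Verify it against the DescribeFile API reference.
- The interval and timeout defaults are arbitrary.

```python
import time


def wait_for_parse(describe_file, file_id, interval=5.0, timeout=600.0,
                   failure_statuses=("PARSE_FAILED",),
                   sleep=time.sleep, clock=time.monotonic):
    """Poll describe_file(file_id) until Data.Status is PARSE_SUCCESS.

    describe_file is a hypothetical wrapper around DescribeFile returning a dict.
    failure_statuses is an assumption; only PARSE_SUCCESS is documented here.
    """
    deadline = clock() + timeout
    while True:
        status = describe_file(file_id)["Data"]["Status"]
        if status == "PARSE_SUCCESS":
            return status
        if status in failure_statuses:
            raise RuntimeError(f"Parsing of {file_id} failed with status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"{file_id} still in status {status} after {timeout}s")
        # Wait before the next DescribeFile call to avoid hammering the API
        sleep(interval)
```

The sleep and clock parameters are injectable so that the loop can be tested without real waiting.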