Intelligent Media Management: ExtractDocumentText

Last Updated: Dec 11, 2024

Extracts the text from the document body.

Operation description

  • Before you call this operation, make sure that you are familiar with the billing of Intelligent Media Management (IMM).
  • Make sure that the specified project exists in the current region. For more information, see Project management.
  • The following document formats are supported: Word, Excel, PPT, PDF, and TXT.
  • The document cannot exceed 200 MB in size. The extracted text cannot exceed 2 MB (approximately 1.2 million letters).
Note: If the document format is complex or the document body is too large, a timeout error may occur. In this case, we recommend that you call the CreateOfficeConversionTask operation to convert the document to the TXT format before you call the ExtractDocumentText operation.
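
The size limits above can be checked on the client side before and after a call. The following is a minimal sketch, not part of the IMM API: the helper names are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and the extracted text is assumed to be measured in UTF-8 bytes.

```python
# Limits taken from the operation description.
# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_DOCUMENT_BYTES = 200 * 1024 * 1024  # 200 MB document limit
MAX_TEXT_BYTES = 2 * 1024 * 1024        # 2 MB extracted-text limit


def document_size_ok(size_bytes: int) -> bool:
    """Return True if a document of this size is within the 200 MB limit."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return size_bytes <= MAX_DOCUMENT_BYTES


def text_size_ok(text: str) -> bool:
    """Return True if the text fits in 2 MB.

    Assumption: the limit is measured on the UTF-8 encoding of the text.
    """
    return len(text.encode("utf-8")) <= MAX_TEXT_BYTES
```

A document that fails `document_size_ok` cannot be processed by this operation. For documents that pass but still time out, the note above recommends converting to TXT first.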

Debugging

You can call this operation directly in OpenAPI Explorer, which saves you from calculating signatures. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

The following table shows the authorization information for this API operation. You can use this information in the Action element of a policy to grant a RAM user or RAM role the permissions to call this operation. The columns are described as follows:

  • Operation: the value that you can use in the Action element to specify the operation on a resource.
  • Access level: the access level of each operation. The levels are read, write, and list.
  • Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
    • The required resource types are displayed in bold characters.
    • If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
  • Condition Key: the condition key that is defined by the cloud service.
  • Associated operation: other operations that the RAM user or the RAM role must have permissions to perform in order to complete the operation.
Operation | Access level | Resource type | Condition key | Associated operation
imm:ExtractDocumentText | none | *Project acs:imm:{#regionId}:{#accountId}:project/{#ProjectName} | none | none
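
As an illustration of how the operation name and resource format in the table might be used, the sketch below builds a RAM policy document in Python. The function name `build_policy`, the region ID, and the account ID are placeholders chosen for this example. The layout follows the general RAM policy structure (Version, Statement, Effect, Action, Resource). Treat it as a starting point and verify it against your own access requirements.

```python
import json


def build_policy(region_id: str, account_id: str, project_name: str) -> dict:
    """Build a policy that allows imm:ExtractDocumentText on one project."""
    # Resource format from the table:
    # acs:imm:{#regionId}:{#accountId}:project/{#ProjectName}
    resource = f"acs:imm:{region_id}:{account_id}:project/{project_name}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["imm:ExtractDocumentText"],
                "Resource": [resource],
            }
        ],
    }


# Placeholder region and account ID, for illustration only.
policy = build_policy("cn-hangzhou", "1234567890123456", "immtest")
print(json.dumps(policy, indent=2))
```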

Request parameters

Parameter | Type | Required

ProjectName | string | Yes

The name of the project. You can obtain the name of the project from the response of the CreateProject operation.

Example: immtest
SourceURI | string | Yes

The Object Storage Service (OSS) URI of the document.

Specify the value in the oss://${Bucket}/${Object} format. ${Bucket} is the name of an OSS bucket that resides in the same region as the current project. ${Object} is the complete path of the file, including its extension.

Example: oss://test-bucket/test-object
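
The oss://${Bucket}/${Object} format can be assembled and checked with a few lines of Python. This is a minimal sketch: `build_source_uri` and `parse_source_uri` are hypothetical helpers, and they check only the overall shape of the URI, not OSS naming rules or whether the bucket is in the project's region.

```python
from typing import Tuple

OSS_PREFIX = "oss://"


def build_source_uri(bucket: str, object_key: str) -> str:
    """Join a bucket name and an object path into oss://bucket/object."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without '/'")
    key = object_key.lstrip("/")  # the object path must not start with '/'
    if not key:
        raise ValueError("object_key must not be empty")
    return f"{OSS_PREFIX}{bucket}/{key}"


def parse_source_uri(uri: str) -> Tuple[str, str]:
    """Split oss://bucket/object into (bucket, object)."""
    if not uri.startswith(OSS_PREFIX):
        raise ValueError("SourceURI must start with oss://")
    bucket, sep, key = uri[len(OSS_PREFIX):].partition("/")
    if not bucket or not sep or not key:
        raise ValueError("SourceURI must have the form oss://bucket/object")
    return bucket, key
```

For example, `build_source_uri("test-bucket", "test-object")` yields the example value shown above.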
SourceType | string | No

The filename extension that identifies the format of the source document. By default, the extension of the input document is used. Specify this parameter if the input document has no extension. Valid values:

  • Text documents: doc, docx, wps, wpss, docm, dotm, dot, dotx, and html
  • Presentation documents: pptx, ppt, pot, potx, pps, ppsx, dps, dpt, pptm, potm, ppsm, and dpss
  • Table documents: xls, xlt, et, ett, xlsx, xltx, csv, xlsb, xlsm, xltm, and ets
  • PDF documents: pdf

Example: docx
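
One way to decide whether SourceType needs to be sent is sketched below. The helper `resolve_source_type` is hypothetical. It validates an explicit value against the list above, and it treats a missing extension with no explicit value as an error, which is a design choice of this example rather than documented service behavior.

```python
import posixpath
from typing import Optional

# Valid values copied from the SourceType description.
VALID_SOURCE_TYPES = {
    # Text documents
    "doc", "docx", "wps", "wpss", "docm", "dotm", "dot", "dotx", "html",
    # Presentation documents
    "pptx", "ppt", "pot", "potx", "pps", "ppsx", "dps", "dpt",
    "pptm", "potm", "ppsm", "dpss",
    # Table documents
    "xls", "xlt", "et", "ett", "xlsx", "xltx", "csv", "xlsb",
    "xlsm", "xltm", "ets",
    # PDF documents
    "pdf",
}


def resolve_source_type(object_key: str,
                        source_type: Optional[str] = None) -> Optional[str]:
    """Return the SourceType to send, or None if the extension suffices."""
    if source_type is not None:
        value = source_type.lower()
        if value not in VALID_SOURCE_TYPES:
            raise ValueError(f"unsupported SourceType: {source_type}")
        return value
    _, ext = posixpath.splitext(object_key)
    if ext:
        return None  # the service uses the file's own extension by default
    raise ValueError("object has no extension; specify SourceType")
```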
CredentialConfig | CredentialConfig | No

The authorization chain. This parameter is optional. Leave it empty unless you have special requirements. For more information, see Use authorization chains to access resources of other entities.
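
To show how the required and optional parameters fit together, the sketch below assembles them into a plain dictionary and omits optional parameters that are not set. The function `build_request_params` is hypothetical. How the parameters are serialized and signed depends on the SDK or HTTP client you use and is not covered here.

```python
from typing import Any, Dict, Optional


def build_request_params(project_name: str,
                         source_uri: str,
                         source_type: Optional[str] = None,
                         credential_config: Optional[dict] = None) -> Dict[str, Any]:
    """Collect ExtractDocumentText parameters, skipping unset optional ones."""
    if not project_name:
        raise ValueError("ProjectName is required")
    if not source_uri:
        raise ValueError("SourceURI is required")
    params: Dict[str, Any] = {
        "ProjectName": project_name,
        "SourceURI": source_uri,
    }
    if source_type is not None:
        params["SourceType"] = source_type
    if credential_config is not None:
        # Leave unset unless you use an authorization chain.
        params["CredentialConfig"] = credential_config
    return params


params = build_request_params("immtest", "oss://test-bucket/test-object",
                              source_type="docx")
```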

Response parameters

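Parameter | Type | Description | Example

No response parameters are documented for this operation. The sample response in the Examples section contains the RequestId and DocumentText fields.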

Examples

Sample success responses

JSON format

{
  "RequestId": "94D6F994-E298-037E-8E8B-0090F27*****",
  "DocumentText": ""
}
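
A response body shaped like the sample above can be read with the standard `json` module. This is a minimal sketch: the helper `extract_text` is hypothetical, and it assumes the body is a JSON object in which DocumentText may be empty or absent.

```python
import json

# The sample response from this page, as a JSON string.
sample_body = (
    '{"RequestId": "94D6F994-E298-037E-8E8B-0090F27*****", '
    '"DocumentText": ""}'
)


def extract_text(response_body: str) -> str:
    """Return the DocumentText field, or an empty string if it is absent."""
    data = json.loads(response_body)
    return data.get("DocumentText", "")


text = extract_text(sample_body)
print(repr(text))  # prints '' for the sample response
```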

Error codes

For a list of error codes, see Service error codes.

Change history

Change time | Summary of changes | Operation
2023-12-13 | The request parameters of the API have changed | View Change Details