Creates a website import task.
URL
POST /v4/openapi/app-groups/[appGroupIdentity]/chatos/spiders
[appGroupIdentity] specifies the OpenSearch instance that you want to access. Set it to the name of an instance that is in service.
The sample URL omits information such as the request headers and encoding method.
The sample URL also omits the endpoint that is used to connect to an OpenSearch instance.
Protocol
HTTP
HTTP request method
POST
Supported format
JSON
Request parameters
Parameter | Type | Required | Description |
url | STRING | Yes | The website URL. The URL must be unique within an OpenSearch instance. |
category | STRING | Yes | The category of the data that is to be imported from the website. The value of this parameter is consistent with the value of the category field in the main table. The category must be unique within an OpenSearch instance. |
urlRegex | List<STRING> | No | Regular expressions that are used as filter conditions for web page URLs. Multiple filter conditions are supported. By default, the filter matches every URL that starts with the URL of the website that you want to access. For example, if the URL of the website is http://www.abc.com/, the default regular expression is http://www\.abc\.com/.*. |
xpathSelectors | List<STRING> | No | An XPath selector that is used to query the specified content on web pages. Multiple XPath selectors are supported. For example, if you want to query content in the div tag on web pages, set this parameter to //div. |
cssSelectors | List<STRING> | No | A CSS selector that is used to query the specified content on web pages. Multiple CSS selectors are supported. For example, if you want to query content in the <div class="content">Web Page Content</div> format on web pages, set this parameter to div.content. |
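The default filter described for urlRegex can be reproduced locally to preview which pages a task would cover. The following is a minimal sketch, not the service's implementation. It assumes that the whole page URL is matched against the expression and that a page is included when any one filter condition matches. The helper names default_url_regex and url_is_included are hypothetical and not part of the OpenSearch API.

```python
import re


def default_url_regex(site_url: str) -> str:
    """Build the default filter: the escaped website URL followed by '.*'."""
    # re.escape escapes regex metacharacters such as '.', so 'www.abc.com'
    # matches only a literal dot. Python 3.7+ leaves ':' and '/' unescaped.
    return re.escape(site_url) + ".*"


def url_is_included(page_url: str, url_regexes: list[str]) -> bool:
    """Return True if page_url fully matches at least one filter condition.

    Assumption: filter conditions are combined with OR.
    """
    return any(re.fullmatch(pattern, page_url) for pattern in url_regexes)


filters = [default_url_regex("http://www.abc.com/")]
print(filters[0])
print(url_is_included("http://www.abc.com/docs/intro.html", filters))
print(url_is_included("http://www.xyz.com/docs/intro.html", filters))
```

Escaping the dots matters: an unescaped pattern such as http://www.abc.com/.* would also match hosts like wwwXabc.com.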
Sample request
{
"category": "OpenSearch documentation",
"url": "http://xxx"
}
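A request such as the sample above could be assembled in Python as follows. This is a sketch under stated assumptions: the ENDPOINT value is a placeholder because the endpoint is omitted from this page, the Content-Type header is assumed from the supported format, and the authentication headers that a real call needs are left out. The function build_spider_request is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

# Placeholder only: the real endpoint is not given in this document.
ENDPOINT = "http://opensearch.example.com"


def build_spider_request(app_group, url, category, url_regex=None,
                         xpath_selectors=None, css_selectors=None):
    """Build (but do not send) the POST request that creates an import task."""
    body = {"url": url, "category": category}  # both parameters are required
    # Optional list parameters are included only when they are provided.
    if url_regex:
        body["urlRegex"] = list(url_regex)
    if xpath_selectors:
        body["xpathSelectors"] = list(xpath_selectors)
    if css_selectors:
        body["cssSelectors"] = list(css_selectors)

    path = "/v4/openapi/app-groups/{}/chatos/spiders".format(
        urllib.parse.quote(app_group, safe=""))
    return urllib.request.Request(
        ENDPOINT + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_spider_request(
    "my_app", "http://www.abc.com/", "OpenSearch documentation",
    css_selectors=["div.content"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Serializing the body with json.dumps avoids hand-written JSON mistakes such as a missing comma between fields.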
Response parameters
Parameter | Type | Description |
errors | LIST | The error details. |
status | STRING | The execution result of the request. Valid values: OK and FAIL. A value of OK indicates that the request is successful. A value of FAIL indicates that the request fails. In this case, troubleshoot errors based on the error code. |
request_id | STRING | The ID of the request. |
code | STRING | The error code. |
message | STRING | The error message. |
latency | STRING | The latency of the request. |
Sample response
{
"status" : "OK",
"requestId" : "",
"httpCode": 200,
"code": "",
"message": "",
"latency" : 123
}
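A caller would typically check the status field before relying on the response. The following sketch shows one way to do that, assuming the response body arrives as a JSON string shaped like the sample above. SpiderTaskError and check_spider_response are hypothetical names, and the error code in the usage note is a made-up value for illustration.

```python
import json


class SpiderTaskError(Exception):
    """Raised when the response reports a status other than OK (hypothetical)."""


def check_spider_response(raw: str) -> dict:
    """Parse the response body and raise if the request did not succeed."""
    result = json.loads(raw)
    if result.get("status") != "OK":
        # Surface the error code and message so that they can be used
        # for troubleshooting.
        code = result.get("code", "")
        message = result.get("message", "")
        raise SpiderTaskError(f"{code}: {message}")
    return result


sample = ('{"status": "OK", "requestId": "", "httpCode": 200, '
          '"code": "", "message": "", "latency": 123}')
print(check_spider_response(sample)["latency"])
```

Passing a body such as {"status": "FAIL", "code": "ExampleCode", "message": "example"} raises SpiderTaskError with the code and message in the exception text.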
Usage notes
The website import task crawls content from the website at the specified URL. By default, web pages whose URLs start with the specified URL are included.
If the website URL is valid but the robots.txt file of the website disallows crawling, an error is returned.
An OpenSearch instance can have only one running website import task at a time.
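Because a robots.txt restriction causes the task to fail, it can be useful to check the file before you submit a task. The following sketch uses the urllib.robotparser module from the Python standard library. The user agent that the OpenSearch crawler sends is not stated in this document, so the wildcard agent "*" is an assumption, and crawling_allowed is a hypothetical helper. The result is only an approximation of what the service decides.

```python
from urllib.robotparser import RobotFileParser


def crawling_allowed(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt content permits fetching page_url.

    Assumption: the crawler is governed by the rules for user_agent.
    """
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no network
    # access is needed for this check.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


robots = """User-agent: *
Disallow: /private/
"""
print(crawling_allowed(robots, "http://www.abc.com/docs/"))
print(crawling_allowed(robots, "http://www.abc.com/private/page.html"))
```

To check a live website, RobotFileParser also provides set_url() and read(), which download the robots.txt file instead of parsing a local string.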