CreateSpider - OpenSearch - Alibaba Cloud Documentation Center

Creates a website import task.

URL

POST /v4/openapi/app-groups/[appGroupIdentity]/chatos/spiders

[appGroupIdentity] specifies the OpenSearch instance that you want to access. You can specify the instance name to access an instance that is in service.
The sample URL omits the request headers, the encoding method, and the endpoint that is used to connect to the OpenSearch instance.

Request protocol: HTTP

Request method: POST

Request format: JSON

Request parameters

Parameter	Type	Required	Description
url	STRING	Yes	The website URL. The URL must be unique within an OpenSearch instance.
category	STRING	Yes	The category of the data that is to be imported from the website. The value of this parameter is consistent with the value of the category field in the main table. The category must be unique within an OpenSearch instance.
urlRegex	List<STRING>	No	Regular expressions that are used as URL filter conditions to select the web pages to import. Multiple filter conditions are supported. By default, the filter matches every URL that starts with the website URL. For example, if the URL of the website is http://www.abc.com/, the default regular expression is http://www\.abc\.com/.*.
xpathSelectors	List<STRING>	No	An XPath selector that is used to query the specified content on web pages. Multiple XPath selectors are supported. For example, if you want to query content in the div tag on web pages, set this parameter to //div.
cssSelectors	List<STRING>	No	A CSS selector that is used to query the specified content on web pages. Multiple CSS selectors are supported. For example, if you want to query content in the <div class="content">Web Page Content</div> format on web pages, set this parameter to div.content.
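
The default urlRegex described in the table can be reproduced locally to check which page URLs a filter would accept. The following is a minimal sketch, not the service's implementation: the helper names `default_url_regex` and `url_passes_filters` are hypothetical, and it assumes that a pattern must match the whole URL, which the source does not state.

```python
import re


def default_url_regex(site_url):
    """Build the default filter: the escaped website URL followed by '.*'."""
    return re.escape(site_url) + ".*"


def url_passes_filters(page_url, patterns):
    """Return True if any pattern matches the entire page URL.

    Assumption: full-match semantics. The service may match differently.
    """
    return any(re.fullmatch(pattern, page_url) for pattern in patterns)


pattern = default_url_regex("http://www.abc.com/")
print(pattern)
print(url_passes_filters("http://www.abc.com/docs/a.html", [pattern]))
print(url_passes_filters("http://www.xyz.com/docs/a.html", [pattern]))
```

Escaping the website URL keeps the dots literal, so a host such as wwwxabc.com is not accepted by accident.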

Sample request

{
 "category": "OpenSearch documentation",
 "url": "http://xxx"
}
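
As an illustration of how the request path and body fit together, the sketch below assembles both without sending anything. The function name `build_create_spider_request` and the instance name `my_app` are hypothetical. The endpoint, request headers, and authentication are omitted here, as they are in the sample URL.

```python
import json
from urllib.parse import quote


def build_create_spider_request(app_group_identity, url, category,
                                url_regex=None, xpath_selectors=None,
                                css_selectors=None):
    """Return (method, path, JSON body) for a CreateSpider call."""
    if not url or not category:
        raise ValueError("url and category are required")

    # Fill the [appGroupIdentity] placeholder of the documented path.
    path = "/v4/openapi/app-groups/{}/chatos/spiders".format(
        quote(app_group_identity, safe=""))

    body = {"category": category, "url": url}
    # Optional parameters are sent only when they are provided.
    if url_regex:
        body["urlRegex"] = list(url_regex)
    if xpath_selectors:
        body["xpathSelectors"] = list(xpath_selectors)
    if css_selectors:
        body["cssSelectors"] = list(css_selectors)

    return "POST", path, json.dumps(body)


method, path, payload = build_create_spider_request(
    "my_app",  # hypothetical instance name
    "http://www.abc.com/",
    "OpenSearch documentation",
    css_selectors=["div.content"],
)
print(method, path)
print(payload)
```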

Response parameters

Parameter	Type	Description
errors	LIST	The error details.
status	STRING	The execution result of the request. Valid values: OK and FAIL. OK indicates that the request succeeded. FAIL indicates that the request failed. If the request failed, troubleshoot the error based on the error code.
request_id	STRING	The ID of the request.
code	STRING	The error code.
message	STRING	The error message.
latency	STRING	The latency of the request.

Sample response

{
 "status": "OK",
 "requestId": "",
 "httpCode": 200,
 "code": "",
 "message": "",
 "latency": 123
}
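
A caller would typically check status before using the rest of the response. The sketch below is one possible way to do that, with a hypothetical `parse_spider_response` helper and a hypothetical `SpiderRequestError` exception. It assumes that a failed response carries code and message at the top level, as the sample response does, and it reads the request ID from either requestId (sample) or request_id (parameter table) because the two spellings differ.

```python
import json


class SpiderRequestError(Exception):
    """Raised when the response status is not OK (hypothetical helper type)."""


def parse_spider_response(raw_body):
    """Validate a CreateSpider response body and return selected fields."""
    data = json.loads(raw_body)

    if data.get("status") != "OK":
        # Assumption: code and message sit at the top level of the response.
        code = data.get("code", "")
        message = data.get("message", "")
        raise SpiderRequestError(f"{code}: {message}")

    # The sample uses requestId; the parameter table lists request_id.
    request_id = data.get("requestId", data.get("request_id", ""))
    return {"request_id": request_id, "latency": data.get("latency")}


sample = ('{"status": "OK", "requestId": "", "httpCode": 200, '
          '"code": "", "message": "", "latency": 123}')
print(parse_spider_response(sample))
```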

The website import task crawls content from the website at the specified URL. By default, all web pages whose URLs start with the specified URL are included.
If the website URL is valid but the robots.txt file of the website disallows crawling, an error is returned.
Only one website import task can be running in an OpenSearch instance at a time.
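
Because a robots.txt file that disallows crawling causes an error, it can be useful to check the file before creating a task. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text that has already been downloaded. The helper name `crawling_allowed` is hypothetical, and the user agent that the OpenSearch crawler sends is not documented here, so the wildcard agent `*` is an assumption.

```python
from urllib.robotparser import RobotFileParser


def crawling_allowed(robots_txt, page_url, user_agent="*"):
    """Return True if the robots.txt text permits fetching page_url.

    Assumption: the crawler is treated like the wildcard agent '*'.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


blocked = "User-agent: *\nDisallow: /\n"
partial = "User-agent: *\nDisallow: /private/\n"

print(crawling_allowed(blocked, "http://www.abc.com/"))
print(crawling_allowed(partial, "http://www.abc.com/docs/index.html"))
print(crawling_allowed(partial, "http://www.abc.com/private/a.html"))
```

A local check of this kind is only a hint. The service applies its own robots.txt evaluation when the task is created.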