Executes SQL statements to perform operations on an object and obtains the execution results.
Usage notes
You must be granted read permissions on the object.
If an SQL statement is executed correctly, HTTP status code 206 is returned. If an SQL statement is invalid or cannot match existing objects, HTTP status code 400 is returned.
You are charged based on the size of the objects that are scanned when you call the SelectObject operation. For more information, see Data processing fees.
Request syntax
This operation supports the following request syntax for CSV and JSON objects.
Request syntax for CSV objects
POST /object?x-oss-process=csv/select HTTP/1.1 HOST: BucketName.oss-cn-hangzhou.aliyuncs.com Date: time GMT Content-Length: ContentLength Content-MD5: MD5Value Authorization: Signature <?xml version="1.0" encoding="UTF-8"?> <SelectRequest> <Expression>Base64-encoded SQL statement. Example: c2VsZWN0IGNvdW50KCopIGZyb20gb3Nzb2JqZWN0IHdoZXJlIF80ID4gNDU=</Expression> <InputSerialization> <CompressionType>None|GZIP</CompressionType> <CSV> <FileHeaderInfo> NONE|IGNORE|USE </FileHeaderInfo> <RecordDelimiter>Base64-encoded character</RecordDelimiter> <FieldDelimiter>Base64-encoded character</FieldDelimiter> <QuoteCharacter>Base64-encoded character</QuoteCharacter> <CommentCharacter>Base64-encoded character</CommentCharacter> <Range>line-range=start-end|split-range=start-end</Range> <AllowQuotedRecordDelimiter>true|false</AllowQuotedRecordDelimiter> </CSV> </InputSerialization> <OutputSerialization> <CSV> <RecordDelimiter>Base64-encoded character</RecordDelimiter> <FieldDelimiter>Base64-encoded character</FieldDelimiter> </CSV> <KeepAllColumns>false|true</KeepAllColumns> <OutputRawData>false|true</OutputRawData> <EnablePayloadCrc>true</EnablePayloadCrc> <OutputHeader>false</OutputHeader> </OutputSerialization> <Options> <SkipPartialDataRecord>false</SkipPartialDataRecord> <MaxSkippedRecordsAllowed> max allowed number of records skipped </MaxSkippedRecordsAllowed> </Options> </SelectRequest>
Request syntax for JSON objects
POST /object?x-oss-process=json/select HTTP/1.1 HOST: BucketName.oss-cn-hangzhou.aliyuncs.com Date: time GMT Content-Length: ContentLength Content-MD5: MD5Value Authorization: Signature <?xml version="1.0" encoding="UTF-8"?> <SelectRequest> <Expression> Base64-encoded SQL statement. Example: c2VsZWN0IGNvdW50KCopIGZyb20gb3Nzb2JqZWN0IHdoZXJlIF80ID4gNDU= </Expression> <InputSerialization> <CompressionType>None|GZIP</CompressionType> <JSON> <Type>DOCUMENT|LINES</Type> <Range> line-range=start-end|split-range=start-end </Range> <ParseJsonNumberAsString> true|false </ParseJsonNumberAsString> </JSON> </InputSerialization> <OutputSerialization> <JSON> <RecordDelimiter> Base64 of record delimiter </RecordDelimiter> </JSON> <OutputRawData>false|true</OutputRawData> <EnablePayloadCrc>true</EnablePayloadCrc> </OutputSerialization> <Options> <SkipPartialDataRecord> false|true </SkipPartialDataRecord> <MaxSkippedRecordsAllowed> max allowed number of records skipped </MaxSkippedRecordsAllowed> </Options> </SelectRequest>
Request elements
Element | Type | Description |
SelectRequest | Container | The container that stores the SelectObject request. Child nodes: Expression, InputSerialization, and OutputSerialization Parent nodes: none |
Expression | String | The Base64-encoded SQL statement. Child nodes: none Parent nodes: SelectRequest |
InputSerialization | Container | Optional. This element specifies the input serialization parameters. Child nodes: CompressionType, CSV, and JSON Parent nodes: SelectRequest |
OutputSerialization | Container | Optional. This element specifies the output serialization parameters. Child nodes: CSV, JSON, and OutputRawData Parent nodes: SelectRequest |
CSV(InputSerialization) | Container | Optional. This element specifies the input serialization parameters when the CSV object is queried. Child nodes: FileHeaderInfo, RecordDelimiter, FieldDelimiter, QuoteCharacter, CommentCharacter, and Range Parent nodes: InputSerialization |
CSV(OutputSerialization) | Container | Optional. This element specifies the output serialization parameters when the CSV object is queried. Child nodes: RecordDelimiter and FieldDelimiter Parent nodes: OutputSerialization |
JSON(InputSerialization) | Container | Optional. This element specifies the input serialization parameters when the JSON object is queried. Child nodes: Type, Range, and ParseJsonNumberAsString |
JSON(OutputSerialization) | Container | Optional. This element specifies the output serialization parameters when the JSON object is queried. Child nodes: RecordDelimiter |
Type | Enumeration | The type of the input JSON object. Valid values: DOCUMENT and LINES. |
OutputRawData | Boolean | Optional. This element specifies whether to export raw data. Default value: false. Child nodes: none Parent nodes: OutputSerialization Note
|
CompressionType | Enumeration | The compression type of the object. Valid value: None and GZIP. Child nodes: none Parent nodes: InputSerialization |
FileHeaderInfo | Enumeration | Optional. This element specifies the header information about the CSV object. Valid values:
Child nodes: none Parent nodes: CSV (input) |
RecordDelimiter | String | Optional. This element specifies a Base64-encoded line feed. Default value: Child nodes: none Parent nodes: CSV (input and output) and JSON (output) |
FieldDelimiter | String | Optional. This element specifies the delimiter that you want to use to separate columns in the CSV object. The value of this element must be Base64-encoded. Default value: Child nodes: none Parent nodes: CSV (input and output) |
QuoteCharacter | String | Optional. This element specifies a Base64-encoded quote character that you want to use in the CSV object. Default value: Child nodes: none Parent nodes: CSV (input) |
CommentCharacter | String | The comment character that you want to use in the CSV object. The value of this element must be Base64-encoded. This element is empty by default. |
Range | String | Optional. This element specifies the query range. The following query methods are supported: Note SelectMeta must be created for objects that are queried based on Range. For more information about SelectMeta, see CreateSelectObjectMeta.
The start and end parameters are inclusive. The two preceding parameters use the same format as that of the range parameter in range get. This parameter can be used only if the object is in the CSV format or if the JSON type is LINES. Child nodes: none Parent nodes: CSV (input) and JSON (output) |
KeepAllColumns | bool | Optional. This element specifies whether all columns in the CSV object are included in the response. Default value: false. Only columns included in the SELECT clause contain values. The columns in the response are sorted in ascending order of the column numbers. Example:
If you set KeepAllColumns to true and six columns are included in the CSV object, the following result is returned for the preceding SELECT clause: Value of 1st column,,,,Value of 5th column,\n Child nodes: none Parent nodes: OutputSerialization (CSV) |
EnablePayloadCrc | bool | The CRC-32 value for verification of each frame. The client can calculate the CRC-32 value of each payload and compare it with the included CRC-32 value to verify data integrity. Child nodes: none Parent nodes: OutputSerialization |
Options | Container | Other optional parameters. Child nodes: SkipPartialDataRecord and MaxSkippedRecordsAllowed Parent nodes: SelectRequest |
OutputHeader | bool | Specifies whether the header information about the CSV object is included in the beginning of the response. Default value: false. Child nodes: none Parent nodes: OutputSerialization |
SkipPartialDataRecord | bool | Specifies whether to ignore rows in which data is missing. If this parameter is set to false, OSS processes the row data as null without reporting errors. If this parameter is set to true, rows that do not contain data are skipped. If the number of skipped rows has exceeded the maximum number of rows that can be skipped, OSS reports an error and stops processing the data. Default value: false. Child nodes: none Parent nodes: Options |
MaxSkippedRecordsAllowed | Int | The maximum number of rows that can be skipped. If a row does not match the type specified in the SQL statement, or if one or more columns in a row are missing and the value of SkipPartialDataRecord is true, the rows are skipped. If the number of skipped rows has exceeded the value of this parameter, OSS reports an error and stops processing the data. Note If a row in a CSV object is not correctly formatted, OSS stops processing the data and reports an error because this format error may result in incorrect parsing of the CSV object. For example, a column in the row includes continual odd numbered quote characters. This parameter can be used to modify the tolerance for irregular data but cannot be configured for invalid CSV objects. Default value: 0. Child nodes: none Parent nodes: Options |
ParseJsonNumberAsString | bool | Specifies whether to parse integer and floating-point numbers in the JSON object into strings. The precision of floating-point numbers in a JSON object decreases when the numbers are parsed. If you want to retain the raw data, we recommend that you set this parameter to true. To use the parsed numbers in calculations, you can use the CAST function in SQL to convert the parsed data into the required type such as INT, DOUBLE, or DECIMAL. Default value: false. Child nodes: none Parent nodes: JSON |
AllowQuotedRecordDelimiter | bool | Specifies whether the CSV object can contain line feeds in quotation marks ("). For example, if the value of a column is "abc\ndef" and \n is a line feed, set this parameter to true. If this parameter is set to false, you can call the SelectObject operation to specify a range in the request header to perform more efficient multipart queries. Default value: true. Child nodes: none Parent nodes: InputSerialization |
Response body
If the HTTP status code included in the response is 4xx, the request failed to pass the SQL syntax check or the request contains errors. In this case, the format of the body of the returned error message is the same as that of the error message returned for a GetObject request.
If the HTTP status code included in the response is 5xx, internal server errors occur. In this case, the format of the body of the returned error message is the same as that of the error message returned for a GetObject request.
HTTP status code 206 is returned if the operation is successful.
If the value of header x-oss-select-output-raw is true, the object data except for frame-based data is returned. The client can obtain the data in the same manner as the GetObject operation.
If the value of x-oss-select-output-raw is false, the result is returned as frames.
The frames are returned in the
Version|Frame-Type | Payload Length | Header Checksum | Payload | Payload Checksum<1 byte><--3 bytes--><---4 bytes----><-------4 bytes--><variable><----4bytes---------->
format.NoteThe value of Checksum in frames is CRC-32. All integers in a frame are big-endian. Currently, the value of Version is 1.
Frame Type
The following table describes three frame types supported by SelectObject.
Frame type | Value | Payload format | Description |
Data Frame | 8388609 | offset | data<-8 bytes><---variable-> | The data returned for the SelectObject request. The value of the offset parameter is an 8-bit integer that indicates the current scanning location (the offset from the file header). This parameter is used to report the progress of the operation. |
Continuous Frame | 8388612 | offset<----8 bytes--> | The frame used to report the progress of an operation and maintain an HTTP connection. If no data is returned for a query request within 5 seconds, a continuous frame is returned. |
End Frame | 8388613 | offset | total scanned bytes | http status code | error message<--8bytes-><--8bytes---------><----4 bytes--------><-variable------> | An end frame is used to return the final state of an operation, including the scanned bytes and the possible error messages.
|
Sample requests
The following sample requests are used to call SelectObject for CSV and JSON objects:
Sample requests for CSV objects
POST /oss-select/bigcsv_normal.csv?x-oss-process=csv%2Fselect HTTP/1.1 Date: Fri, 25 May 2018 22:11:39 GMT Content-Type: Authorization: OSS qn6q**************:77Dv**************** User-Agent: aliyun-sdk-dotnet/2.8.0.0(windows 16.7/16.7.0.0/x86;4.0.30319.42000) Content-Length: 748 Expect: 100-continue Connection: keep-alive Host: host name <?xml version="1.0"?> <SelectRequest> <Expression>c2VsZWN0IGNvdW50KCopIGZyb20gb3Nzb2JqZWN0IHdoZXJlIF80ID4gNDU= </Expression> <InputSerialization> <Compression>None</Compression> <CSV> <FileHeaderInfo>Ignore</FileHeaderInfo> <RecordDelimiter>Cg==</RecordDelimiter> <FieldDelimiter>LA==</FieldDelimiter> <QuoteCharacter>Ig==</QuoteCharacter> <CommentCharacter>Iw==</CommentCharacter/> </CSV> </InputSerialization> <OutputSerialization> <CSV> <RecordDelimiter>Cg==</RecordDelimiter> <FieldDelimiter>LA==</FieldDelimiter> <QuoteCharacter>Ig==</QuoteCharacter> </CSV> <KeepAllColumns>false</KeepAllColumns> <OutputRawData>false</OutputRawData> </OutputSerialization> </SelectRequest>
Sample requests for JSON objects
POST /oss-select/sample_json.json?x-oss-process=json%2Fselect HTTP/1.1 Host: host name Accept-Encoding: identity User-Agent: aliyun-sdk-python/2.6.0(Darwin/16.7.0/x86_64;3.5.4) Accept: */* Connection: keep-alive date: Mon, 10 Dec 2018 18:28:11 GMT authorization: OSS qn6q**************:77Dv**************** Content-Length: 317 <SelectRequest> <Expression>c2VsZWN0ICogZnJvbSBvc3NvYmplY3Qub2JqZWN0c1sqXSB3aGVyZSBwYXJ0eSA9ICdEZW1vY3JhdCc= </Expression> <InputSerialization> <JSON> <Type>DOCUMENT</Type> </JSON> </InputSerialization> <OutputSerialization> <JSON> <RecordDelimiter>LA==</RecordDelimiter> </JSON> </OutputSerialization> <Options /> </SelectRequest>
Regular expressions in an SQL statement
In this example, SELECT select-list from table where_opt limit_opt
indicates a regular expression in an SQL statement.
The following keywords cannot be changed: SELECT and where.
select_list: column name
| column index (Example: _1, _2. column index applies only to CSV objects.)
| json path (Example: s.contacts.firstname. json path applies only to JSON objects.)
| function(column index | column name)
| function(json_path) (applies only to JSON objects.)
| select_list AS alias
The following functions are supported: AVG, SUM, MAX, MIN, COUNT, and CAST (type conversion function). You can specify only an asterisk (*) after COUNT.
table: OSSOBJECT
| OSSOBJECT json_path (applies only to JSON objects.)
If you want to perform operations on CSV objects, you must use the OSSOBJECT table. If you want to perform operations on JSON objects including DOCUMENT and LINES type objects, you can specify json_path after OSSOBJECT.
json_path: ['string '] (The quotation marks that are used to enclose a string can be deleted if the string does not include a space or an asterisk (*). In this case, ['string '] is equivalent to .'string '.)
| [n] (indicates the nth element in an array. The value of n is counted from 0.)
| [*] (indicates a child element in an array or object.)
| .'string ' (The quotation marks that are used to enclose a string can be deleted if the string does not include a space or an asterisk (*).)
| json_path jsonpath (You can concatenate multiple elements in a json path such as [n].property1.attributes[*].)
Where_opt:
| WHERE expr
expr:
| literal value
| column name
| column index
| json path (applies only to JSON objects.)
| expr op expr
| expr OR expr
| expr AND expr
| expr IS NULL
| expr IS NOT NULL
| (column name | column index | json path) IN (value1, value2, ...)
| (column name | column index | json path) NOT in (value1, value2, ...)
| (column name | column index | json path) between value1 and value2
| NOT (expr)
| expr op expr
| (expr)
| cast (column index |column name | json path | literal as INT|DOUBLE|)
op: includes the following operators:
>
,<
,>=
,<=
,! =
,=
,,
,LIKE
,+
,-
,*
,/
,%
, and||
.cast: You can use the CAST function to convert data in a column from one type to another type.
The combination of an aggregation function and limit:
Select avg(cast(_1 as int)) from ossobject limit 100
. The preceding statement calculates the average value of the first column in the first 100 rows. This function differs from the statement supported by MySQL because only a single row is returned for an aggregation function in SelectObject operations. Therefore, no limits are configured for the output data size. The limit operation is performed before the aggregation operation when you call SelectObject.
Limits on SQL statements
The following limits apply to SQL statements:
Only text objects encoded in UTF-8 and UTF-8 text objects compressed in the GZIP format are supported. The deflate format is not supported for GZIP objects.
Only a single object can be queried when you use an SQL statement. The following clauses are not supported: JOIN, ORDER BY, GROUP BY, and HAVING.
A WHERE clause cannot include aggregation conditions. For example, the following clause is not allowed: WHERE max(cast(age as int)) > 100.
A maximum of 1,000 columns can be specified for an SQL statement. The column name in an SQL statement can be up to 1,024 bytes in length.
A maximum of five wildcards (%) are supported in a LIKE clause. The percent sign (%) and the asterisk (*) are wildcards that indicate zero or more characters. The ESCAPE keyword is supported for SQL LIKE clauses, and is used to convert special characters such as percent signs (%), asterisks (*), and question marks (?) into normal strings.
A maximum of 1,024 constants are supported in an IN clause.
The Projection that is specified after SELECT can be a column name, a CSV column index (such as _1 or _2), an aggregation function, or a CAST function. Other expressions are not supported, such as select _1 + _2 from ossobject.
The maximum column size and row size for a CSV object are 256 KB.
JSON paths that are specified after FROM support JSON nodes whose maximum size is 512 KB. The path can contain up to 10 levels, and an array can contain up to 5,000 elements. The fields that are specified after SELECT and WHERE must be from the nodes corresponding to JSON paths that are specified after FROM.
In SQL statements of a JSON object, the SELECT or WHERE expressions cannot include array wildcards ([*]). Array wildcards ([*]) can be included only in JSON paths that are specified after FROM. For example, you can use select * from ossobject.contacts[*] instead of select s.contacts[*] from ossobject s.
The maximum size of an SQL statement is 16 KB. Up to 20 expressions can be added after WHERE. Each statement supports up to 10 levels and 100 aggregation operations.
Data error handling
The following section lists common methods that are used to handle data errors.
Some columns are missing in some rows in a CSV object.
If SkipPartialDataRecord is not specified or is set to false, OSS calculates the expressions in the SQL statement by processing the values of the missing columns as null.
If SkipPartialDataRecord is set to true, OSS ignores the rows in which some columns are missing. In this case, if MaxSkippedRecordsAllowed is not specified or is set to a value smaller than the number of skipped rows, OSS reports an error by sending HTTP status code 400 or including HTTP status code 400 in the end frame.
Assume that the SQL statement
select _1, _3 from ossobject
is executed, and that the data in a row of the CSV object is "John, Company A".If SkipPartialDataRecord is set to false, "John,\n" is returned.
If SkipPartialDataRecord is set to true, this row is skipped.
Some keys are missing in a JSON object.
Some JSON objects may exclude the keys specified in the SQL statement.
If SkipPartialDataRecord is not specified or is set to false, OSS calculates the expressions in the SQL statement by processing the missing keys as null.
If SkipPartialDataRecord is set to true, OSS skips the data in the JSON node. In this case, if MaxSkippedRecordsAllowed is not specified or is set to a value smaller than the number of skipped rows, OSS reports an error by sending HTTP status code 400 or including HTTP status code 400 in the end frame.
Assume that the SQL statement
select s.firstName, s.lastName , s.age from ossobject.contacts[*] s
is executed, and that the value of a JSON node is {"firstName":"John", "lastName":"Smith"}.If SkipPartialDataRecord is not specified or is set to false, {"firstName":"John", "lastName":"Smith"} is returned.
If SkipPartialDataRecord is set to true, this row is skipped.
NoteFor keys in the returned data of a request for JSON objects, the output JSON objects can only be LINES. The Key value in the output is returned based on the following rules:
Assume that the SQL statement
select * from ossobject…
is executed. If * corresponds to a JSON object ({...}), the JSON object is returned. If * corresponds to a string or an array, the string or array is returned as a DummyKey _1.If {"Age":5}
select * from ossobject.Age s where s = 5
is used, {"_1":5} is returned because the value 5 corresponding to * is not a JSON object. When the SQL statementselect * from ossobject s where s.Age = 5
is executed, {"Age":5} corresponding to * is returned.If the SQL statement does not use select * but specifies columns, the returned content is in the
{"{Column 1}": Value, "{Column 2}": Value...}
format. {Column n} can be generated by using the following methods:If the alias of the column is specified in the SELECT clause, the alias takes effect.
If the column name is the key of a JSON object, this key is used as the output key value.
If the column is an element of a JSON array or an aggregate function, prefix the column name with the serial number (starting from 1) and underscore (_) as the output key value.
Assume that {"contacts":{"Age":35, "Children":["child1", "child2", "child3"]}} is used:
When the SQL statement
select s.contacts.Age, s.contacts.Children[0] from ossobjects
is executed, Age is the key of the input JSON object, and Children[0] indicates the first element of the Children array and is the second column in the output content. In this case, {"Age":35, "_2":"child1"} is returned.When the SQL statement
select max(cast(s.Age as int)) from ossobject.contacts s
is executed and the selected column is an aggregate function, the column is prefixed with _1 and its serial number in the output. In this case, {"_1":35} is returned.When the alias of the column is specified in the SQL statement
select s.contacts.Age, s.contacts.Children[0] as firstChild from ossobject
, {"Age":35, "firstChild":"child1"} is returned.
Keys that match the JSON objects and SQL statements are case-sensitive. For example, "select s. Age" and "select s. age" are different.
The data type of some columns in a CSV object does not match the data type that is specified in the SQL statement.
If the data type of a row in a CSV object does not match the type specified in the SQL statement, the row is skipped. If the number of skipped rows exceeds the value of MaxSkippedRecordsAllowed, OSS stops processing data and returns HTTP status code 400.
Assume that the SQL statement
select _1, _3 from ossobject where _3 > 5
is executed. If the value of a row in the CSV object isJohn, Company A, To be hired
, this row is skipped because the third column in the row is not of the integer type.The data type of some keys in a JSON object does not match the data type that is specified in the SQL statement.
Assume that the SQL statement
select s.name from ossobject s where s.aliren_age > 5
is executed. If the value of a JSON node is{"Name":"John", "Career_age": To be hired}
, this node is skipped.
CreateSelectObjectMeta
You can call the CreateSelectObjectMeta operation to obtain information about an object, such as the total number of rows and the number of splits. For a CSV object, you can also obtain the total number of columns in the object when you call this operation. An object consists of multiple data blocks. Each data block corresponds to a split. A split consists of consecutive rows. If the requested information does not exist in the CSV or JSON object, the entire object is scanned to analyze and record the preceding information. The information obtained in the first API operation call is used when the operation is called again. This way, the entire object does not need to be rescanned.
You must be granted write permissions on the object before you can perform the CreateSelectObjectMeta operation.
If the operation is successful, HTTP status code 200 is returned. If the object is not a valid CSV or JSON LINES object, or if the specified delimiter does not match existing CSV objects, HTTP status code 400 is returned.
Request syntax
Request syntax for CSV objects
POST /samplecsv?x-oss-process=csv/meta <CsvMetaRequest> <InputSerialization> <CompressionType>None</CompressionType> <CSV> <RecordDelimiter>Base64-encoded character</RecordDelimiter> <FieldDelimiter>Base64-encoded character</FieldDelimiter> <QuoteCharacter>Base64-encoded character</QuoteCharacter> </CSV> </InputSerialization> <OverwriteIfExists>false|true</OverwriteIfExists> </CsvMetaRequest>
Request syntax for JSON objects
POST /samplecsv?x-oss-process=json/meta <JsonMetaRequest> <InputSerialization> <CompressionType>None</CompressionType> <JSON> <Type>LINES</Type> </JSON> </InputSerialization> <OverwriteIfExists>false|true</OverwriteIfExists> </JsonMetaRequest>
Request elements
Element
Type
Description
CsvMetaRequest
Container
The container that stores the CreateSelectObjectMeta request for CSV objects.
Child nodes: InputSerialization
Parent nodes: none
JsonMetaRequest
Container
The container that stores the CreateSelectObjectMeta request for JSON objects.
Child nodes: InputSerialization
Parent nodes: none
InputSerialization
Container
Optional. This element specifies the input serialization parameters.
Child nodes: CompressionType, CSV, and JSON
Parent nodes: CsvMetaRequest and JsonMetaRequest
OverwriteIfExists
bool
Optional. This element specifies whether to recalculate the SelectMeta and overwrite the existing data. The value false indicates that the result is directly returned if the SelectMeta already exists. Default value: false.
Child nodes: none
Parent nodes: CsvMetaRequest and JsonMetaRequest
CompressionType
Enumeration
Optional. This element specifies the compression type of the object. Only None is supported.
Child nodes: none
Parent nodes: InputSerialization
RecordDelimiter
String
Optional. This element specifies a Base64-encoded line feed that you want to use to separate rows in the CSV object. Default value: \n. Before the value of this element is encoded, the value must be an ANSI value of up to two characters in length. For example, \n is used to indicate a line feed in Java.
Child nodes: none
Parent nodes: CSV
FieldDelimiter
String
Optional. This element specifies the delimiter that you want to use to separate columns in the CSV object. The value of this element must be Base64-encoded. Default value:
,
.Before the value of this element is encoded, the value must be an ANSI value of one character in length. For example,
,
is used to indicate a comma in Java.Child nodes: none
Parent nodes: CSV (input and output)
QuoteCharacter
String
Optional. This element specifies a Base64-encoded quote character that you want to use in the CSV object. Default value: \" In a CSV object, line feeds and column delimiters enclosed in quotation marks are processed as normal characters. Before the value of this element is encoded, the value must be an ANSI value of one character in length. For example, \" is used to indicate a quote character in Java.
Child nodes: none
Parent nodes: CSV (input)
CSV
Container
The format of the input CSV object.
Child nodes: RecordDelimiter, FieldDelimiter, and QuoteCharacter
Parent nodes: InputSerialization
JSON
Container
The format of the input JSON object.
Child nodes: Type
Parent nodes: InputSerialization
Type
Enumeration
The type of the JSON object.
Valid value: LINES
Response body
Similar to SelectObject, the results of CreateSelectObjectMeta are also returned as frames.
Frame type
Value
Payload format
Description
Meta End Frame (CSV)
8388614
offset | total scanned bytes | status| splits count | rows count | columns count | error message
<-8 bytes><------8 bytes------><--4bytes><--4 bytes--><--8 bytes><--4 bytes---><variable size>
The frame used to report the final state of a CreateSelectObjectMeta operation.
offset: an 8-bit integer that indicates the offset when scanning is complete.
total scanned bytes: an 8-bit integer that indicates the size of the scanned data.
status: a 4-bit integer that indicates the final state of the operation.
splits_count: a 4-bit integer that indicates the total number of splits.
rows_count: an 8-bit integer that indicates the total number of rows.
cols_count: a 4-bit integer that indicates the total number of columns.
error_message: the detailed error message that is returned. If no errors occur, the value of this parameter is null.
Meta End Frame (JSON)
8388615
offset | total scanned bytes | status| splits count | rows count | error message
<-8 bytes><------8 bytes------><--4bytes><--4 bytes--><--8 bytes><variable size>
The frame used to report the final state of a CreateSelectObjectMeta operation.
offset: an 8-bit integer that indicates the offset when scanning is complete.
total scanned bytes: an 8-bit integer that indicates the size of the scanned data.
status: a 4-bit integer that indicates the final state of the operation.
splits_count: a 4-bit integer that indicates the total number of splits.
rows_count: an 8-bit integer that indicates the total number of rows.
error_message: the detailed error message that is returned. If no errors occur, the value of this parameter is null.
Sample requests
Sample requests for CSV objects
POST /oss-select/bigcsv_normal.csv?x-oss-process=csv%2Fmeta HTTP/1.1 Date: Fri, 25 May 2018 23:06:41 GMT Content-Type: Authorization: OSS qn6q**************:77Dv**************** User-Agent: aliyun-sdk-dotnet/2.8.0.0(windows 16.7/16.7.0.0/x86;4.0.30319.42000) Content-Length: 309 Expect: 100-continue Connection: keep-alive Host: Host <?xml version="1.0"?> <CsvMetaRequest> <InputSerialization> <CSV> <RecordDelimiter>Cg==</RecordDelimiter> <FieldDelimiter>LA==</FieldDelimiter> <QuoteCharacter>Ig==</QuoteCharacter> </CSV> </InputSerialization> <OverwriteIfExisting>false</OverwriteIfExisting> </CsvMetaRequest>
Sample requests for JSON objects
POST /oss-select/sample.json?x-oss-process=json%2Fmeta HTTP/1.1 Date: Fri, 25 May 2018 23:06:41 GMT Content-Type: Authorization: OSS qn6q**************:77Dv**************** User-Agent: aliyun-sdk-dotnet/2.8.0.0(windows 16.7/16.7.0.0/x86;4.0.30319.42000) Content-Length: 309 Expect: 100-continue Connection: keep-alive Host: Host <?xml version="1.0"?> <JsonMetaRequest> <InputSerialization> <JSON> <Type>LINES</Type> </JSON> </InputSerialization> <OverwriteIfExisting>false</OverwriteIfExisting> </JsonMetaRequest>
Supported time formats
The following table lists the formats that can be converted to a timestamp without the need to specify the time format. For example, the cast('20121201' as timestamp) string is automatically parsed as a timestamp of December 1, 2012.
Format | Description |
YYYYMMDD | year month day |
YYYY/MM/DD | year/month/day |
DD/MM/YYYY/ | day/month/year |
YYYY-MM-DD | year-month-day |
DD-MM-YY | day-month-year |
DD.MM.YY | day. month. year |
HH:MM:SS.mss | hour:minute:second. millisecond |
HH:MM:SS | hour:minute:second |
HH MM SS mss | hour minute second millisecond |
HH.MM.SS.mss | hour. minute. second. millisecond |
HHMM | hour minute |
HHMMSSmss | hour minute second millisecond |
YYYYMMDD HH:MM:SS.mss | year month day hour:minute:second. millisecond |
YYYY/MM/DD HH:MM:SS.mss | year/month/day hour:minute:second. millisecond |
DD/MM/YYYY HH:MM:SS.mss | day/month/year hour:minute:second. millisecond |
YYYYMMDD HH:MM:SS | year month day hour:minute:second |
YYYY/MM/DD HH:MM:SS | year/month/day hour:minute:second |
DD/MM/YYYY HH:MM:SS | day/month/year hour:minute:second |
YYYY-MM-DD HH:MM:SS.mss | year-month-day hour:minute:second. millisecond |
DD-MM-YYYY HH:MM:SS.mss | day-month-year hour:minute:second. millisecond |
YYYY-MM-DD HH:MM:SS | year-month-day hour:minute:second |
YYYYMMDDTHH:MM:SS | year month day T hour:minute:second |
YYYYMMDDTHH:MM:SS.mss | year month day T hour:minute:second. millisecond |
DD-MM-YYYYTHH:MM:SS.mss | day-month-year T hour:minute:second. millisecond |
DD-MM-YYYYTHH:MM:SS | day-month-year T hour:minute:second |
YYYYMMDDTHHMM | year month day T hour minute |
YYYYMMDDTHHMMSS | year month day T hour minute second |
YYYYMMDDTHHMMSSMSS | year month day T hour minute second millisecond |
ISO8601-0 | year-month-day T hour:minute+hour:minute, or year-month-day T hour: minute-hour:minute "+" indicates that the local time in the current time zone is later than the standard UTC time. "-" indicates that the local time in the current time zone is earlier than the standard UTC time. In this format, ISO8601-0 can be used. |
ISO8601-1 | year-month-day T hour:minute+hour:minute, or year-month-day T hour: minute-hour:minute "+" indicates that the local time in the current time zone is later than the standard UTC time. "-" indicates that the local time in the current time zone is earlier than the standard UTC time. In this format, ISO 8601-1 can be used. |
CommonLog | Example: 28/Feb/2017:12:30:51 +0700 |
RFC822 | Example: Tue, 28 Feb 2017 12:30:51 GMT |
?D/?M/YY | day/month/year, in which the day and month can be one or two digits. |
?D/?M/YY ?H:?M | day/month/year/hour:minute, in which the day, month, hour, and minute can be one or two digits. |
?D/?M/YY ?H:?M:?S | day/month/year/hour:minute:second, in which the day, month, hour, minute, and second can be one or two digits. |
The following table lists the formats that may cause errors. You must specify a time format when you use strings in these formats. For example, the cast('20121201' as timestamp format 'YYYYDDMM') statement incorrectly parses the string 20121201 as January 12, 2012.
Format | Description |
YYYYDDMM | year day month |
YYYY/DD/MM | year/day/month |
MM/DD/YYYY | month/day/year |
YYYY-DD-MM | year-day-month |
MM-DD-YYYY | month-day-year |
MM.DD.YYYY | month. day. year |
ErrorCode
SelectObject returns error codes by using the following methods:
The HTTP status code is included in the response header and the error code is included in the response body, which is the same as other OSS requests. Error codes returned in this manner indicate that input errors such as an invalid SQL statement or data errors occur.
The error code is included in the end frame of the response body. Error codes returned in this manner indicate that the data is incorrect or does not match the data type that is specified in the SQL statement. For example, a string exists in a column for which the type is set to integer in the SQL statement. In this case, part of data is processed, and then HTTP status code 206 and the processed data are sent to the client.
Error codes such as InvalidCSVLine can be returned as an HTTP status code in the response header or as a status code in the end frame based on the location of the error row within the CSV object.
ErrorCode | Description | HTTP Status Code | Http Status Code in End Frame |
InvalidSqlParameter | The error message returned because the specified SQL parameter does not exist. The SQL statement in the request is null, the size of the SQL statement has exceeded the upper limit, or the SQL statement is not Base64-encoded. | 400 | None |
InvalidInputFieldDelimiter | The error message returned because the input CSV object contains invalid column delimiters. The parameter is not Base64-encoded, or the size of the parameter is greater than 1 byte after the parameter is decoded. | 400 | None |
InvalidInputRecordDelimiter | The error message returned because the input CSV object contains invalid row delimiters. The parameter is not Base64-encoded, or the size of the parameter is greater than 2 bytes after the parameter is decoded. | 400 | None |
InvalidInputQuote | The error message returned because the input CSV object contains invalid quote characters. The parameter is not Base64-encoded, or the size of the parameter is greater than 1 byte after the parameter is decoded. | 400 | None |
InvalidOutputFieldDelimiter | The error message returned because the output CSV object contains invalid column delimiters. The parameter is not Base64-encoded, or the size of the parameter is greater than 1 byte after the parameter is decoded. | 400 | None |
InvalidOutputRecordDelimiter | The error message returned because the output CSV object contains invalid row delimiters. The parameter is not Base64-encoded, or the size of the parameter is greater than 2 bytes after the parameter is decoded. | 400 | None |
UnsupportedCompressionFormat | The error message returned because the value of the Compression parameter is not NONE or GZIP. The value is case-insensitive. | 400 | None |
InvalidCommentCharacter | The error message returned because the CSV object contains invalid comment characters. The parameter is not Base64-encoded, or the size of the parameter is greater than 1 byte after the parameter is decoded. | 400 | None |
InvalidRange | The error message returned because the Range parameter is not prefixed with line-range= or split-range=, or the range value does not comply with the HTTP standard for Range. | 400 | None |
DecompressFailure | The error message returned because the value of Compression is GZIP and the object cannot be extracted. | 400 | None |
InvalidMaxSkippedRecordsAllowed | The error message returned because the value of MaxSkippedRecordsAllowed is not an integer. | 400 | None |
SelectCsvMetaUnavailable | The error message returned because the object does not include CSV Meta when the Range parameter is specified. Call the CreateSelectObjectMeta operation first. | 400 | None |
InvalidTextEncoding | The error message returned because the object is not UTF-8 encoded. | 400 | None |
InvalidOSSSelectParameters | The error message returned because the EnablePayloadCrc and OutputRawData parameters are set to true. This results in conflicts. | 400 | None |
InternalError | The error message returned because an OSS system error has occurred. | 500 or 206 | 500 or None |
SqlSyntaxError | The error message returned because the syntax of the Base64-decoded SQL statement is invalid. | 400 | None |
SqlExceedsMaxInCount | The error message returned because the number of values included in the SQL IN clause has exceeded 1,024. | 400 | None |
SqlExceedsMaxColumnNameLength | The error message returned because the size of the column name has exceeded 1,024 bytes. | 400 | None |
SqlInvalidColumnIndex | The error message returned because the column index in the SQL statement is smaller than 1 byte or greater than 1,000 bytes in length. | 400 | None |
SqlAggregationOnNonNumericType | The error message returned because an aggregation function is used in a non-numeric column. | 400 | None |
SqlInvalidAggregationOnTimestamp | The error message returned because the SUM or AVG aggregation function is used in the timestamp column. | 400 | None |
SqlValueTypeOfInMustBeSame | The error message returned because values of different types are included in the SQL IN clause. | 400 | None |
SqlInvalidEscapeChar | The error message returned because an invalid escape character such as a question mark (?), a percent sign (%), or an asterisk (*) is specified in the SQL LIKE clause. | 400 | None |
SqlOnlyOneEscapeCharIsAllowed | The error message returned because the size of the escape character in the SQL LIKE clause is greater than 1 byte in length. | 400 | None |
SqlNoCharAfterEscapeChar | The error message returned because no characters are specified after the escape character in the SQL LIKE clause. | 400 | None |
SqlInvalidLimitValue | The error message returned because the number that is specified after the SQL Limit clause is smaller than 1. | 400 | None |
SqlExceedsMaxWildCardCount | The error message returned because the number of asterisks (*) or percent signs (%) in the SQL LIKE clause has exceeded the upper limit. | 400 | None |
SqlExceedsMaxConditionCount | The error message returned because the number of conditional expressions in the SQL WHERE clause has exceeded the upper limit. | 400 | None |
SqlExceedsMaxConditionDepth | The error message returned because the depth of the conditional tree in the SQL WHERE clause has exceeded the upper limit. | 400 | None |
SqlOneColumnCastToDifferentTypes | The error message returned because a column is converted into different types by including the CAST function in the SQL statement. | 400 | None |
SqlOperationAppliedToDifferentTypes | The error message returned because an operator is used for two objects of different types in the SQL statement. For example, this error code is returned if col1 in _col1 > 3 is a string. | 400 | None |
SqlInvalidColumnName | The error message returned because a column name used in the SQL statement is not included in the header of the CSV object. | 400 | None |
SqlNotSupportedTimestampFormat | The error message returned because the timestamp format specified in the SQL CAST clause is not supported. | 400 | None |
SqlNotMatchTimestampFormat | The error message returned because the timestamp format specified in the SQL CAST clause does not match the timestamp string. | 400 | None |
SqlInvalidTimestampValue | The error message returned because no timestamp formats are specified in the SQL CAST clause and the specified string cannot be converted to a timestamp. | 400 | None |
SqlInvalidLikeOperand | The error message returned because column names or indexes on the left side are not specified in the SQL LIKE clause, the specified column on the left side is not of the string type, or the column on the right side in the LIKE clause is of the string type. | 400 | None |
SqlInvalidMixOfAggregationAndColumn | The error message returned because the SQL SELECT clause includes column names and indexes for both aggregation functions and non-aggregation functions. | 400 | None |
SqlExceedsMaxAggregationCount | The error message returned because the number of aggregation functions included in the SQL SELECT clause has exceeded the upper limit. | 400 | None |
SqlInvalidMixOfStarAndColumn | The error message returned because an asterisk (*), a column name, and a column index are included in the same SQL statement. | 400 | None |
SqlInvalidKeepAllColumnsWithAggregation | The error message returned because the SQL statement includes aggregation functions and the KeepAllColumns parameter is set to true. | 400 | None |
SqlInvalidKeepAllColumnsWithDuplicateColumn | The error message returned because the SQL statement includes repeated column names or column indexes and the KeepAllColumns parameter is set to true. | 400 | None |
SqlInvalidSqlAfterAnalysis | The error message returned because the SQL statement is complex and the parsed SQL statement is not supported. | 400 | None |
InvalidArithmeticOperand | The error message returned because the SQL statement contains arithmetic operations performed on non-numeric constants or columns. | 400 | None |
SqlInvalidAndOperand | The error message returned because the expressions joined by the AND operator in the SQL statement are not of the Boolean type. | 400 | None |
SqlInvalidOrOperand | The error message returned because the expressions joined by the OR operator in the SQL statement are not of the Boolean type. | 400 | None |
SqlInvalidNotOperand | The error message returned because the expressions joined by the NOT operator in the SQL statement are not of the Boolean type. | 400 | None |
SqlInvalidIsNullOperand | The error message returned because the SQL statement specifies the IS NULL operator-based operations performed on a constant. | 400 | None |
SqlComparerOperandTypeMismatch | The error message returned because the SQL statement specifies the comparison operator-based operations performed on two objects of different types. | 400 | None |
SqlInvalidConcatOperand | The error message returned because the SQL statement contains two constants joined by the concatenation operator (||). | 400 | None |
SqlUnsupportedSql | The error message returned because the SQL statement is complex and the size of the generated SQL plan has exceeded the upper limit. | 400 | None |
HeaderInfoExceedsMaxSize | The error message returned because the size of the header information specified in the SQL statement has exceeded the upper limit. | 400 | None |
OutputExceedsMaxSize | The error message returned because the size of a row in the output has exceeded the upper limit. | 400 | None |
InvalidCsvLine | The error message returned because a row in the CSV object is invalid or the size of the row has exceeded the upper limit, or because the number of skipped rows has exceeded the value of MaxSkippedRecordsAllowed. | 400 or 206 | 400 or None |
NegativeRowIndex | The error message returned because the value of the array index in the SQL statement is a negative number. | 400 | None |
ExceedsMaxNestedColumnDepth | The error message returned because the number of nested levels of the JSON object in the SQL statement has exceeded the upper limit. | 400 | None |
NestedColumnNotSupportInCsv | The error message returned because the SQL statement contains nested columns that include periods (.) or arrays that include brackets ([]). The preceding characters are not supported for SQL statements of CSV objects. | 400 | None |
TableRootNodeOnlySupportInJson | The error message returned because the root node path is not specified after From ossobject in JSON objects. | 400 | None |
JsonNodeExceedsMaxSize | The error message returned because the size of the root node in the JSON object has exceeded the upper limit. | 400 or 206 | 400 or None |
InvalidJsonData | The error message returned because the JSON data is incorrectly formatted. | 400 or 206 | 400 or None |
ExceedsMaxJsonArraySize | The error message returned because the number of elements in an array in the root node of the JSON object has exceeded the upper limit. | 400 or 206 | 400 or None |
WildCardNotAllowed | The error message returned because asterisks (*) cannot be used in SQL SELECT clauses or SQL WHERE clauses for the JSON object. For example, an error is returned if you execute the following statement: | 400 | None |
JsonNodeExceedsMaxDepth | The error message returned because the depth of the root node of the JSON object has exceeded the upper limit. | 400 or 206 | 400 or None |