Data Lake Analytics (DLA) provides built-in serializer/deserializer (SerDe) libraries for processing data files in different formats. Instead of writing your own parsing programs, you select one or more SerDe libraries that match the formats of your data files in Object Storage Service (OSS), and DLA uses them to query and analyze those files. The supported formats are TXT (including CSV and TSV), ORC, Parquet, JSON, RCFile, and AVRO.
When you create an OSS external table in DLA, use a STORED AS clause in the table creation statement to specify the format of the data files in OSS.
For example, the STORED AS clause in the following statement specifies that the file format is TXT.
CREATE EXTERNAL TABLE nation (
N_NATIONKEY INT,
N_NAME STRING,
N_REGIONKEY INT,
N_COMMENT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://bucket-name/tpch_100m/nation';
After the OSS external table is created, you can execute the SHOW CREATE TABLE statement to view the table creation statement.
show create table nation;
+-------------------------------+
| Result |
+-------------------------------+
| CREATE EXTERNAL TABLE `nation`(
`n_nationkey` int,
`n_name` string,
`n_regionkey` int,
`n_comment` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS `TEXTFILE`
LOCATION 'oss://bucket-name/tpch_100m/nation' |
+-------------------------------+
The following table lists the STORED AS clauses that DLA supports. Based on the STORED AS clause that you specify in the table creation statement, DLA automatically selects the appropriate SerDe library, input format, and output format for the external table.
Clause | Description |
STORED AS TEXTFILE | An external table is stored as a TXT file. TXT is the default file format. Each row in a file corresponds to a record in the table. |
STORED AS PARQUET | An external table is stored as a Parquet file. |
STORED AS ORC | An external table is stored as an ORC file. |
STORED AS RCFILE | An external table is stored as an RCFile file. |
STORED AS AVRO | An external table is stored as an AVRO file. |
STORED AS JSON | An external table is stored as a JSON file. This clause does not apply to Esri ArcGIS GeoJSON files. |
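As a minimal sketch of how a clause from the table is used, the following statement declares an external table over Parquet files. The table name, column names, and OSS path are hypothetical placeholders chosen for illustration.

```sql
-- Hypothetical example: the table name, columns, and OSS path are placeholders.
CREATE EXTERNAL TABLE orders_parquet (
    o_orderkey INT,
    o_custkey INT,
    o_totalprice DOUBLE,
    o_comment STRING
)
-- No ROW FORMAT clause is given: STORED AS PARQUET alone tells DLA
-- which SerDe library, input format, and output format to use.
STORED AS PARQUET
LOCATION 'oss://bucket-name/path/to/orders_parquet/';
```

Compared with the TXT example, no field delimiter is needed, because Parquet is a self-describing binary format.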
When you use a STORED AS clause to specify the file format, you can also specify the SerDe libraries and special column delimiters based on the file characteristics. For more information, see the descriptions of different file formats.
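The following sketch illustrates what specifying a SerDe library explicitly might look like for a CSV file whose fields are quoted. It assumes that DLA accepts the Hive-style ROW FORMAT SERDE syntax and the Hive OpenCSVSerde class; the table name, columns, and OSS path are hypothetical. Check the topic for the CSV format before you rely on the exact property names.

```sql
-- Assumption: DLA accepts Hive-style SerDe declarations and this class name.
-- The table name, columns, and OSS path are hypothetical placeholders.
CREATE EXTERNAL TABLE nation_csv (
    n_nationkey STRING,
    n_name STRING,
    n_regionkey STRING,
    n_comment STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',   -- column delimiter
    'quoteChar' = '"',       -- character that encloses field values
    'escapeChar' = '\\'      -- character that escapes special characters
)
STORED AS TEXTFILE
LOCATION 'oss://bucket-name/path/to/nation_csv/';
```

The columns are declared as STRING to keep the sketch conservative, because in Apache Hive this SerDe reads every column as a string.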