Alibaba Cloud Data Lake Analytics (DLA) enables you to directly query and analyze data stored in Object Storage Service (OSS) and Table Store instances by using standard SQL statements.
Before querying data, you must create a table in DLA based on the format and content of the data file. This tutorial takes a file stored in an OSS instance as an example to explain how to specify LOCATION of a table.
In DLA, the syntax of a table creation statement is as follows:
CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[ROW FORMAT row_format]
[STORE AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
LOCATION oss_path
LOCATION can be set to the data file itself or to the directory where the data file is located.
When creating a table, you can directly associate the table with the data file. In this case, LOCATION must be set to the absolute path of the data file in the OSS instance.
For example:
CREATE EXTERNAL TABLE loc_file_csv(
N_NATIONKEY INT,
N_NAME STRING,
N_REGIONKEY INT,
N_COMMENT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://my-bucket/datasets/test/test_create/create_table/csv/dir1/dir2/nation.csv';
If LOCATION is set to a directory in the OSS instance, all the files under this directory are the data files of the table.
When creating a table, you can set recursive.directories to determine whether to traverse all data files under this directory. If recursive.directories is set to the default value or is not explicitly specified, DLA resolves that recursive.directories is false, that is, the directory is not traversed.
For example, the directory structure in the OSS instance is as follows:
2018-07-05 11:16:11 1752.00B Standard oss://my-bucket/datasets/test/test_create/create_table/csv/dir1/dir2/nation.csv
2018-07-05 11:15:57 1752.00B Standard oss://my-bucket/datasets/test/test_create/create_table/csv/dir1/nation.csv
2018-07-05 11:16:17 1752.00B Standard oss://my-bucket/datasets/test/test_create/create_table/csv/nation.csv
The table creation statement is as follows:
CREATE EXTERNAL TABLE loc_file_csv(
N_NATIONKEY INT,
N_NAME STRING,
N_REGIONKEY INT,
N_COMMENT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
'oss://my-bucket/datasets/test/test_create/create_table/csv'
TBLPROPERTIES ('recursive.directories' = 'false');
If recursive.directories is set to false, the data file of this table is the nation.csv file under LOCATION:
oss://my-bucket/datasets/test/test_create/create_table/csv/nation.csv
The table creation statement is as follows:
CREATE EXTERNAL TABLE loc_file_csv(
N_NATIONKEY INT,
N_NAME STRING,
N_REGIONKEY INT,
N_COMMENT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
'oss://my-bucket/datasets/test/test_create/create_table/csv'
TBLPROPERTIES ('recursive.directories' = 'true');
If recursive.directories is set to true, the data files of this table are all the .csv files under LOCATION and all its subdirectories:
oss://my-bucket/datasets/test/test_create/create_table/csv/nation.csv
oss://my-bucket/datasets/test/test_create/create_table/csv/dir1/nation.csv
oss://my-bucket/datasets/test/test_create/create_table/csv/dir1/dir2/nation.csv
For example, if STORED AS TEXTFILE is specified in the table creation statement, but both .csv and .orc files are available under the directory, no error is reported during table creation and query. However, you may see data with garbled characters or incorrect data when running the SELECT statement.
To learn more about Alibaba Cloud Data Lake Analytics (DLA), visit www.alibabacloud.com/products/data-lake-analytics
2,599 posts | 762 followers
FollowAlibaba Clouder - November 9, 2018
Alibaba Clouder - November 12, 2018
Alibaba Cloud Data Lake Analytics - February 26, 2019
ApsaraDB - February 16, 2021
Alibaba Clouder - August 8, 2018
Alibaba Clouder - November 16, 2018
2,599 posts | 762 followers
FollowAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreBuild a Data Lake with Alibaba Cloud Object Storage Service (OSS) with 99.9999999999% (12 9s) availability, 99.995% SLA, and high scalability
Learn MoreA real-time data warehouse for serving and analytics which is compatible with PostgreSQL.
Learn MoreMore Posts by Alibaba Clouder
Raja_KT March 1, 2019 at 8:33 am
Good one. I miss out the recursive setting :)