By Sheng Yuan
Unlike databases, MaxCompute SQL does not have any built-in paged query logic such as select * from table limit x offset y. However, many users hope to obtain query results in pages or batches in some scenarios by using the paged query logic of databases. This article describes several methods to implement this need.
1. Use the row_number() function as a unique and incrementing ID for filter-based query.
select * from (select row_number() over() as row_id,* from orders_delta)t where row_id between 10 and 20;
Sort the data and attach a unique ID to each data record by using the row_number() function. Then specify the pagination range for each query based on this ID.
2. Download the query result in batches by using the InstanceTunnel of Java SDK.
MaxCompute JavaSDK supports directly exporting the SELECT query result by using SQLTask and InstanceTunnel in combination. For more information, see Tunnel Commands.
In this case, you can download the results of your custom SELECT query by using InstanceTunnel.
Odps odps = OdpsUtils.newDefaultOdps(); // Initialize the MaxCompute object
Instance i = SQLTask.run(odps, "select * from wc_in;");
i.waitForSuccess();
// Create InstanceTunnel
InstanceTunnel tunnel = new InstanceTunnel(odps);
// Create DownloadSession based on the instance ID
InstanceTunnel.DownloadSession session = tunnel.createDownloadSession(odps.getDefaultProject(), i.getId());
long count = session.getRecordCount();
// The count of outputs results
System.out.println(count);
// The code for obtaining the data is the same as in Table Tunnel.
TunnelRecordReader reader = session. openRecordReader (0, Count );
Record record;
While(Record = reader.Read())! = null) {
For(IntCol =0; Col <session.GetSchema().GetColumns().Size(); ++ Col ){
// The fields in the wc_in table are all strings, and the output is printed directly here.
System. out.Println(Record.Get(Col ));
}
}
Reader.Close();
Here, a custom select query is submitted through SQLTask, and the query result is directly downloaded using InstanceTunnel.DownloadSession. Within this, the openRecordReader method supports specifying the start position and number of reads for this read record, and the parameter settings of openRecordReader (start, long) can be used to implement the logic for batch download.
For example, you can change openRecordReader (0, count) in the preceding example to get the start position and number of records you wish to obtain:
TunnelRecordReader reader = downloadSession. openRecordReader (100,20);
References: MaxCompute SDK Java Doc InstanceTunnel. DownloadSession class openRecordReader method.
Because the data volume of tables processed by MaxCompute is usually very large, the preceding method is generally not recommended for use in interactive paging query scenarios for reports. For users who need to query while interacting, consider using MaxCompute's interactive analysis service (Lightning) to make such queries.
3. Use the limit/offset syntax of MaxCompute's interactive analysis tool (Lightning) to implement paging
MaxCompute SQL does not support limit/offset syntax, but developers can draw on MaxCompute'sinteractive analysis tool (Lightning) to use limit/offset.
MaxCompute's interactive analysis tool can quickly query MaxCompute data using the PostgreSQL protocol and syntax under the same permission system. PostgreSQL's limit offset syntax can achieve the same paging effect as database queries.
Data Quality Management of Data Warehouses Based on MaxCompute
137 posts | 19 followers
FollowAlibaba Cloud MaxCompute - October 18, 2021
Alibaba Cloud MaxCompute - December 8, 2020
Alibaba Cloud MaxCompute - February 18, 2024
Alibaba Cloud MaxCompute - December 13, 2018
ApsaraDB - May 24, 2024
Alibaba Cloud MaxCompute - May 9, 2019
137 posts | 19 followers
FollowAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreConduct large-scale data warehousing with MaxCompute
Learn MoreAlibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreMore Posts by Alibaba Cloud MaxCompute