This topic describes how to export the SQL execution results of MaxCompute and provides examples that use MaxCompute SDK for Java.
Overview
You can use the following methods to export the execution results of SQL statements:
If the amount of data is small, use SQLTask to obtain all query results.
If you want to export the query results of a table or a partition, use Tunnel.
If the SQL statements are complex, use Tunnel and SQLTask to export the query results.
Use DataWorks to execute SQL statements, synchronize data, perform timed scheduling, and configure task dependencies.
Use the open source tool DataX to export data from MaxCompute to specified destination data sources.
Use SQLTask to export data
You can use the SQLTask interface that is provided by MaxCompute SDK for Java to execute SQL statements and then call SQLTask.getResult(i) to obtain the query results. For more information, see SQLTask.
When you use SQLTask, take note of the following rules:
SQLTask.getResult(i) is used to export the execution results of SELECT statements. You cannot use it to export the execution results of other MaxCompute SQL statements such as SHOW TABLES.
You can configure the READ_TABLE_MAX_ROW parameter to specify the maximum number of data records that a SELECT statement returns to a client. For more information, see Project operations.
A SELECT statement returns a maximum of 10,000 data records to a client, regardless of whether you execute the statement by using SQLTask or on another client. This limit is equivalent to appending a LIMIT N clause to the SELECT statement.
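The following code is a minimal sketch of this approach. The AccessKey pair, endpoint, project name, and the wc_in table are placeholders that you must replace with your own values. SQLTask.getResult is subject to the record limit that is described above.
import java.util.List;
import com.aliyun.odps.Instance;
import com.aliyun.odps.Odps;
import com.aliyun.odps.OdpsException;
import com.aliyun.odps.account.Account;
import com.aliyun.odps.account.AliyunAccount;
import com.aliyun.odps.data.Record;
import com.aliyun.odps.task.SQLTask;

public class SqlTaskResultExport {
    public static void main(String[] args) throws OdpsException {
        // Placeholder credentials, endpoint, and project name. Replace them with your own values.
        Account account = new AliyunAccount("<your_access_id>", "<your_access_key>");
        Odps odps = new Odps(account);
        odps.setEndpoint("<your_endpoint>");
        odps.setDefaultProject("<your_project>");
        // Execute the SELECT statement and wait for the job to complete.
        Instance instance = SQLTask.run(odps, "select * from wc_in;");
        instance.waitForSuccess();
        // Obtain the query results. At most READ_TABLE_MAX_ROW records (10,000 by default) are returned.
        List<Record> records = SQLTask.getResult(instance);
        for (Record record : records) {
            // Print the first column of each record.
            System.out.println(record.get(0));
        }
    }
}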
Use Tunnel to export data
If a query returns all data of a table or a partition, you can use Tunnel to export the data. For more information, see Tunnel commands and MaxCompute Tunnel overview.
The following example shows how to run a Tunnel command to export data. If the Tunnel command cannot meet your requirements, you can write code based on the Tunnel SDK to export the data. For more information, see MaxCompute Tunnel overview.
tunnel d wc_out c:\wc_out.dat;
2016-12-16 19:32:08 - new session: 201612161932082d3c9b0a012f68e7 total lines: 3
2016-12-16 19:32:08 - file [0]: [0, 3), c:\wc_out.dat
downloading 3 records into 1 file
2016-12-16 19:32:08 - file [0] start
2016-12-16 19:32:08 - file [0] OK. total: 21 bytes
download OK
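If you need to export table data programmatically instead of by running the Tunnel command, the following code is a minimal sketch that uses TableTunnel in the Tunnel SDK. It assumes that an Odps object named odps is already initialized, as shown in the preceding example, and that the non-partitioned wc_out table exists in the default project.
import com.aliyun.odps.data.Record;
import com.aliyun.odps.tunnel.TableTunnel;
import com.aliyun.odps.tunnel.io.TunnelRecordReader;

// Create a TableTunnel object and a download session for the wc_out table.
TableTunnel tableTunnel = new TableTunnel(odps);
TableTunnel.DownloadSession downloadSession =
        tableTunnel.createDownloadSession(odps.getDefaultProject(), "wc_out");
// Read all records of the table. You can also write the records to a local file.
long recordCount = downloadSession.getRecordCount();
TunnelRecordReader recordReader = downloadSession.openRecordReader(0, recordCount);
Record row;
while ((row = recordReader.read()) != null) {
    for (int col = 0; col < downloadSession.getSchema().getColumns().size(); ++col) {
        System.out.println(row.get(col));
    }
}
recordReader.close();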
Use SQLTask and Tunnel to export data
Unlike Tunnel, SQLTask alone cannot process more than 10,000 data records. To export more than 10,000 data records, use SQLTask together with Tunnel.
The following sample code provides an example on how to use SQLTask together with Tunnel to export data:
import com.aliyun.odps.Instance;
import com.aliyun.odps.Odps;
import com.aliyun.odps.data.Record;
import com.aliyun.odps.task.SQLTask;
import com.aliyun.odps.tunnel.InstanceTunnel;
import com.aliyun.odps.tunnel.io.TunnelRecordReader;

// Initialize a MaxCompute object. OdpsUtils.newDefaultOdps() is a helper method. You can also initialize the Odps object with an AliyunAccount, as shown in the preceding example.
Odps odps = OdpsUtils.newDefaultOdps();
// Execute the SQL statement and wait for the job to complete.
Instance i = SQLTask.run(odps, "select * from wc_in;");
i.waitForSuccess();
// Create an InstanceTunnel object.
InstanceTunnel tunnel = new InstanceTunnel(odps);
// Create a DownloadSession object based on the instance ID of the MaxCompute job.
InstanceTunnel.DownloadSession session = tunnel.createDownloadSession(odps.getDefaultProject(), i.getId());
// The number of records that are returned.
long count = session.getRecordCount();
System.out.println(count);
// The method that is used to obtain data is the same as that in TableTunnel.
TunnelRecordReader reader = session.openRecordReader(0, count);
Record record;
while ((record = reader.read()) != null) {
    for (int col = 0; col < session.getSchema().getColumns().size(); ++col) {
        // Display the data of the wc_in table. All fields in the wc_in table are strings. You can also write the data to a local file.
        System.out.println(record.get(col));
    }
}
reader.close();
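In this approach, the SQL statement is still executed by SQLTask, but the query results are downloaded from the job instance through InstanceTunnel instead of SQLTask.getResult. Therefore, the 10,000-record limit of SQLTask.getResult does not apply.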
Use DataWorks to synchronize and export data
DataWorks allows you to execute SQL statements and configure data synchronization tasks to generate and export data.
Log on to the DataWorks console.
In the left-side navigation pane, click Workspaces.
On the Workspaces page, find the desired workspace and choose Shortcuts > Data Development in the Actions column.
Create a workflow.
On the DataStudio page, click Scheduled Workflow in the left-side navigation pane. On the Scheduled Workflow page, choose + Create > Create Workflow.
In the Create Workflow dialog box, enter a name in the Workflow Name field.
Click Create.
Create an ODPS SQL node.
Right-click the workflow that you created and create an ODPS SQL node. In the Create Node dialog box, enter runsql in the Name field and click Confirm.
Configure the ODPS SQL node and click the Save icon.
Create a data synchronization node.
Right-click the workflow that you created and create a data synchronization node. In the Create Node dialog box, enter sync2mysql in the Name field and click Confirm.
Specify a data source and a data destination.
Configure the mappings between fields in the data source and data destination.
Configure the parameters in the Channel section.
In the upper-right corner of the Parameters section, click Save.
Configure a dependency between the data synchronization node and the ODPS SQL node: the ODPS SQL node serves as the output node that generates the data, and the data synchronization node serves as the export node that synchronizes the data to the destination.
Configure workflow scheduling or use the default settings. Then, click the Run icon. The following information shows an example of an operational log for data synchronization:
2016-12-17 23:43:46.394 [job-15598025] INFO JobContainer -
Task start time: 2016-12-17 23:43:34
Task end time: 2016-12-17 23:43:46
Total execution time: 11s
Average amount of data per task: 31.36 KB/s
Write speed: 1,668 rec/s
Read records: 16,689
Number of write and read failures: 0
Execute the following SQL statement to query the data synchronization results:
select count(*) from result_in_db;