This topic provides an example of how to use a Java user-defined table-valued function (UDTF) to read MaxCompute resources in MaxCompute Studio.
Prerequisites
MaxCompute Studio is installed and connected to a MaxCompute project, and a MaxCompute Java Module is created. For more information, see Install MaxCompute Studio, Manage project connections, and Create a MaxCompute Java module.
IntelliJ IDEA 2024 and JDK 1.8 are installed.
UDTF code examples
The following table describes the input and output parameters of the sample Java UDTF. The sample code follows the table.
Parameter category | Parameter type | Description |
Input parameter | String | First input parameter. |
Input parameter | String | Second input parameter. |
Output parameter | String | Value of the first input parameter. |
Output parameter | Bigint | Length of the second input parameter string. |
Output parameter | String | Concatenated value of the line count from file_resource.txt, the row count from the table_resource1 table, and the row count from the table_resource2 table. |
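The output contract in the table can be illustrated without the MaxCompute SDK. The following is a minimal sketch in plain Java: the class name OutputRowSketch and the method buildRow are hypothetical, and the counts passed in are made-up values, not results from a real project. The string format and the treatment of a null second input as length 0 follow the sample UDTF in this topic.

```java
import java.util.Arrays;

// Hypothetical helper that mirrors the output columns described in the table.
// It is plain Java and does not depend on the MaxCompute SDK.
final class OutputRowSketch {

    // Builds the three output columns for one input row.
    static Object[] buildRow(String first, String second,
                             long fileLines, long table1Rows, long table2Rows) {
        // Column 2: length of the second input, or 0 when it is null.
        long secondLength = (second == null) ? 0L : second.length();
        // Column 3: the three resource counts joined with '|'.
        String counts = "fileResourceLineCount=" + fileLines
                + "|tableResource1RecordCount=" + table1Rows
                + "|tableResource2RecordCount=" + table2Rows;
        return new Object[] {first, secondLength, counts};
    }

    public static void main(String[] args) {
        // The counts 3, 4, and 5 are illustrative values only.
        System.out.println(Arrays.toString(buildRow("10", "20", 3, 4, 5)));
    }
}
```

With the inputs "10" and "20", the first column is "10" and the second column is 2. The third column depends on the resources that are actually attached to the function.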
package com.aliyun.odps.examples.udf;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Iterator;

import com.aliyun.odps.udf.ExecutionContext;
import com.aliyun.odps.udf.UDFException;
import com.aliyun.odps.udf.UDTF;
import com.aliyun.odps.udf.annotation.Resolve;

/**
 * project: example_project
 * table: wc_in2
 * partitions: p1=2,p2=1
 * columns: cola,colc
 */
@Resolve("string,string->string,bigint,string")
public class UDTFResource extends UDTF {
    ExecutionContext ctx;
    long fileResourceLineCount;
    long tableResource1RecordCount;
    long tableResource2RecordCount;

    // Reads all three resources once, before any rows are processed.
    @Override
    public void setup(ExecutionContext ctx) throws UDFException {
        this.ctx = ctx;
        try {
            // Count the lines in the file resource.
            InputStream in = ctx.readResourceFileAsStream("file_resource.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line;
            fileResourceLineCount = 0;
            while ((line = br.readLine()) != null) {
                fileResourceLineCount++;
            }
            br.close();

            // Count the rows in the first table resource.
            Iterator<Object[]> iterator = ctx.readResourceTable("table_resource1").iterator();
            tableResource1RecordCount = 0;
            while (iterator.hasNext()) {
                tableResource1RecordCount++;
                iterator.next();
            }

            // Count the rows in the second table resource.
            iterator = ctx.readResourceTable("table_resource2").iterator();
            tableResource2RecordCount = 0;
            while (iterator.hasNext()) {
                tableResource2RecordCount++;
                iterator.next();
            }
        } catch (IOException e) {
            throw new UDFException(e);
        }
    }

    // Emits one output row for each input row.
    @Override
    public void process(Object[] args) throws UDFException {
        String a = (String) args[0];
        // A null second argument is treated as length 0.
        long b = args[1] == null ? 0 : ((String) args[1]).length();
        forward(a, b, "fileResourceLineCount=" + fileResourceLineCount + "|tableResource1RecordCount="
                + tableResource1RecordCount + "|tableResource2RecordCount=" + tableResource2RecordCount);
    }
}
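The counting logic in setup can also be exercised on its own, without a MaxCompute runtime. The following is a minimal sketch under stated assumptions: the class ResourceCounters and its methods are hypothetical helpers, not part of the MaxCompute SDK, and the explicit UTF-8 charset is an assumption about the file resource (the sample above uses the platform default charset). The sketch uses try-with-resources so that the reader is closed even if reading fails.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers that isolate the counting done in setup().
final class ResourceCounters {

    // Counts the lines in a stream. UTF-8 is an assumption about the file encoding.
    static long countLines(InputStream in) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    // Counts the rows of any iterable of records, such as a table resource.
    static long countRows(Iterable<Object[]> rows) {
        long count = 0;
        for (Object[] ignored : rows) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for a file resource and a table resource.
        InputStream file = new ByteArrayInputStream("a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        List<Object[]> table = new ArrayList<>();
        table.add(new Object[] {"x", 1L});
        table.add(new Object[] {"y", 2L});

        System.out.println("lines=" + countLines(file));
        System.out.println("rows=" + countRows(table));
    }
}
```

Inside a UDTF, helpers like these could be called with the stream and the iterable that the execution context returns for each resource. That wiring is left out here so that the sketch compiles with the JDK alone.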
The following code shows the dependency that is required in the pom.xml file for local testing.
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-udf-local</artifactId>
    <version>0.48.0-public</version>
</dependency>
Procedure
Local testing
Create a Java program of the UDTF type in MaxCompute Studio. For example, name the Java class UDTFResource and use the code from the UDTF code examples section.
Configure the runtime parameters based on the warehouse resource in the Java module.
Note: The input parameters are the values of the first and third columns (cola and colc) of each row in the partition p1=2, p2=1 of the wc_in2 table in the local resource.
At run time, the code reads data from the local resource file_resource.txt, from the wc_in1 table that table_resource1 maps to, and from the wc_in2 table (partition p1=2, p2=1) that table_resource2 maps to.
Right-click the UDTFResource class and select Run to execute the program. The results are displayed.
Client testing
Click Project Explorer in the upper-left corner of IDEA, and select Add Resource.
Add the file_resource.txt file based on the MaxCompute instance information.
Add the table_resource1 and table_resource2 resources and set the resource type to table. Map these resources to the wc_in1 and wc_in2 tables created in MaxCompute, and insert data as necessary.
Package the created UDTF into a JAR file, upload it to the MaxCompute project, and register the function. For example, name the function my_udtf. To open the packaging and upload interface, right-click the UDTFResource class and select Deploy to Server....
Click Project Explorer in the upper-left corner of IDEA, right-click the target MaxCompute project, and select Open Console to start the MaxCompute client. Then, execute an SQL command to call the newly created UDTF. The results are displayed.
Sample SQL command:
SELECT my_udtf("10","20") AS (a, b, fileResourceLineCount);
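The third output column packs three counts into one string in the form key=value|key=value|key=value. If a downstream program needs the individual counts, the string can be split again. The following is a minimal sketch: the class CountsColumnParser is a hypothetical consumer-side helper, and the sample value passed in main is illustrative, not the output of a real query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that splits the third output column back into counts.
final class CountsColumnParser {

    // Parses "k1=1|k2=2|k3=3" into an ordered map of key to count.
    static Map<String, Long> parse(String column) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String pair : column.split("\\|")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + pair);
            }
            counts.put(pair.substring(0, eq), Long.parseLong(pair.substring(eq + 1)));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative value in the format that the sample UDTF produces.
        String column = "fileResourceLineCount=3|tableResource1RecordCount=4|tableResource2RecordCount=5";
        System.out.println(parse(column));
    }
}
```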
References
For more information about MaxCompute resources, see Resource.