Overview - MaxCompute - Alibaba Cloud Documentation Center

MaxCompute allows you to write user-defined aggregate functions (UDAFs) in Java or Python to extend the built-in functions of MaxCompute and meet your business requirements. This topic describes the types, limits, usage notes, and development process of UDAFs, and explains how to call them.

Background information

A UDAF maps many inputs to one output: multiple input records are aggregated to generate one output value. The following table describes the two types of UDAFs.


UDAF type	Description
Java UDAF	This type of UDAF is written in Java to implement the function logic. For more information, see Java UDAFs.
Python UDAF	This type of UDAF is written in Python to implement the function logic. Python UDAFs are classified into Python 2 UDAFs, which use Python 2.7 (see Python 2 UDAF), and Python 3 UDAFs, which use CPython 3.7.3 (see Python 3 UDAF).
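
The many-to-one mapping described above can be sketched in plain Python. This is a minimal illustration rather than MaxCompute code: the class AverageAggregator and the helper run_partitions are hypothetical, and the method names (new_buffer, iterate, merge, terminate) are assumed to mirror the buffer-based lifecycle of a Python UDAF. See the Python 3 UDAF topic for the actual interface.

```python
class AverageAggregator:
    """Plain-Python stand-in for an averaging UDAF (hypothetical, no SDK needed)."""

    def new_buffer(self):
        # Intermediate state: [running sum, number of non-NULL rows].
        return [0.0, 0]

    def iterate(self, buffer, value):
        # Fold one input record into the buffer; NULL (None) inputs are skipped.
        if value is not None:
            buffer[0] += value
            buffer[1] += 1

    def merge(self, buffer, partial):
        # Combine a partial buffer that was built from another slice of the data.
        buffer[0] += partial[0]
        buffer[1] += partial[1]

    def terminate(self, buffer):
        # Produce the single output value for the group.
        if buffer[1] == 0:
            return None
        return buffer[0] / buffer[1]


def run_partitions(agg, partitions):
    """Aggregate each partition separately, then merge the partial buffers."""
    final = agg.new_buffer()
    for rows in partitions:
        partial = agg.new_buffer()
        for row in rows:
            agg.iterate(partial, row)
        agg.merge(final, partial)
    return agg.terminate(final)


# Five input records (one of them NULL) collapse into one output value.
result = run_partitions(AverageAggregator(), [[1.0, 2.0], [3.0, None, 6.0]])
print(result)  # 3.0
```

Keeping the sum and the count in the buffer, instead of a running average, is what lets partial results from different slices of the data be merged without losing accuracy.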

Limits

By default, UDFs cannot access the Internet. If you need to access the Internet from UDFs, fill in the network connection application form based on your business requirements and submit the application. After the application is approved, the MaxCompute technical support team will contact you and help you establish network connections. For more information about how to fill in the network connection application form, see Network connection process.

Usage notes

Before you use UDFs, take note of the following items:

Built-in functions generally outperform UDFs. We recommend that you use built-in functions whenever they can implement your business logic.
If you use a UDF in SQL statements, the memory usage of a computing job may exceed the default allocated memory size when a large amount of data is computed and data skew occurs. In this case, you can run the set odps.sql.udf.joiner.jvm.memory=xxxx; command at the session level to resolve the issue. For more information, see FAQ about MaxCompute UDFs.
If the name of a UDF is the same as that of a built-in function, the UDF is preferentially called. For example, if UDF CONCAT and built-in function CONCAT both exist in MaxCompute, the system automatically calls UDF CONCAT instead of the built-in function CONCAT. If you want to call the built-in function, you must add the symbol :: before the built-in function, for example, select ::concat('ab', 'c');.
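
The name precedence rule in the last item can be modeled with a toy resolver. This sketch only illustrates the rule as stated above and is not how MaxCompute is implemented: resolve_function and the two name sets are hypothetical, and comparing names case-insensitively is an assumption of the sketch.

```python
def resolve_function(call_name, udfs, builtins):
    """Toy resolver: return ("udf" or "builtin", function_name).

    udfs and builtins are sets of lower-case function names, hypothetical
    stand-ins for the functions that are visible in a project.
    """
    # Assumption for this sketch: names are compared case-insensitively.
    name = call_name.lower()
    if name.startswith("::"):
        # The :: prefix forces the built-in function.
        name = name[2:]
        if name in builtins:
            return ("builtin", name)
        raise LookupError("no built-in function named " + name)
    if name in udfs:
        # A UDF shadows a built-in function that has the same name.
        return ("udf", name)
    if name in builtins:
        return ("builtin", name)
    raise LookupError("no function named " + name)


udfs = {"concat"}
builtins = {"concat", "length"}
print(resolve_function("CONCAT", udfs, builtins))    # ('udf', 'concat')
print(resolve_function("::concat", udfs, builtins))  # ('builtin', 'concat')
print(resolve_function("length", udfs, builtins))    # ('builtin', 'length')
```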

Development process

This section describes the development process of a UDAF.

The following figure demonstrates how to write a MaxCompute UDF in Java. [Figure: Write a UDF in Java]


No.	Required	Description	Platform	References
1	No	If you use Maven to write code, add the related SDK dependency to the POM file so that the code can be compiled. The following SDK dependency shows an example: `<dependency> <groupId>com.aliyun.odps</groupId> <artifactId>odps-sdk-udf</artifactId> <version>0.29.10-public</version> </dependency>` You can search for `odps-sdk-udf` in Maven repositories to obtain the version of the SDK dependency.	IntelliJ IDEA (Maven)	None
2	Yes	Write a UDF based on your business requirements.	IntelliJ IDEA (Maven) and MaxCompute Studio	Develop a UDF in Java
3	Yes	Debug the UDF by running it on your on-premises machine or by performing unit testing to check whether the result meets expectations.
4	Yes	After the UDF runs as expected on your on-premises machine, package the code into a JAR file.
5	Yes	Upload the JAR file as a resource to your MaxCompute project.	MaxCompute client (odpscmd), MaxCompute Studio, and DataWorks	MaxCompute client: Add resources; Create a UDF. MaxCompute Studio: Package a Java program, upload the package, and create a MaxCompute UDF. DataWorks: Create and use MaxCompute resources; Create and use a MaxCompute function.
6	Yes	Create a UDF based on the JAR file that you uploaded.
7	No	Call the UDF in your query statements.		None

The following figure demonstrates how to write a MaxCompute UDF in Python. [Figure: Write a UDF in Python]


No.	Required	Description	Platform	References
1	Yes	Write a UDF based on your business requirements.	MaxCompute Studio	Develop a Python UDF
2	Yes	Debug the UDF by running it on your on-premises machine or by performing unit testing to check whether the result meets expectations.	MaxCompute Studio	Develop a Python UDF
3	Yes	Upload Python files or required resources, such as file resources, table resources, and third-party packages, to a MaxCompute project.	MaxCompute client (odpscmd), MaxCompute Studio, and DataWorks	MaxCompute client: Add resources; Create a UDF. MaxCompute Studio: Upload a Python program and create a MaxCompute UDF. DataWorks: Use resources to register functions.
4	Yes	Create a UDF based on the uploaded Python files or required resources.
5	No	Call the UDF in your query statements.		None
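
The local debugging step in the table above can be sketched with the standard unittest module. The SumAggregator class and the aggregate helper are hypothetical plain-Python stand-ins, so the tests run without the MaxCompute SDK; a real Python UDAF would follow the interface described in Develop a Python UDF.

```python
import unittest


class SumAggregator:
    """Hypothetical stand-in for a summing UDAF, written without the SDK."""

    def new_buffer(self):
        return [0]

    def iterate(self, buffer, value):
        # NULL (None) inputs do not contribute to the sum.
        if value is not None:
            buffer[0] += value

    def merge(self, buffer, partial):
        buffer[0] += partial[0]

    def terminate(self, buffer):
        return buffer[0]


def aggregate(agg, partitions):
    """Run the aggregator over several partitions and merge the partial results."""
    final = agg.new_buffer()
    for rows in partitions:
        partial = agg.new_buffer()
        for row in rows:
            agg.iterate(partial, row)
        agg.merge(final, partial)
    return agg.terminate(final)


class SumAggregatorTest(unittest.TestCase):
    def test_single_partition(self):
        self.assertEqual(aggregate(SumAggregator(), [[1, 2, 3]]), 6)

    def test_result_is_independent_of_partitioning(self):
        rows = [4, None, 5, 6]
        whole = aggregate(SumAggregator(), [rows])
        split = aggregate(SumAggregator(), [rows[:2], rows[2:]])
        self.assertEqual(whole, split)

    def test_empty_input(self):
        self.assertEqual(aggregate(SumAggregator(), []), 0)


# Run the tests programmatically so that the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SumAggregatorTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Checking that the result does not depend on how the rows are split is a useful local test, because the many-to-one aggregation may be computed from partial results that are merged later.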

Instructions

Use the following methods to call UDFs:

Use a UDF within a MaxCompute project: You call the UDF in a way similar to how you call a built-in function.
Use a UDF across projects: To use a UDF of Project B in Project A, prefix the function name with the name of Project B. The following statement shows an example: select B:udf_in_other_project(arg0, arg1) as res from table_t;. For more information about resource sharing across projects, see Cross-project resource access based on packages.