This topic provides answers to some frequently asked questions about MaxCompute user-defined functions (UDFs) that are written in Java.
Class- or dependency-related issues
When you call a MaxCompute UDF, the following issues related to classes or dependencies may occur:
Issue 1: The error message ClassNotFoundException or Some dependencies are missing appears.
Causes:
Cause 1: The JAR file that is specified when you create the UDF is invalid.
Cause 2: One or more JAR files on which the UDF depends are not uploaded to MaxCompute. For example, the required third-party package is not uploaded.
Cause 3: The UDF is not called in the MaxCompute project in which the UDF is created. For example, a MaxCompute UDF is created in a development project but is called in a production project.
Cause 4: The required file does not exist or the resource type is invalid. For example, the type of the uploaded file is PY, but the file type specified in get_cache_file of the UDF code is FILE.
Solutions:
Solution to Cause 1: Check the content of the JAR file and confirm that the JAR file contains all the required classes. Then, repackage resources into a JAR file, and upload the file to your MaxCompute project. For more information about how to package resources into files and upload the files to your MaxCompute project, see Package a Java program, upload the package, and create a MaxCompute UDF.
Solution to Cause 2: Upload the required third-party package as a resource to the MaxCompute project. Then, add this package to the resource list when you create the UDF. For more information about how to upload resources and create UDFs, see Add resources and Create a UDF.
Solution to Cause 3: On the MaxCompute client, run the list functions; command in the project in which the MaxCompute UDF is called. Then, confirm that the MaxCompute UDF appears in the command output and that the classes and required resources of the UDF are valid.
Solution to Cause 4: On the MaxCompute client, run the desc function <function_name>; command. Then, confirm that all the required files appear in the resource list in the command output. If the type of an uploaded file is inconsistent with the file type specified in get_cache_file, run the add <file_type> <file_name>; command to add the required file.
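On the MaxCompute client, the checks for Causes 3 and 4 can be sketched as follows. This is a hypothetical session: my_udf and my_udf.py are placeholder names, and you would run these commands in the project in which the UDF is called.

```sql
-- Confirm that the UDF exists in this project:
list functions;
-- Inspect the classes and the resource list of the UDF (my_udf is a placeholder):
desc function my_udf;
-- If a required resource is missing or has the wrong type, re-add it with the
-- type that the UDF code expects (py, file, jar, or archive):
add py my_udf.py;
```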
Issue 2: The error message NoClassDefFoundError or NoSuchMethodError appears, or the error code ODPS-0123055 appears.
Causes:
Cause 1: The version of the third-party library that is included in the uploaded JAR package is different from the version of the built-in third-party library in MaxCompute.
Cause 2: The operation is blocked by the Java sandbox. The message java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "createClassLoader") appears in the Stderr output of the job instance. MaxCompute UDFs that run in distributed mode are subject to Java sandbox limits. For more information about these limits, see Java sandbox.
Solutions:
Solution to Cause 1: Use maven-shade-plugin to resolve the version inconsistency and modify the import path. Then, package the JAR file again and upload the package to the MaxCompute project. For more information about how to package a JAR file and upload the package, see Package a Java program, upload the package, and create a MaxCompute UDF.
Solution to Cause 2: For more information, see Issues related to Java sandbox limits.
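As an illustration, a maven-shade-plugin relocation rule in the pom.xml of your project renames the conflicting package path inside your JAR so that it no longer collides with the built-in copy. The package names below are examples only; substitute the library that actually conflicts in your project.

```xml
<!-- Sketch of a maven-shade-plugin relocation; adjust pattern/shadedPattern
     to the library that conflicts with the MaxCompute built-in version. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>shaded.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After shading, references in your code are rewritten to the shaded path, so your bundled version and the MaxCompute built-in version can coexist.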
Issues related to Java sandbox limits
Issue 1: An error occurs when a MaxCompute UDF is called to access local files, access the Internet, access a distributed file system, or create a Java thread.
Cause: By default, MaxCompute does not allow you to access resources in a network by using UDFs.
Solution: Fill out and submit the network connection application form based on your business requirements. The MaxCompute technical support team then contacts you to establish a network connection. For more information about how to fill out the application form, see Network connection process.
Performance-related issues
When you call a MaxCompute UDF, the following performance issues may occur:
Issue 1: The error message kInstanceMonitorTimeout appears.
Cause: Data processing in the MaxCompute UDF times out. By default, the time in which a UDF can process a batch of data is limited: a UDF must process 1024 rows within 1800 seconds. This is not the total time for which a worker runs, but the time in which the UDF processes one small batch of data records. In most cases, MaxCompute SQL can process more than 10,000 rows of data per second, so this limit serves only to prevent infinite loops in a MaxCompute UDF, which would occupy CPU resources for a long period of time.
Solution:
If the MaxCompute UDF genuinely needs a long time to process a batch of data, call ExecutionContext.claimAlive in the Java class method of the UDF to reset the timer.
Alternatively, optimize the logic of the MaxCompute UDF code. After the optimization, configure the following parameters at the session level before you call the MaxCompute UDF to accelerate data processing.
Parameter | Description
set odps.function.timeout=xxx; | The timeout period for running a UDF. Unit: seconds. Default value: 1800. Valid values: 1 to 3600. You can increase the value based on your business requirements.
set odps.stage.mapper.split.size=xxx; | The amount of input data for a Map worker. Unit: MB. Default value: 256. You can decrease the value based on your business requirements.
set odps.sql.executionengine.batch.rowcount=xxx; | The number of rows that MaxCompute processes at a time. Default value: 1024. You can decrease the value based on your business requirements.
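The claimAlive call mentioned above follows this pattern. The sketch below is self-contained so it can run anywhere: in a real UDF you would extend com.aliyun.odps.udf.UDF and use the ExecutionContext that the framework passes to setup(), whereas FakeExecutionContext here is only a stand-in, and all names other than claimAlive are hypothetical.

```java
// Sketch of the claimAlive keep-alive pattern. FakeExecutionContext stands in for
// com.aliyun.odps.udf.ExecutionContext from odps-sdk-udf so the example is runnable.
class FakeExecutionContext {
    int keepAliveCalls = 0;

    // In odps-sdk-udf, ExecutionContext.claimAlive() resets the per-batch timer
    // so a long-running but progressing UDF is not killed with kInstanceMonitorTimeout.
    void claimAlive() {
        keepAliveCalls++;
    }
}

class HeavyRowUdf {
    private final FakeExecutionContext ctx;

    HeavyRowUdf(FakeExecutionContext ctx) {
        this.ctx = ctx; // in a real UDF, store the context passed to setup()
    }

    // Long-running per-row logic: call claimAlive() periodically so the
    // framework knows the UDF is still making progress.
    long evaluate(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i; // stand-in for expensive per-row work
            if (i % 1000 == 0) {
                ctx.claimAlive();
            }
        }
        return sum;
    }
}
```

Calling claimAlive too rarely defeats the purpose; a common choice is once every fixed number of iterations of the inner loop, as above.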
Issue 2: The error message errMsg:SigKill(OOM) or OutOfMemoryError appears.
Cause: MaxCompute runs jobs in three stages: Map, Reduce, and Join. If MaxCompute processes a large amount of data, an instance at one of these stages may require more memory than it is allocated.
Solution:
If the error is reported for the fuxi or runtime code, configure the following resource parameters to increase the resources available for data processing.
Parameter | Description
set odps.stage.mapper.mem=xxx; | The memory size of a Map worker. Unit: MB. Default value: 1024. You can increase the value based on your business requirements.
set odps.stage.reducer.mem=xxx; | The memory size of a Reduce worker. Unit: MB. Default value: 1024. You can increase the value based on your business requirements.
set odps.stage.joiner.mem=xxx; | The memory size of a Join worker. Unit: MB. Default value: 1024. You can increase the value based on your business requirements.
set odps.stage.mapper.split.size=xxx; | The amount of input data for a Map worker. Unit: MB. Default value: 256. You can adjust the value based on your business requirements.
set odps.stage.reducer.num=xxx; | The number of Reduce workers. You can increase the value based on your business requirements.
set odps.stage.joiner.num=xxx; | The number of Join workers. You can increase the value based on your business requirements.
If this error is reported for Java code, you can adjust the preceding parameters and run the set odps.sql.udf.jvm.memory=xxx; command to increase the Java virtual machine (JVM) memory size.
For more information about the parameters, see SET operations.
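As an illustration, a session that raises worker and JVM memory before calling a UDF might look like the following. The values are illustrative, and my_udf and my_table are placeholder names.

```sql
set odps.stage.mapper.mem=3072;     -- give each Map worker 3 GB instead of the 1 GB default
set odps.sql.udf.jvm.memory=2048;   -- raise the JVM memory available to the UDF
select my_udf(col1) from my_table;  -- placeholder UDF and table names
```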
UDTF-related issues
When you call a Java UDTF, the following issues may occur:
Issue 1: The error message Semantic analysis exception - only a single expression in the SELECT clause is supported with UDTF's appears when you call a UDTF.
Cause: Additional columns or expressions are specified in the SELECT statement in which the UDTF is called. The following sample SQL statement is incorrect:
select b.*, 'x', udtffunction_name(v) from table lateral view udtffunction_name(v) b as f1, f2;
Solution: You can use the Java UDTF with Lateral View in the SELECT statement. Sample statement:
select b.*, 'x' from table lateral view udtffunction_name(v) b as f1, f2;
Issue 2: The error message Semantic analysis exception - expect 2 aliases but have 0 appears.
Cause: The aliases of the output columns are not specified in the SELECT statement in which the UDTF is called.
Solution: Use the AS clause to specify the aliases of the output columns in the SELECT statement in which the Java UDTF is called. Sample statement:
select udtffunction_name(paramname) as (col1, col2);
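If you need to combine the aliased UDTF output with other columns or expressions, one common pattern is to wrap the UDTF call in a subquery. The statement below is a sketch: udtffunction_name and paramname follow the samples above, and my_table is a hypothetical table name.

```sql
select t.col1, t.col2, 'x'
from (
    select udtffunction_name(paramname) as (col1, col2)
    from my_table
) t;
```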