User-defined functions

This topic describes user-defined functions (UDFs).

How it works

UDFs are persisted into the meta center after they are created. You can load UDFs to compute nodes for execution. SQL-related execution logics are sent to the SQL engine for execution, and then execution results are returned. Other logics such as those related to control flows are executed in the PL engine.

UDFs are registered with the runtime function management center before they are executed. The memory size for a single query is strictly limited during the execution.

Function pushdown

PolarDB-X determines whether a UDF needs to be registered on the data node by checking the SQL DATA ACCESS field. Only if the SQL DATA ACCESS field is set to no sql, the UDF is also registered on the data node. After you register a UDF on the data node, the UDF can be pushed down.

To maintain compatibility with MySQL, PolarDB-X registers UDFs with the MySQL library.

Function pushdown and scaling

After the scaling, you can execute the pushdown udf statement to register UDFs that can be pushed down with the new DN.

Differences from MySQL

Only data query language (DQL) operations can be performed on UDFs, but not data manipulation language (DML) or data definition language (DDL) operations.
In MySQL, functions are stored at the database level. In PolarDB-X, UDFs are stored at the instance level.
The SQL DATA ACCESS field cannot be modified because the pushdown logic of a UDF is involved.

Precautions

UDFs are supported only on instances of PolarDB-X 5.4.16 and later.

Syntax

Create a UDF

CREATE
    [DEFINER = user]
    FUNCTION sp_name ([func_parameter[,...]])
    RETURNS type
    [characteristic ...] routine_body

func_parameter:
    param_name type
 
characteristic:
    COMMENT 'string'
  | LANGUAGE SQL
  | [NOT] DETERMINISTIC
  | { CONTAINS SQL | NO SQL | READS SQL DATA | MODIFIES SQL DATA }
  | SQL SECURITY { DEFINER | INVOKER }
 
routine_body:
　　Valid SQL routine statement

Example

CREATE FUNCTION my_mul(x int, y int) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
COMMENT 'my multiply function'
RETURN x*y*31;

Call a UDF

UDFs are called in the same way as common built-in functions.

Delete a UDF

DROP FUNCTION [IF EXISTS] FUNCTION_NAME;

Modify a UDF

ALTER FUNCTION func_name [characteristic ...]

characteristic: {
    COMMENT 'string'
  | LANGUAGE SQL
  | SQL SECURITY { DEFINER | INVOKER }
}

View all UDFs

SELECT * FROM information_schema.Routines WHERE ROUTINE_TYPE = 'FUNCTION';

View a specified UDF

SHOW FUNCTION STATUS [LIKE 'pattern' | WHERE expr]

SHOW CREATE FUNCTION Function name;

SELECT * FROM information_schema.Routines WHERE ROUTINE_NAME = 'Function name';

View a UDF that is pushed down

SELECT * FROM information_schema.pushed_function;

Cancel a UDF that is being executed

You can execute the KILL statement to terminate the query that is being executed.

kill {query | connection} connection_id;

Cache management for UDFs

All metadata of a custom function (whether a custom function exists), is always stored in the cache. However, the function body is loaded only when it is required.

View cached UDFs

select * from information_schema.function_cache;

View the cache size

select * from information_schema.function_cache_capacity;

Set the cache size

resize function cache num;

Clear the cache

clear function cache;

Reload a UDF

reload functions;

Examples

# Create a UDF.
CREATE FUNCTION my_mul(x int, y int)
     RETURNS int
     LANGUAGE SQL
     DETERMINISTIC
     COMMENT 'my multiply function'
     RETURN x*y*31;

# The UDF is not called at this time. However, the function already exists in the cache.
# The cache size is 0, indicating that the UDF is not loaded.
select * from information_schema.function_cache;
+--------------------+--------------+------+
| ID                 | FUNCTION     | SIZE |
+--------------------+--------------+------+
| xx.xx.xx.xx:3000   | mysql.my_mul |    0 |
| yy.yy.yy.yy:3100   | mysql.my_mul |    0 |
+--------------------+--------------+------+

# Call the UDF.
select my_mul(2,2);
+--------------+
| my_mul(2, 2) |
+--------------+
|          124 |
+--------------+

# Check the relevant view, which indicates that the UDF has been loaded.
select * from information_schema.function_cache;
+--------------------+--------------+------+
| ID                 | FUNCTION     | SIZE |
+--------------------+--------------+------+
| xx.xx.xx.xx:3000   | mysql.my_mul |    0 |
| yy.yy.yy.yy:3100   | mysql.my_mul |   79 |
+--------------------+--------------+------+

select * from information_schema.function_cache_capacity;
+--------------------+-----------+-------------+
| ID                 | USED_SIZE | TOTAL_SIZE  |
+--------------------+-----------+-------------+
| xx.xx.xx.xx:3000   |         0 | 15139759718 |
| yy.yy.yy.yy:3100   |        79 | 15139759718 |
+--------------------+-----------+-------------+

# After you reload the UDF, the cache is reset.
reload functions;

# Check the relevant view, which indicates that cache has been reset.
select * from information_schema.function_cache;
+--------------------+--------------+------+
| ID                 | FUNCTION     | SIZE |
+--------------------+--------------+------+
| xx.xx.xx.xx:3000   | mysql.my_mul |    0 |
| yy.yy.yy.yy:3100   | mysql.my_mul |    0 |
+--------------------+--------------+------+

Resource management for UDFs

Manage memory

During the execution of a UDF, memory is mainly used to cache cursors. Therefore, PolarDB-X allows you to use the PL_CURSOR_MEMORY_LIMIT and PL_MEMORY_LIMIT parameters to limit the maximum memory that can be used by a single cursor and the maximum memory that can be used by a UDF. The memory for the entire query statement to calls a UDF is also limited.

Note

We recommend that you set the PL_CURSOR_MEMORY_LIMIT parameter to a value not less than 128k (131072), and the PL_CURSOR_MEMORY_LIMIT parameter not greater than the PL_MEMORY_LIMIT parameter.

The PL_CURSOR_MEMORY_LIMIT parameter is used to limit the memory occupied by each cursor. If the memory limit is exceeded, data is stored in the hard disk. The PL_MEMORY_LIMIT parameter is used to limit the memory for a UDF.

Limit call depth

You can use the MAX_PL_DEPTH parameter to limit the call depth. A very deep call makes it difficult to understand the execution logic of a UDF and consumes huge resources.