Background information
A user can execute the CREATE TYPE
statement to register a new custom data type in the current database. The user who executes the CREATE TYPE statement automatically becomes the owner of the new data type. You can execute the CREATE TYPE statement to create a custom data structure with specific attributes and behaviors in the database.
If you specify a schema name when you execute the CREATE TYPE
statement, the new data type created by the statement is created in the specified schema. Otherwise, the new data type is created in the current schema by default.
Note
The name of the new data type must be unique in the same schema. This means that the name of the new type must be distinct from the name of an existing type or domain in the same schema. The type name must also be distinct from the name of an existing table in the same schema because tables have associated data types. Therefore, to avoid potential naming conflicts, the type name must remain unique in its schema.
The CREATE TYPE
statement has five variants, as shown in the following syntax. The variants create the following data types: composite type
, enumerated type
, range type
, base type
, and shell type
. This topic describes how to create the first four data types. The shell
type is a placeholder that is used to reserve a name for a data type that will be further defined. The placeholder can be created by issuing the CREATE TYPE
statement that contains only the type name without additional parameters. When you create a range type and a base type, the shell type
is used as a forward reference to pre-declare the existence of a type before you define the type.
Syntax
CREATE [ OR REPLACE ] TYPE name AS
( [ attribute_name data_type [ COLLATE collation ] [, ... ] ] );
CREATE TYPE name AS ENUM
( [ 'label' [, ... ] ] );
CREATE TYPE name AS RANGE (
SUBTYPE = subtype
[ , SUBTYPE_OPCLASS = subtype_operator_class ]
[ , COLLATION = collation ]
[ , CANONICAL = canonical_function ]
[ , SUBTYPE_DIFF = subtype_diff_function ]
);
CREATE TYPE name (
INPUT = input_function,
OUTPUT = output_function
[ , RECEIVE = receive_function ]
[ , SEND = send_function ]
[ , TYPMOD_IN = type_modifier_input_function ]
[ , TYPMOD_OUT = type_modifier_output_function ]
[ , ANALYZE = analyze_function ]
[ , INTERNALLENGTH = { internallength | VARIABLE } ]
[ , PASSEDBYVALUE ]
[ , ALIGNMENT = alignment ]
[ , STORAGE = storage ]
[ , LIKE = like_type ]
[ , CATEGORY = category ]
[ , PREFERRED = preferred ]
[ , DEFAULT = default ]
[ , ELEMENT = element ]
[ , DELIMITER = delimiter ]
[ , COLLATABLE = collatable ]
);
CREATE TYPE name;
Composite type
You can use the first variant of the CREATE TYPE
statement to create a composite type
. The composite type consists of a set of attribute names. Each attribute name is associated with a specific data type. If the data type associated with an attribute supports sorting operations, you can specify a sorting rule for each attribute. The composite type is similar to the row type in the table structure. If you execute the CREATE TYPE statement to create a composite type, you can define a structured data type that can be used in various ways without actually creating a full table.
When you create a composite type, you can use the OR REPLACE
clause. When you use the OR REPLACE clause, an existing type is replaced by the newly defined type that has the same name in a database. This way, you can update the definition of an existing composite type without the need to delete the existing type in advance. This simplifies the process of type maintenance.
Note
Proceed with caution when you use the OR REPLACE
clause because the clause overwrites the definition of the original type. The overwrite operations may affect the database objects that depend on the type.
The composite type can be used as the type of the parameters and return values of a function and provides a convenient way to define a complex structure without the need to create a table for the structure. This ensures the flexibility and practicality of the composite type in databases, especially in scenarios where multiple values need to be passed as a whole, or where a function needs to return multiple values rather than just a value of a simple data type.
To successfully create a composite type, you must have the USAGE permission on the data types associated with all attributes of the composite type.
Enumerated type
You can use the second variant of the CREATE TYPE
statement to create an enumerated type
. The enumerated type consists of a series of predefined tags that must be enclosed in quotation marks. The length of each tag cannot exceed the upper limit specified by the NAMEDATALEN
field. Typically, a tag is 64 bytes in length.
You can create an enumerated type that does not contain any tag. However, you cannot use such type to store any value before you execute the ALTER TYPE
statement on such type to add at least one tag to the type. This means that the enumerated type without any value is unavailable for database operations before you add enumerated values to such type. This ensures the flexible scalability of the enumerated type. This way, database designers do not need to immediately determine all possible enumerated values when initially defining the enumerated type. Instead, they can gradually add new enumerated values as application requirements change.
Range type
You can use the third variant of the CREATE TYPE
statement to create a range type
. The range type indicates a continuous interval within which values are defined based on a subtype. A subtype
is the basic data type that constitutes a range. The subtype must correspond to a B-tree operator class to determine the order of values in the range. In most cases, the default B- tree operator class of a subtype is used as the sorting basis. If you want to use a non-default operator class for sorting, you can use the subtype_opclass
option to specify the name of the class. If you can sort a subtype and need to use a non-default sorting rule for the values of the range type, you can use the collation
option to specify the rule.
When you create a range type, you can define the canonical
function. The function is an optional function that accepts the parameters of the range type and returns the canonicalized values of the same type. The function is used to convert a range value into the canonical representation of the value. However, you may encounter difficulties when you create the canonical
function because the function needs to be defined before the range type declares the function. To successfully create the canonical function, you must create a shell type
. The shell type is the placeholder type that includes only names and owners but not specific attributes. You can execute the CREATE TYPE name
statement without specifying additional parameters to create a shell type. After you create a shell type, you can use the shell
type to declare the type of the parameters and return values of the canonical
function. If the actual range type is the same as that declared by the shell type, the system automatically replaces the placeholder type (the shell
type) with the range type that is fully defined.
Alternatively, you can define the subtype_diff
function. The function accepts two subtype values as parameters and returns a value of the double precision
type to indicate the difference between the two values. The subtype_diff
function is optional. However, it can improve the efficiency of GiST
indexes on a column of the range type.
Base type
The fourth variant of the CREATE TYPE
statement is used to create a base type. The base type is also called the scalar type. The creation of the base type requires privileged permissions. Therefore, you must be a superuser of the current database to ensure security. If the base type is defined improperly, the server may perform unexpected operations and may even crash.
When you define a new base type, you can specify the parameters in any order. Most of the parameters are optional. However, before you define the new base type, you must execute the CREATE FUNCTION
statement to register at least two functions. The input_function
and output_function
functions are required when you create a new base type. The following functions are optional: receive_function
, send_function
, type_modifier_input_function
, type_modifier_output_function
, and analyze_function
.
In most cases, the preceding functions need to be written in the C programming language or in other low-level languages because these functions must closely work with the underlying database systems to process operations such as the input, output, receiving, sending, and analysis of data types. The writing and registration of these functions are the most technical part of the process of creating a new base type, which also ensures that the new base type properly work in database systems.
The input_function
function converts the external text representation of a user-defined type into the internal representation that applies to all operators and functions defined by the type. The output_function
function executes the reverse conversion process in which the internal representation is converted back into the readable text representation. The input_function function accepts a parameter whose type is cstring
or three parameters whose types are respectively cstring
, oid
(object identifier), and integer
. The parameter of the cstring type is the C string representation of the input text. The parameter of the oid type is the Object Identifier (OID) of the base type or the type of elements in an array. The parameter of the integer type is the type modifier (typmod
) of a column. If the type modifier is unknown, the value -1
is passed. The input_function function must return values of the new base type. In most cases, the input_function function is declared as STRICT
. If NULL
is input, the function is not called. If the input_function function is not declared as STRICT
, the first parameter of the function is NULL
when NULL
is input. In this case, the input_function function returns NULL
unless errors occur. This design supports domain input functions that may not accept NULL
as an input value.
The output_function function needs to accept the parameters of a new type and convert the parameters into return values of the cstring
type. If the output_function function returns the value NULL
, the function is not called.
The receive_function
function processes the external binary representation of the base type and converts the external representation into the internal representation. If the function is not provided, the binary representation of the base type cannot be input. The binary representation provides higher efficiency and can be easily migrated among different systems. For example, the standard integer type uses network byte order as its external binary representation, while the internal representation uses the local byte order of machines. The receive_function function needs to verify whether the input values are valid and can accept a parameter whose type is internal
or three parameters whose types are respectively internal
, oid
, and integer
. The parameter of the internal
type specifies a StringInfo
buffer that contains received byte strings. The parameter of the oid type and the parameter of the integer type are consistent with the input_function function. In most cases, the receive_function function is also declared as STRICT
. Otherwise, when the internal function receives the input value NULL
, the parameter of the internal type is NULL
, and the function returns NULL
unless errors occur.
The send_function
function converts the internal representation into the external binary representation. If the send_function function is not provided, the binary representation of the base type cannot be output. The send_function function must accept the parameters of a new type and return values of the bytea
type. If the value NULL
is returned, the send_function function is not called. The send_function function provides a pathway for user-defined types to interact with internal database systems, which ensures that the types can suit the storage and communication mechanisms of databases.
You must declare the input_function and output_function functions before you create a new type because the two functions reference the new types that have not been created as return values or parameters. To resolve this issue, you must define the shell type
. To define the shell type, you can execute the CREATE TYPE name
statement that does not contain additional parameters. Then, you can reference the shell type
by calling the input_function and output_function functions that are written in the C programming language. Finally, you can execute the CREATE TYPE
statement that contains the full definition of a data type to replace the shell type
with a full and legal type definition. After the replacement, the new type becomes available in database operations.
If the base type supports type modifiers that act as optional constraints attached to type declarations, such as char(5)
or numeric(30,2)
, you must define the type_modifier_input_function
and type_modifier_output_function
functions. In PolarDB, user-defined types can accept one or more simple constants or identifiers as modifiers. To store the information in a system directory, modifiers must be able to be encapsulated into a non-negative integer value. Type modifiers are accepted by the type_modifier_input_function
function. This function receives declared modifiers as an array of the cstring
type and must perform validity checks. If a modifier is invalid, the type_modifier_input_function function throws an error. If the modifier is valid, the function returns a non-negative integer that is stored in the typmod
column. If the base type has no constraints specified by the type_modifier_input_function
function, any type modifiers are denied.
In this case, the type_modifier_output_function
function converts the integer value typmod
that is stored in the system directory back into a readable representation. The type_modifier_output_function function must return a string of the cstring
type that is attached to the end of a type name. For example, the type_modifier_output_function might return the value (30,2)
for the numeric
type. If the default representation is to enclose the stored integer value typmod
with a pair of parentheses, you can omit the type_modifier_output_function
function. The type_modifier_output_function function provides a pathway for user-defined types to parse and display type modifiers. This way, you can specify additional custom constraints in type declarations.
The analyze_function
function collects statistics on the base type. By default, if the base type has a default B-tree operator class, ANALYZE
attempts to collect statistics by using the equals
and less-than
operators of the base type. However, this method is not appropriate for non-scalar types. Therefore, you can replace the default behavior by specifying a custom analytic function. A custom analytic function must declare a single parameter of the internal
type and return a value of the boolean
type.
Only the I/O
function and other functions created for a new type receive the details of the internal representation of the type. However, some attributes of the internal representation must be declared to PolarDB. The most important attribute is internallength
. The length of the base type can be fixed or variable. If the length of the type is fixed, the internallength
attribute value is a positive integer. If the length of the type is variable, the internallength
attribute is set to VARIABLE
, and typlen
is set to -1
in the internal variable-length base type. The internal representation of the variable-length base type must begin with a 4-byte integer that represents the total length of the type.
PASSEDBYVALUE
indicates that a new type should be passed by values rather than by references. If you want to use PASSEDBYVALUE
, the length of the base type must be fixed and the bytes of the internal representation of the type cannot exceed the upper limit of the Datum
type. The upper limit may be 4 bytes on some systems and may be 8 bytes on other systems.
The alignment
parameter defines the memory alignment requirements of the base type. The memory of the base type can be aligned based on the 1-byte, 2-byte, 4-byte, or 8-byte boundary.
Note
The value of the alignment
parameter must greater than or equal to 4 for all variable-length types because the types need to include a length value of the int4
type as their header element in their internal representation.
The storage
parameter specifies the storage policy for variable-length types. Fixed-length types support only the plain
policy. This means that the data of the plain type is always stored in rows and is not compressed. The extended
policy specifies the system to attempt to compress excessively long data values and delete data from primary table rows if necessary. The external
policy allows you to delete data from rows. However, the system does not attempt to compress the data. The main
policy supports data compression. However, the main policy does not delete data from primary table rows. Data to which the main policy applies is deleted only when no other methods are available to keep the row size within acceptable limits. The main policy prioritizes retaining data in rows over the extended
and external
policies.
All storage
options except for plain
specify that the functions of the base type can process values that have been processed by using The Oversized-Attribute Storage Technique
(TOAST
). These values only determine the default TOAST
storage strategy of a column that uses a TOAST
-compatible data type. You can still execute the ALTER TABLE SET STORAGE
statement to specify other storage policies for the column.
The like_type
parameter provides a method to define the basic attributes of the new base type based on an existing base type. The method allows you to directly copy the attributes of the existing base type. The values of the internallength
, passedbyvalue
, alignment
, and storage
attributes are copied from the existing base type. These values can be overwritten in the LIKE
clause. You do not need to use the LIKE clause. This method is suitable for specifying the attributes of the base type when the underlying implementation of the new base type relies on the existing base type.
The category
and preferred
parameters determine the optimal implicit type conversion in the case of ambiguity. Each data type is assigned to a category named with a single ASCII
character. Each type can become the preferred type in the category to which it belongs. When an overloaded function or operator needs to be disambiguated, the parser prioritizes the conversion into the preferred type. The parser performs type conversion only within the same category. For types that are not implicitly converted into any other type or do not accept implicit conversions from any other type, you can keep the default values for these settings. For a group of related types with implicit conversions, we recommend that you mark the types as belonging to the same category and select one or two of the most commonly used types as the first choice for the same category. The category
parameter is required especially when you add a user-defined type to an existing built-in category such as the number or string category. You can also create a category that consists of entirely user-defined types. You can select any non-uppercase ASCII characters as the identification of such category.
If you want a column of the array type to have a non-null default value, you can specify a default value. You can use the DEFAULT
keyword to specify the default value. The default value can be overwritten by the explicit DEFAULT
clause that is attached to a specified column.
To define a data type as the array type, use the ELEMENT
keyword to indicate the basic element type of an array. For example, to define the array type that is based on a 4-byte integer int4
, specify ELEMENT = int4
.
To specify a delimiter that separates the values of an array in the external representation, you can set delimiter
to a specific character. The default delimiter is a comma (,
).
Note
The delimiter depends on the element type of an array but not the array type.
If the optional parameter collatable
is set to true, the column definitions and expressions of the array type can carry collation information by using the COLLATE
clause. The functions in which the collatable parameter is set to true need to actually use this information. Only marking types as sortable does not mean that the functions automatically use this information.
Array type
If you define a new type, PolarDB automatically creates the array type corresponding to the new type. The array type name is automatically generated by prefixing the name of the original element type with an underscore (_). If the bytes of the array type name exceed the upper limit specified by NAMEDATALEN
, the name is automatically truncated. If the truncated name conflicts with the name of an existing type, PolarDB tries another name until it finds a name that does not cause the conflict.
The length of an array type that is implicitly created is variable. The array type uses built-in input and output functions array_in
and array_out
. The array type varies with the ownership or schema of its element type. If the corresponding element type is deleted, the array type is also deleted. This way, the consistency and simplicity of the type system can be ensured and you are freed from manually creating and managing the array type.
PolarDB automatically creates the array type corresponding to a user-defined type. However, the ELEMENT
option is still required when you are creating a fixed-length type that is essentially an array of multiple identical elements. Assume that you want to provide a holistic operation for the type and allow direct access to individual elements in an array by using subscripts. For example, the point
type contains two floating-point numbers internally, and the two coordinate values can be directly accessed by using the point[0]
and point[1]
subscripts.
Note
The subscript-based access feature is only applicable to fixed-length types whose internal structure is a series of fixed-length fields. However, subscripts must have a general internal expression for variable-length types to support subscript-based access by using the array_in
and array_out
functions. The subscripts of the fixed-length array type start from zero due to historical reasons. In contrast, the subscripts of the variable-length array type start from 1.
Parameters
Parameter | Description |
name
| The name of the type that you want to create. The type name can be qualified by the schema. |
attribute_name
| The name of an attribute (column) of the composite type. |
data_type
| The name of the existing type of the column. |
collation
| The name of the existing collation associated with a column or the range or associated with the composite type. |
label
| The text string that represents the text label of the specific value of the enumerated type. In the definition of the enumerated type, each value is uniquely represented by a label. |
subtype
| The name of the element type in the range type. The range type represents the value range of the element type. |
subtype_operator_class
| The name of the B-tree operator class used by the element type of the range type. |
canonical_function
| The name of the canonicalized function for the range type. |
subtype_diff_function
| The function that calculates the difference between two elements in the range type. |
input_function
| The function that converts data from the external text representation of the type to the internal representation. |
output_function
| The function that converts data from the internal representation of the type to the external text representation. This function corresponds to the input_function function. |
receive_function
| The function that converts data from the external binary representation of the type to the internal representation. |
send_function
| The function that converts data from the internal representation of the type to the external binary representation. |
type_modifier_input_function
| The function that converts the modifier array of the type to the internal representation. |
type_modifier_output_function
| The function that converts the internal modifier representation of the type to the external text representation. |
analyze_function
| The function that performs statistical analysis for the type. |
internallength
| The numeric constant that specifies the length of the internal representation of the new type in bytes. By default, its length is variable. |
alignment
| The storage alignment requirements for the type. Valid values: char , int2 , int4 , and double . Default value: int4 . |
storage
| The storage strategy for the type. Valid values: plain , external , extended , and main . Default value: plain . |
like_type
| The name of the existing type that has the same internal representation as the new type. The values of internallength , passedbyvalue , alignment , and storage are copied from the existing type unless the values are explicitly overwritten elsewhere in the CREATE TYPE statement. |
category
| The category code (a ASCII character) of the type. The default value is U that represents user-defined type . You can use other ASCII characters to create custom categories. |
preferred
| Specifies that the type is preferred in its category. If this parameter is set to true, the type is preferred in its category. Valid values: true and false. Default value: false. Proceed with caution when you add a preferred type to an existing type category. The addition of a preferred type may cause unexpected behavior changes. |
default
| The default value of the type. If you do not specify this parameter, this parameter is empty by default. |
element
| The type of the elements in the array type that you create. |
delimiter
| The delimiter that is used to separate values in an array of the array type. |
collatable
| Specifies whether the collation information is available. If this parameter is set to true, the information is available. Valid values: true and false. Default value: false. |
Usage notes
We recommend that you do not use names that begin with an underscore when naming types and tables. Databases implement mechanisms to modify the name of the automatically generated array type to prevent conflicts. However, potential confusion risks still exist. In most cases, names that begin with an underscore are used as names generated by database systems, such as array type names that consist of base type names prefixed by an underscore (_). Therefore, to reduce unnecessary confusion and keep naming clear and consistent, we recommend that you do not use this naming method.
Examples
Create a composite type and use the type as the type of return values in the subsequent function definition.
CREATE TYPE compfoo AS (f1 int, f2 text);
CREATE FUNCTION getfoo() RETURNS SETOF compfoo AS $$
SELECT fooid, fooname FROM foo
$$ LANGUAGE SQL;
Create a new composite type and update the definition of the new type after creation.
CREATE TYPE compfoo AS (f1 int, f2 text);
CREATE OR REPLACE TYPE compfoo AS (f2 text, f1 int);
Define an enumerated type and use the type in the definition of a new table.
CREATE TYPE bug_status AS ENUM ('new', 'open', 'closed');
CREATE TABLE bug (
id serial,
description text,
status bug_status
);
Create a range type.
CREATE TYPE float8_range AS RANGE (subtype = float8, subtype_diff = float8mi);
Create a base data type named box
and use the type as the data type of a column in the definition of a new table.
CREATE TYPE box;
CREATE FUNCTION my_box_in_function(cstring) RETURNS box AS ... ;
CREATE FUNCTION my_box_out_function(box) RETURNS cstring AS ... ;
CREATE TYPE box (
INTERNALLENGTH = 16,
INPUT = my_box_in_function,
OUTPUT = my_box_out_function
);
CREATE TABLE myboxes (
id integer,
description box
);
If the internal structure of box
is an array consisting of four float4
elements, the following statement can be used to define the type:
CREATE TYPE box (
INTERNALLENGTH = 16,
INPUT = my_box_in_function,
OUTPUT = my_box_out_function,
ELEMENT = float4
);
Create a large object type and use the type in a table definition.
CREATE TYPE bigobj (
INPUT = lo_filein, OUTPUT = lo_fileout,
INTERNALLENGTH = VARIABLE
);
CREATE TABLE big_objs (
id integer,
obj bigobj
);