Data types

Updated at: 2025-03-28 07:27

Selecting a proper data type for table fields improves database performance and storage efficiency, helps maintain database data integrity and readability, and enhances database security and consistency. This topic describes the data types supported by ApsaraDB for SelectDB. You can execute the SHOW DATA TYPES; statement to query the data types that are supported by SelectDB.

Numeric data types

Data type

Number of bytes

Description

Data type

Number of bytes

Description

BOOLEAN

1

A Boolean value. Valid values:

0: false.

1: true.

TINYINT

1

A signed integer.

Valid values: -128 to 127.

SMALLINT

2

A signed integer.

Valid values: -32768 to 32767.

INT

4

A signed integer.

Valid values: -2147483648 to 2147483647.

BIGINT

8

A signed integer.

Valid values: -9223372036854775808 to 9223372036854775807.

LARGEINT

16

A signed integer.

Valid values: -2^127 + 1 to 2^127 - 1.

FLOAT

4

A floating-point number.

Valid values: -3.4 × 10^38 to 3.4 × 10^38.

DOUBLE

8

A floating-point number.

Valid values: -1.79 × 10^308 to 1.79 × 10^308.

DECIMAL

4/8/16

A high-precision fixed-point number.

Syntax: DECIMAL(M[,D]).

  • M:

    • Indicates the precision, which represents the total number of valid digits.

    • Valid values: 1 to 38.

  • D:

    • Indicates the scale, which represents the total number of digits to the right of the decimal point.

    • Valid values: 0 to the precision.

      • If the precision is greater than or equal to 0 and smaller than or equal to 9, 4 bytes are occupied.

      • If the precision is greater than 9 and smaller than or equal to 18, 8 bytes are occupied.

      • If the precision is greater than 16 and smaller than or equal to 38, 16 bytes are occupied.

Date and time data types

Data type

Number of bytes

Description

Data type

Number of bytes

Description

DATE

16

A date.

Valid values: '0000-01-01' to '9999-12-31'.

Default output format: yyyy-MM-dd.

DATETIME

16

A date and time.

Syntax: DATETIME([P]).

Note

P:

  • Indicates the precision.

  • Default value: 0.

  • Valid values: 0 to 6. Up to six decimal places are supported, which indicates that the value can be accurate to microseconds.

Valid values: '0000-01-01 00:00:00[.000000]' to '9999-12-31 23:59:59[.999999]'.

Default output format: yyyy-MM-dd HH:mm:ss.SSSSSS.

String data types

Data type

Number of bytes

Description

Data type

Number of bytes

Description

CHAR

M

A fixed-length string.

Syntax: CHAR(M).

Note

M indicates the number of bytes that a fixed-length string contains.

Valid values of M: 1 to 255.

VARCHAR

Indefinite

A variable-length string.

Syntax: VARCHAR(M).

Note
  • M indicates the number of bytes that a variable-length string contains.

  • Variable-length strings are stored in the UTF-8 format. Each English character occupies 1 byte and each Chinese character occupies 3 bytes.

Valid values of M: 1 to 65533.

STRING

Indefinite

A variable-length string.

Default and minimum length: 1,048,576 bytes, which is equal to 1 MB.

Maximum length: 2,147,483,643 bytes, which is equal to 2 GB.

Important
  • You can modify the jsonb_type_length_soft_limit_bytes parameter to adjust the length.

  • The STRING data type applies only to the value columns and does not apply to key, partition, or bucket columns.

Semi-structured data types

Data type

Number of bytes

Description

Data type

Number of bytes

Description

ARRAY

Indefinite

An array that consists of elements of Type T.

Syntax: ARRAY<T>.

Note
  • T indicates the basic data type.

  • Valid values of T:

    BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE,
    DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING

Usage notes:

  • A MAP column cannot be used as a key column.

  • MAP columns can be used only in tables in the Duplicate or Unique key model.

MAP

Indefinite

A map that consists of key-value pairs.

Syntax: MAP<K,V>.

Note

The following types are supported for the key-value pairs:

BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE,
DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING

Usage notes:

  • A MAP column cannot be used as a key column.

  • MAP columns can be used only in tables in the Duplicate or Unique key model.

STRUCT

Indefinite

A structure composed of multiple fields. The structure is a collection of multiple columns.

Syntax: STRUCT<field_name:field_type, ... >

Note
  • field_name: the identifier of the field, which is unique.

  • field_type: the type of the field.

Usage notes:

  • A MAP column cannot be used as a key column.

  • MAP columns can be used only in tables in the Duplicate key model.

JSON

Indefinite

The JSON data type stores data in a binary format. Internal fields are accessed by using JSON functions.

Default length: 1,048,576 bytes, which is equal to 1 MB.

Maximum length: 2,147,483,643 bytes, which is equal to 2 GB.

Note

You can modify the jsonb_type_length_soft_limit_bytes parameter to adjust the length.

VARIANT

Indefinite

The dynamic variable data type, which is developed for semi-structured data such as JSON data. This type of data can be stored in any JSON data field. The system automatically splits a JSON data field into sub-columns for storage. This improves the storage efficiency and query and analysis performance. The VARIANT data type is suitable for complex nested data structures, which can change at any time.

Usage notes:

  • The VARIANT data type applies only to the value columns and does not apply to key, partition, or bucket columns.

Important

Make sure that your instance runs SelectDB Core 4.0.0 or later. If the instance runs a kernel version earlier than SelectDB Core 4.0.0, upgrade the kernel version. For more information, see Update the kernel version of an instance.

Aggregation data types

Data type

Number of bytes

Description

Data type

Number of bytes

Description

HLL

Indefinite

An approximate count of distinct elements. The performance of HyperLogLog (HLL) is higher than Count Distinct if the amount of data is large.

Usage notes:

  • In most cases, the error rate of HLL is about 1%. Occasionally, the error rate of HLL reaches up to 2%.

  • When you create a table that contains an HLL column, you cannot set the HLL column as a key column and must set the aggregation type of the HLL column to HLL_UNION.

  • You do not need to specify the length and default value. The length automatically varies based on the degree of data aggregation.

  • HLL columns can be queried or used only by using the hll_union_agg, hll_raw_agg, hll_cardinality, and hll_hash functions.

BITMAP

Indefinite

An exact count of distinct elements, which is applicable to scenarios such as unique visitor (UV) count or audience selection.

Usage notes:

  • BITMAP columns can be used in tables in the Aggregate, Unique, or Duplicate key model and cannot be set as key columns. When you create a table that contains a BITMAP column in the Aggregate key model, you must set the aggregation type of the BITMAP column to BITMAP_UNION.

  • You do not need to specify the length and default value. The length is specified within the system based on the degree of data aggregation.

  • BITMAP columns can be queried or used only by using functions such as bitmap_union_count, bitmap_union, bitmap_hash, and bitmap_hash64.

QUANTILE_STATE

Indefinite

The data type that can be used to calculate the approximate value of quantiles.

Different values with the same key are pre-aggregated when data is imported.

  • If the number of aggregated values does not exceed 2,048, all data is recorded in detail.

  • If the number of aggregated values is greater than 2,048, the TDigest algorithm is used for data aggregation or clustering and to save the centroid points after the data aggregation or clustering.

Usage notes:

  • When you create a table that contains a QUANTILE_STATE column, you cannot set the QUANTILE_STATE column as a key column and must set the aggregation type of the QUANTILE_STATE column to QUANTILE_UNION.

  • You do not need to specify the length and default value. The length is specified within the system based on the degree of data aggregation.

  • QUANTILE_STATE columns can be queried or used only by using functions such as QUANTILE_PERCENT, QUANTILE_UNION, and TO_QUANTILE_STATE.

AGG_STATE

Indefinite

The aggregate function.

It is used to accelerate aggregations and can be used only with the state, merge, and union combinators.

Usage notes:

  • An AGG_STATE column cannot be used as a key column. When you create a table that contains an AGG_STATE column, you must declare the signature of the aggregate function.

  • You do not need to specify a length or a default value. The actual size of the stored data depends on the function implementation.

IP address data types

Important

The following data types are supported only for SelectDB Core 4.0.0 or later. If your instance runs a kernel version earlier than SelectDB Core 4.0.0, upgrade the kernel version. For more information, see Update the kernel version of an instance.

Data type

Number of bytes

Description

Data type

Number of bytes

Description

IPv4

4 bytes

Stores an IPv4 address in a 4-byte binary string. The IPv4 address data type is used with ipv4_* functions.

IPv6

16 bytes

Stores an IPv6 address in a 16-byte binary string. The IPv6 address data type is used with ipv6_* functions.

  • On this page (1)
  • Numeric data types
  • Date and time data types
  • String data types
  • Semi-structured data types
  • Aggregation data types
  • IP address data types
Feedback