When you initialize an ApsaraDB RDS for PostgreSQL instance, you can configure the collation of each database based on your business requirements. The collation includes the string sort order, character classification method, numeric value format, date and time format, and currency format. In addition, you may also need to configure the LC_COLLATE and LC_CTYPE environment variables.
LC_COLLATE | String sort order |
LC_CTYPE | Character classification |
LC_MESSAGES | Message language |
LC_MONETARY | Currency format |
LC_NUMERIC | Numeric value format |
LC_TIME | Date and time format |
You can configure these environment variables to specify a collation that meets your business requirements in a locale. This topic provides an example on how to configure the collation of a database in a locale and how to sort the output by a specific order.
Supported character sets
For more information, see Character Set Support.
LC_COLLATE and LC_CTYPE settings supported by a character set
You can execute the following SQL statement to query the LC_COLLATE and LC_CTYPE settings that are supported by a character set from the pg_collation system table:
select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype from pg_collation ;
If the encoding field of a collation is empty, the collation supports all character sets.
encoding | collname | collcollate | collctype
------------+-----------------------+-----------------------+-----------------------
| default | |
| C | C | C
| POSIX | POSIX | POSIX
UTF8 | aa_DJ | aa_DJ.utf8 | aa_DJ.utf8
LATIN1 | aa_DJ | aa_DJ | aa_DJ
LATIN1 | aa_DJ.iso88591 | aa_DJ.iso88591 | aa_DJ.iso88591
UTF8 | aa_DJ.utf8 | aa_DJ.utf8 | aa_DJ.utf8
UTF8 | aa_ER | aa_ER | aa_ER
UTF8 | aa_ER.utf8 | aa_ER.utf8 | aa_ER.utf8
.......
EUC_CN | zh_CN | zh_CN | zh_CN
UTF8 | zh_CN | zh_CN.utf8 | zh_CN.utf8
EUC_CN | zh_CN.gb2312 | zh_CN.gb2312 | zh_CN.gb2312
UTF8 | zh_CN.utf8 | zh_CN.utf8 | zh_CN.utf8
UTF8 | zh_HK | zh_HK.utf8 | zh_HK.utf8
UTF8 | zh_HK.utf8 | zh_HK.utf8 | zh_HK.utf8
EUC_CN | zh_SG | zh_SG | zh_SG
UTF8 | zh_SG | zh_SG.utf8 | zh_SG.utf8
EUC_CN | zh_SG.gb2312 | zh_SG.gb2312 | zh_SG.gb2312
UTF8 | zh_SG.utf8 | zh_SG.utf8 | zh_SG.utf8
EUC_TW | zh_TW | zh_TW.euctw | zh_TW.euctw
UTF8 | zh_TW | zh_TW.utf8 | zh_TW.utf8
EUC_TW | zh_TW.euctw | zh_TW.euctw | zh_TW.euctw
UTF8 | zh_TW.utf8 | zh_TW.utf8 | zh_TW.utf8
UTF8 | zu_ZA | zu_ZA.utf8 | zu_ZA.utf8
LATIN1 | zu_ZA | zu_ZA | zu_ZA
LATIN1 | zu_ZA.iso88591 | zu_ZA.iso88591 | zu_ZA.iso88591
UTF8 | zu_ZA.utf8 | zu_ZA.utf8 | zu_ZA.utf8
(869 rows)
Configure the collation of a database in a locale
Configure the fields of a database in a locale
Prerequisites
Familiarize yourself with the collations that are supported by the character set of the database. Then, execute the following SQL statement to query the encoding format of the database:
select datname,pg_encoding_to_char(encoding) as encoding from pg_database;
Sample output
datname | encoding --------------------+----------- template1 | UTF8 template0 | UTF8 db | SQL_ASCII db1 | EUC_CN contrib_regression | UTF8 test01 | UTF8 test02 | UTF8 postgres | UTF8 (8 rows)
Procedure
When you create a table, execute the following SQL statement to specify a collation that is supported by the character set of the database:
CREATE TABLE test1 ( a text COLLATE "de_DE", b text COLLATE "es_ES" );
Execute the following SQL statement to modify the collation of a column:
NoteWhen you modify the collation of a column in a table, the table is rewritten. Proceed with caution if the table is large.
In the following example,
test1
indicates the name of the table, anda
indicates the name of the column for which you want to modify the collation settings.
alter table test1 alter a type text COLLATE "en_US";
Configure locale settings.
Change the sort order that is specified by the ORDER BY clause. Sample statement:
select * from <table_name> order by <column_name> collate "C";
Change the result that is returned from an operator. Example
Sample statement
select * from <table_name> where <column_name> > 'Tom' collate "C";
Sort data by using locale indexes.
You can sort data by using an index only when the collation specified in the ORDER BY clause is the same as the collation of the index. Sample statement:
create index idxa on <table_name>(<column_name> collate "en_US"); explain select * from <table_name> order by <column_name> collate "en_US";
Sample output:
QUERY PLAN ------------------------------------------------------------------------ Index Only Scan using idxa on <table_name> (cost=0.15..31.55 rows=1360 width=64) (1 row)
Configure a rule to sort results in alphabetical order
You can use one of the following four methods to configure a rule that is used to sort results in alphabetical order:
Use the SQL statements that are supported in your locale. This method does not require you to modify the original data. Sample statement:
select * from <table_name> order by <column_name> collate "en_US";
Use the fields that are supported in your locale. If the database contains data, this method requires you to modify the original data. Sample statement:
alter table <table_name> alter <column_name> type text COLLATE "en_US";
Use the indexes and SQL statements that are supported in your locale. This method does not require you to modify the original data. Sample statement:
create index idxa on <table_name>(<column_name> collate "en_US"); explain select * from <table_name> order by <column_name> collate "en_US";
Sample output:
QUERY PLAN ------------------------------------------------------------------------ Index Only Scan using idxa on <table_name> (cost=0.15..31.55 rows=1360 width=64) (1 row)
Set the collation of the database to en_US. By default, the data in this database is sorted in alphabetical order based on the specified collation. Sample statement:
create database test03 encoding 'UTF8' lc_collate 'en_US.utf8' lc_ctype 'en_US.utf8' template template0; \c test03 select * from (values ('Alice'),('Tom')) as a(c1) order by c1 ;
Sample output:
c1 -------- Alice Tom (2 rows)
NoteA Chinese character may have more than one pronunciation. For example, the Chongqing city in China may be encoded as the Zhongqing city. Proceed with caution if you want to configure a collation that is used to sort Chinese characters based on pronunciations.
Configure a rule to sort results in alphabetical order by using Greenplum
Greenplum does not allow you to specify collations for individual columns. Therefore, the sorting of results in alphabetical order is different in Greenplum.
You can use Greenplum to convert results among character sets. Then, you can sort the results in binary order. This allows you to obtain similar results that resemble results in alphabetical order. Sample statement:
select * from (values ('Alice'), ('Tom')) t(id) order by byteain(textout(convert(id,'UTF8','EUC_CN')));
Sample output:
id
--------
Alice
Tom
(2 rows)