Built-in user-defined functions (UDFs)
Complex functions
Function | Description | Version |
Function | Description | Version |
contain | Returns true if a specific value in the specified field is included in the specified value set. | All |
notcontain | Returns true if a specific value in the specified field is excluded from the specified value set. | All |
MATCHINDEX | Returns true if the inverted index of the specified field meets the specified conditions. | All |
QUERY | Returns inverted indexes based on the specified conditions. The syntax of this function is the same as the syntax of the query clause supported by HA3. | All |
sphere_distance | Calculates a distance value based on the specified longitude and latitude. | V3.7.5 and later |
range | | V3.9.0 and later |
normalizescore | | V3.9.0 and later |
Examples
Sample queries
SELECT nid, price, brand, size FROM phone ORDER BY nid LIMIT 1000
USE_TIME: 0.881, ROW_COUNT: 10
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
1 | 3599 | Huawei | 5.9 |
2 | 4388 | Huawei | 5.5 |
3 | 899 | Xiaomi | 5 |
4 | 2999 | OPPO | 5.5 |
5 | 1299 | Meizu | 5.5 |
6 | 169 | Nokia | 1.4 |
7 | 3599 | Apple | 4.7 |
8 | 5998 | Apple | 5.5 |
9 | 4298 | Apple | 4.7 |
10 | 5688 | Samsung | 5.6 |
contain
boolean contain(int32 a, const string b)
boolean contain(int64 a, const string b)
boolean contain(string a, const string b)
boolean contain(ARRAY<int32> a, const string b)
boolean contain(ARRAY<int64> a, const string b)
boolean contain(ARRAY<string> a, const string b)
You can use this function to determine whether values in the a field are included in the b value set.
a: a field in the data table. The field that you specify can be a single-value field or multi-value field of the INT32, INT64, or STRING type.
b: a constant string. Separate the specified constants with vertical bars (|
). If a value matches one of the specified constants, this function returns true.
This function returns values of the BOOLEAN type. A value of true indicates that the value in the a field is included in the b value set. A value of false indicates that the value in the a field is excluded from the b value set.
In the following statement, the contain
function is used to query data entries in the rows in which the nid field values are 1, 2, or 3.
SELECT nid, price, brand, size FROM phone WHERE contain(nid, '1|2|3') ORDER BY nid LIMIT 100
USE_TIME: 0.059, ROW_COUNT: 3
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
1 | 3599 | Huawei | 5.9 |
2 | 4388 | Huawei | 5.5 |
3 | 899 | Xiaomi | 5 |
notcontain
boolean notcontain(int32 a, const string b)
boolean notcontain(int64 a, const string b)
boolean notcontain(string a, const string b)
boolean notcontain(ARRAY<int32> a, const string b)
boolean notcontain(ARRAY<int64> a, const string b)
boolean notcontain(ARRAY<string> a, const string b)
You can use this function to determine whether values in the a field are not included in the b value set.
a: a field in the data table. The field that you specify can be a single-value field or multi-value field of the INT32, INT64, or STRING type.
b: a constant string. Separate the specified constants with vertical bars (|
). If a value matches one of the specified constants, this function returns false.
This function returns values of the BOOLEAN type. A value of true indicates that the value in the a field is not included in the b value set. A value of false indicates that the value in the a field is included in the b value set.
In the following statement, the notcontain
function is used to query data entries in the rows in which the nid field values are not 1, 2, or 3.
SELECT nid, price, brand, size FROM phone WHERE notcontain(nid, '1|2|3') ORDER BY nid LIMIT 100
USE_TIME: 0.092, ROW_COUNT: 7
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
4 | 2999 | OPPO | 5.5 |
5 | 1299 | Meizu | 5.5 |
6 | 169 | Nokia | 1.4 |
7 | 3599 | Apple | 4.7 |
8 | 5998 | Apple | 5.5 |
9 | 4298 | Apple | 4.7 |
10 | 5688 | Samsung | 5.6 |
MATCHINDEX
You can use this function to determine whether a value in the a field includes the value of the b parameter. You can use this function to obtain the index of a field.
You can use this function only in the WHERE clause to optimize inverted indexing in the index retrieval phase.
a: a field of the STRING type. The system creates an inverted index based on the specified field to optimize the query.
b: a keyword of the STRING type.
The value of the b parameter can be used as a search query string for positional indexing. For more information, see query clause.
c: optional. You can specify information such as an analyzer and stop words, which are used to tokenize the value of the b parameter.
ou can use the following parameters in the value of the c parameter. For more information about the parameters, see the Appendix section of this topic.
You must specify a colon (:
) between a parameter name and the corresponding value and separate multiple parameters with commas (,
).
global_analyzer
specific_index_analyzer
no_token_indexes
tokenize_query
remove_stopwords
default_op
Return values
This function returns values of the BOOLEAN type. A value of true indicates that the value in the a field includes the value of the b parameter. A value of false indicates that the value in the a field does not include the value of the b parameter.
In the following statement, the MATCHINDEX
function is used to query data entries in the rows in which the values of the title
field include the Lens keyword.
SELECT nid, brand FROM phone WHERE MATCHINDEX('title', 'Lens')
nid | brand |
1 | Huawei |
QUERY
boolean QUERY(const string a, const string b)
boolean QUERY(const string a, const string b, const string c)
You can use this function to determine whether a value in the a field includes the value of the b parameter. The system automatically converts the specified keyword to terms and performs a query.
This function supports the syntax of the query clause in SQL statements supported by HA3. For information about the query clause, see query clause.
You can use this function only in the WHERE clause to optimize inverted indexing in the index retrieval phase.
a: a field of the STRING type. The system automatically uses the specified field as the value of the default_index parameter in the config clause. For information about the config clause, see Config clause.
b: a keyword of the STRING type.
The system includes the value of the b parameter in the statement after the system optimizes the statement and queries data from the specified ranges.
c: optional. You can specify information such as an analyzer and stop words, which are used to tokenize the value of the b parameter.
You must specify a colon (:
) between a parameter name and the corresponding value and separate multiple parameters with commas (,).
You can use the following parameters in the value of the c parameter:
global_analyzer
specific_index_analyzer
no_token_indexes
tokenize_query
remove_stopwords
default_op
Return values
This function returns values of the BOOLEAN type. A value of true indicates that the value in the a field includes the value of the b parameter. A value of false indicates that the value in the a field does not include the value of the b parameter.
SELECT nid, price, brand, size FROM phone WHERE QUERY('title', 'Huawei')
USE_TIME: 0.034, ROW_COUNT: 1
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
2 | 4388 | Huawei | 5.5 |
SELECT nid, price, brand, size FROM phone
WHERE QUERY('title', 'Huawei OR OPPO')
USE_TIME: 0.03, ROW_COUNT: 2
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
2 | 4388 | Huawei | 5.5 |
4 | 2999 | OPPO | 5.5 |
In the following statement, the default_op parameter is included in the value of the c parameter. You can include multiple parameters in the value of the c parameter by specifying a colon (:
) between a parameter name and the corresponding value and separating multiple parameters with commas (,).
SELECT nid, price, brand, size FROM phone
WHERE QUERY('title', 'Huawei OPPO', 'default_op:OR,remove_stopwords:true')
The b parameter in this function is parsed by the syntax parser for query clauses provided by HA3. For information about the syntax of the query clauses supported by HA3, see query clause. If you set the b parameter to a constant string and execute a statement to perform the query, do not enclose the constant string in single quotation marks ('). For example, the query=Huawei OPPO clause in a statement is equivalent to the WHERE QUERY ('title', 'Huawei OPPO') condition in a QUERY function. If you use a logical operator in the query clause, you can enclose each constant in single quotation marks ('). For example, the query='Huawei' AND 'OPPO' clause in a statement is equivalent to the WHERE QUERY ('title', 'Huawei' AND 'OPPO') condition in a QUERY function.
Error type | Incorrect form | Correct form |
Error type | Incorrect form | Correct form |
The query fails because the function syntax is invalid. | QUERY('pidvid','123:456') | QUERY('pidvid','"123:456"') |
sphere_distance
double sphere_distance(LOCATION point, double longitude, double latitude)
You can use this function to calculate the distance between a document and the location that is determined by the specified latitude and longitude.
point: a field of the LOCATION type. Values in the specified field must be in the single-value format.
longitude: the longitude of a location.
latitude: the latitude of a location.
The spherical distance.
In the following statement, the sphere_distance
function is used. The geo field must be a single-value field of the LOCATION type, and a forward index is created based on this field.
SELECT sphere_distance(geo,127.0,30.0) FROM phone
range
boolean range(int8 v, const string rangeDesc)
boolean range(uint8 v, const string rangeDesc)
boolean range(int16 v, const string rangeDesc)
boolean range(uint16 v, const string rangeDesc)
boolean range(int32 v, const string rangeDesc)
boolean range(uint32 v, const string rangeDesc)
boolean range(int64 v, const string rangeDesc)
boolean range(uint64 v, const string rangeDesc)
boolean range(float v, const string rangeDesc)
boolean range(double v, const string rangeDesc)
You can use this function to determine whether a value in the v field that corresponds to a forward index is in the range specified by the rangeDesc parameter.
v: the field in which the data that you want to query is stored. The field must be a single-value field.
rangeDesc: a value range, which can be an open interval, a closed interval, or a half-closed interval.
The following table describes examples of this function and the corresponding return values.
Example | Return value |
range(v, "[0, 100]") | 0<=v<=100 |
range(v, "(0, 100)") | 0<v<100 |
range(v, "[0, 100)") | 0<=v<100 |
range(v, "(0, 100]") | 0<v<=100 |
range(v, "(0,)") range(v, "(0,]") | 0<v |
range(v, "[0,)") range(v, "[0,]") | 0<=v |
range(v, "(,100)") range(v, "[,100)") | v<100 |
range(v, "(,100]") range(v, "[,100]") | v<=100 |
range(v, "(,)") range(v, "[,]") range(v, "[,)") range(v, "(,]") | true |
If you prefix the value of the rangeDesc parameter with !, the NOT operation is performed on the specified value range.
In the following statement, the range
function is used.
SELECT nid FROM phone where range(price,"(127.0,30.0)")
SELECT nid FROM phone where range(price,"!(127.0,30.0)")
normalizescore
double normalizescore(int8 v, const string defaultScore)
double normalizescore(uint8 v, const string defaultScore)
double normalizescore(int16 v, const string defaultScore)
double normalizescore(uint16 v, const string defaultScore)
double normalizescore(int32 v, const string defaultScore)
double normalizescore(uint32 v, const string defaultScore)
double normalizescore(int64 v, const string defaultScore)
double normalizescore(uint64 v, const string defaultScore)
double normalizescore(float v, const string defaultScore)
double normalizescore(double v, const string defaultScore)
You can use this function to normalize the value in the v field to a value of the DOUBLE type. If the value is initialized, the function returns the input value. If the value is uninitialized, the function returns the value of the defaultScore parameter.
v: the field in which the data that you want to normalize is stored. The field must be a single-value field.
defaultScore: a constant string that can be converted to a string of the DOUBLE type.
If the value in the v field is initialized, the function returns the input value. If the value in the v field is uninitialized, the function returns the value of the defaultScore parameter.
In the following statement, the normalizescore function is used to normalize the price field in doc1, doc2, and doc3.
doc1: price=1.0
doc2: price= (Uninitialized)
doc3: price=2.0
Execution
select normalizescore(price, "1000.0") as normalized_score from phone
USE_TIME: 32.141ms, ROW_COUNT: 2
normalized_score(double) |
1.0 |
1000.0 |
2.0. |
Appendix
Analyzer settings
You can configure the following parameters to specify analyzers when you use functions such as MATCHINDEX and QUERY:
Specifies a global analyzer. The analyzer that is specified by this parameter has a higher priority than the analyzer that you specified in the schema. The specified analyzer must be included in the analyzer.json file.
Specifies an analyzer for the specified field. The analyzer that is specified by this parameter has a higher priority than the analyzer that you specified in the global_analyzer parameter and the analyzer that you specified in the schema.
Specifies the fields whose values you want the system to not convert to terms. The configuration of this parameter does not affect other operations that are performed on the values in the specified fields, such as normalization and stop word removal.
Separate multiple fields with semicolons (;)
.
true
: converts the search query to terms. false
: does not convert the search query to terms. If you configure an analyzer when you specify the QUERY
UDF, this parameter does not take effect.
The default value is true
.
true
: deletes stop words. false
: does not delete stop words. You can specify stop words when you configure the analyzer.
The default value is true
.
Specifies the logical relationship between the terms that are returned after the system tokenizes the search query by using the default analyzer. Valid values: AND
and OR
. You can specify a default operator when you configure biz. For information about how to configure biz, see biz configuration.
Supported UDFs from third-parties
Complex functions
Function | Description | Version |
Function | Description | Version |
type cast | Converts values from one data type to another data type. | All |