rankTvf
rankTvf is similar to the rank window function of SQL statements. This function is used to filter data after data is dispersed. The following sample code provides the prototype of the function:
rankTvf("group_key", "sort_key", "reserved_count", (sql))
group_key: the field that is used to disperse data. You can specify multiple fields or leave the parameter empty. Separate multiple fields with commas (,).
sort_key: the field that you want to use to sort data. You can specify multiple fields. The plus sign (+) specifies that the system sorts data in ascending order. The minus sign (-) specifies that the system sorts data in descending order. By default, the system sorts data in ascending order. You cannot leave this field empty.
reserved_count: the number of returned records that you want to reserve in each group. A negative number specifies that all returned records are reserved in each group.
sql: the SQL statement whose query result needs to be dispersed.
After rankTvf is used to disperse and filter results of SQL statements, the output results are still sorted in the order in which the original results of SQL statements are sorted. The filtered rows are deleted from the output result.
Example:
select * from table (
rankTvf('brand','-size','1', (SELECT brand, size FROM phone))
)
order by brand
limit 100
sortTvf
sortTvf is used to provide the local top K feature. For example, you can use the sortTvf function to sort the top K results returned by the searcher and perform join operations on the results. If you use an ORDER BY clause, the clause is pushed to the query record searcher (QRS) for join operations. ORDER BY clauses are used for global sort operations.
The following sample code provides the prototype of the function:
sortTvf("sort_key", "reserved_count", (sql))
sort_key: the field that you want to use to sort data. You can specify multiple fields. The plus sign (+) specifies that the system sorts data in ascending order. The minus sign (-) specifies that the system sorts data in descending order. By default, the system sorts data in ascending order. You cannot leave this field empty.
reserved_count: the number of fields that are reserved in each group.
sql: the SQL statement whose query result needs to be sorted.
Difference between sortTvf and rankTvf: sortTvf changes the order of rows in the original result of SQL statements in tables.
Example:
select * from table (
sortTvf('-size','3', (SELECT brand, size FROM phone))
)
unpackMultiValue
The unpackMultiValue function is used to unpack multi-value fields. This function is used when GROUP BY clauses include multi-value fields and you want to expand the group-based results of multi-value fields.
The following sample code provides the prototype of the function:
unpackMultiValue("unpack_keys", (sql))
unpack_keys: the name of the field that you want to unpack. You can specify multiple fields. Separate multiple fields with commas (,).
sql: the SQL statement whose query result needs to be unpacked.
Usage notes:
The type of unpacked multi-value fields does not change. Each unpacked multi-value field contains only one value or the unpacked field is empty.
We recommend that you unpack a small number of multi-value fields. If you unpack a large number of multi-value fields, the system may become overloaded due to the cartesian product relationship between fields.
Example:
select * from table (
unpackMultiValue('desc,price,size', (SELECT desc, price, size FROM phone where size > 5.5))
)
graphSearchTvf
The graphSearchTvf function uses a deep learning model that is based on the table-valued function (TVF) in the same process as a searcher. This model is used to score documents. The following sample code provides the prototype of the function:
graphSearchTvf("biz_name", "pk_name", "qinfo_kv", (sql))
biz_name: the name of the biz configuration to which the scoring model belongs. The biz configuration must be in the same process as the searcher that executes the SQL statement.
pk_name: the name of the primary key in the scoring model. You can specify a field of the input table as the primary key.
qinfo_kv: the key-value pair that is specified in compliance with the qinfo protocol.
sql: the SQL statement that requires in-depth scoring.
After the graphSearchTvf function is performed, the output results are still sorted in the order in which the original results of SQL statements are sorted. The __buildin_score__ field of the FLOAT type is added to store the scoring results.
Example:
select * from table (
graphSearchTvf('model','store_id','fg_encode:abc', (SELECT brand, size FROM phone))
)
order by __buildin_score__
limit 100
enableShuffleTvf
The enableShuffleTvf function is used to switch SQL statements from a TVF to the QRS. For example, SQL statements that need to be processed by the rankTvf function can be pushed down to a searcher by default. If an SQL statement in the rankTvf function contains enableShuffleTvf, the rankTvf function only runs on the QRS. The following sample code provides the prototype of the function:
enableShuffleTvf((sql))
The following sample code provides an example on how to use the enableShuffleTvf function:
select * from table (
enableShuffleTvf((SELECT brand, size FROM phone))
)
inputTableTvf
The InputTableTvf function is used to construct multiple rows of data. In most cases, the function is used with logical tables. The following sample code provides the prototype of the function:
inputTableTvf('data', (sql))
data describes multiple rows of data. Separate multiple rows with semicolons (;). Separate multiple columns in a row with commas (,). Example: 100,0.1;200,0.2.
sql: specifies the schema for the constructed data. For example, if the first column of the table is of the INT64 type and the second column of the table is of the FLOAT type in the response of the sql parameter, data is parsed based on this schema.
distinctTopNTvf
The distinctTopNTvf function is used to disperse and sort data. The following sample code provides the prototype of the function:
distinctTopNTvf('distinct_fields', 'sort_fields', 'topn', 'distinct_count', 'searcher_topn', 'searcher_distinct_count', (sql))
The distinct_fields function is used to disperse fields. You can specify multiple fields or leave the parameter empty. Separate multiple fields with commas (,).
sort_fields: the field that you want to use to sort data. You can specify multiple fields. The plus sign (+) specifies that the system sorts data in ascending order. The minus sign (-) specifies that the system sorts data in descending order. By default, the system sorts data in ascending order. You cannot leave this field empty.
topn: the total number of returned records that are reserved.
distinct_count: the number of fields that are reserved for each dispersing group.
searcher_topn: the total number of returned records that are reserved in each column in the searcher.
searcher_distinct_count: the number of fields that are reserved for each dispersing group in each column in the searcher.
Calculation process:
The sort_fields parameter is used to sort data.
Each row is traversed.
distinct_fields is used to calculate groups. If the number of groups is less than the value specified by distinct_count, the row is put into the output queue and the number of groups plus one is the total number of groups. If the number of groups is greater than or equal to the value of the distinct_count parameter, the row is put into the waiting queue.
If the number of rows in the output queue is greater than or equal to topn, some rows are returned from the output queue. The following parameter specifies the number of returned rows: topn.
If the number of rows in the output queue is n and n is less than the value of the topn parameter, some rows from the waiting queue are merged with the output queue in a required order. The number of rows that are taken from the waiting queue is calculated by using the following formula: Value of the topn parameter - n. Then, the merged rows are returned.
OneSummaryTvf
The OneSummaryTvf function is used to execute the one summary plug-in chain. This function is compatible with the original logic of the one summary plug-in in the HA3 plug-in platform.
"sql_tvf_plugin_config" : {
"modules" : [
{
"module_name": "tvf",
"module_path": "libPluginOnline.so",
"parameters": {}
}
],
"tvf_profiles" : [
{
"func_name": "OneSummaryTvf",
"tvf_name": "tpp", // The TVF name.
"parameters": {
"config_path" : "pluginsConf/one_summary.json", // The configuration of the one summary plug-in.
"schema_path" : "schemas/mainse_summary_schema.json", //The configuration of the summary schema.
"chain" : "tpp_default" // The summary chain of the TVF.
}
}
]
}
You can use the OneSummaryTvf function to configure multiple TVF prototypes. Each prototype represents a one summary plug-in chain.
tpp(
'dl:tpp_default', // kvpair
'', // The query clause. In most cases, the query clause is not required.
( // sql
SELECT
*
FROM
mainse_summary_summary_
WHERE
ha_in(nid, '620598126603|619969492623')
)
)
hint
By default, the summary plug-in only processes the original input fields. If you want to add fields, register the information about additional fields in iquan. You can register the information about additional fields by using the collectOutputInfo interface of the summary plug-in. If you do not register the information, the results of additional fields are not provided.
typedef std::vector<std::pair<std::string, std::string>> SummaryOutputInfoType;
virtual SummaryOutputInfoType collectOutputInfo();
// example:
SummaryOutputInfoType PostFeeSummaryExtractor::collectOutputInfo() {
if (_dstSummaryField.empty()) {
return {};
}
return {make_pair(_dstSummaryField, "string")}; // [(field name,type)]
}
rerankByQuotaTvf
rerankByQuotaTvf('group_fields',
'actual_group_field_values',
'quota_num',
'rerank_fields',
'sort_fields',
'per_rerank_field_max_num',
'window_size',
'need_makeup',
(sql)
)
group_fields
The field based on which data is grouped. You can specify multiple columns. Separate multiple columns with semicolons (;).
actual_group_field_values
All enumerated values of the group_fields parameter. If the value of the group_fields parameter is a single column, separate all enumerated values with semicolons (;). If the value of the group_fields parameter consists of multiple columns, separate all groups of enumerated values with semicolons (;). In each group of enumerated values, separate the values of multiple columns with commas (,).
quota_num
The quotas for each group. The parameter specifies the quotas of two phases for a group. For example, a maximum of ten data entries can be returned on the searcher in the first phase, and a maximum of 20 data entries can be returned in the second phase. In this case, the parameter value for this group is (10,20). Separate the quotas of the two phases for a group with commas (,). The number of groups for which you specify this parameter must be the same as the group number that is specified by the actual_group_field_values parameter. If the number of groups for which you specify this parameter is less than the value of the actual_group_field_values parameter, then the quota that you specified for the last group serves as the default quota of the additional groups for which you do not specify the quota_num parameter. If the number of groups for which you specify this parameter is greater than the value of the actual_group_field_values parameter, the system reports an error.
rerank_fields
The field that you want to disperse. You can specify multiple columns. Separate multiple columns with semicolons (;). If you do not want to disperse fields, leave this field empty.
sort_fields
The fields that are used to sort data. You can specify multiple columns. Example: +field0;-field1;field2. + specifies that data is sorted in ascending order. - specifies that data is sorted in descending order. By default, data is sorted in ascending order.
per_rerank_field_max_num
The maximum number of returned results in each category in a group. You can use this parameter if you want to disperse data.
window_size
The window that is used to unevenly disperse data between groups. You can use this parameter when you want to unevenly disperse data.
need_makeup
Specifies whether to supplement the result to reach the quotas. t specifies that the result is supplemented. f specifies that the result is not supplemented. This parameter specifies the supplement choices of two phases, such as t;f. The supplement choices of the two phases are separated by semicolons (;).
Usage notes: Separate fields of different groups with semicolons (;). The fields can be multi-value fields. Separate multiple enumerated multi-value fields with number signs (#).
Procedure
The system calculates input_hash based on the enumerated values of the grouping field.
The system traverses data in the table, calculates the hash value of all data, and then matches the hash value with the value of the input_hash parameter. In this case, the groups that correspond to the enumerated group data are obtained.
The system disperses multiple groups that are obtained in Step 2.
Determine whether groups need to be dispersed based on the rerank_fields parameter. If the rerank_fields parameter is left empty, groups do not need to be dispersed. In this case, you can set the per_rerank_field_max_num and window_size parameters to 0.
If you want to disperse groups, perform the following steps:
Group data based on the value specified by the rerank_fields parameter.
Sort data in groups based on the value of the sort_fields parameter.
The groups are sorted preferentially based on the window_size parameter. The window_size parameter specifies the window size of a group. Data in groups is extracted for several times.
If the extracted data from a group reaches the value of the per_rerank_field_max_num parameter, the system stops extracting data from the group. If the amount of data extracted from all groups reaches the upper limit, the system stops extracting data from all groups.
After the system stops extracting data from all groups, you can obtain two sets of data. One set of data meets all the rules of the user. The other set of data is a backup set.
If you do not want to disperse groups, perform the following steps:
Sort data in groups based on the value of the sort_fields parameter.
The system filters data in all groups based on quotas. You can obtain two sets of data. One set of data reaches the specified quota. The other set of data is a backup set.
Specify whether to supplement the result to reach the quotas. If you want to supplement the result, append the data in the backup set to the result. If you do not need to supplement the result, discard the backup data set. The policy that is used to supplement data varies based on the sequence of a group in the enumerated values of the actual_group_field_values parameter. A smaller sequence number indicates a higher priority of data supplementation.
Example
rerankByQuotaTvf('trigger_type',
'1;11',
'100,200;100,300',
'ju_category_id',
'-final_score',
'50',
'15',
't;t',
(sql)
)
simpleRerankTvf
simpleRerankTvf('group_fields',
'actual_group_field_values',
'quota_num',
'sort_fields',
'need_makeup',
(sql)
)
group_fields
The field based on which data is grouped. You can specify multiple columns. Separate multiple columns with semicolons (;).
actual_group_field_values
All enumerated values of the group_fields parameter. If the value of the group_fields parameter is a single column, separate all enumerated values of different groups with semicolons (;). If the value of the group_fields parameter consists of multiple columns, separate enumerated values of different groups with semicolons (;). Separate the enumerated values of multiple columns for a group with commas (,).
quota_num
The quotas for each group.
sort_fields
The fields that are used to sort data. You can specify multiple columns such as +field0;-field1;field2. + specifies that data is sorted in ascending order. - specifies that data is sorted in descending order. By default, data is sorted in ascending order. If this parameter is left empty, data is not sorted.
need_makeup
Specifies whether to supplement the result to reach the quotas. t specifies that the result needs to be supplemented. f specifies that the result does not need to be supplemented. This parameter specifies the quotas of two phases.
Procedure
The system calculates input_hash based on the enumerated values of the grouping field.
The system traverses data in the table, calculates the hash value of all data, and then matches the hash value with the value of the input_hash parameter. In this case, the groups that correspond to the enumerated group data are obtained.
The system disperses and supplements data for groups that are obtained in Step 2.
The system sorts data in groups based on the value of the sort_fields parameter. If the sort_fields parameter is left empty, data is not sorted.
The system filters data in all groups based on quotas. You can obtain two sets of data. One set of data reaches the specified quota. The other set of data is a backup set.
Specify whether to supplement the result to reach the quotas. If you want to supplement the result, append the data in the backup set to the result. If you do not need to supplement the result, discard the data in the backup set. The policy that is used to supplement data varies based on the sequence of a group in the enumerated values of the actual_group_field_values parameter. A smaller sequence number indicates a higher priority of data supplementation.
Example
simpleRerankTvf('trigger_type',
'1;11',
'100',
'-final_score',
't',
(sql)
)
Usage notes
When you execute SELECT statements to query the TVF results, include table (xxxTvf()) in FROM clauses.
We recommend that you enclose SQL statements in TVFs in parentheses (). You can execute only one SQL statement at a time. For example, an error is reported for sql1 union all sql2. To resolve the issue, change the statement into select * from (sql1 union all sql2).
TVFs support only parameters of the STRING type.