This topic describes how to use the smlar extension to calculate the similarity between two arrays of the same data type.
Prerequisites
The RDS instance runs PostgreSQL 10 or a later version.
Note: This extension is not supported by ApsaraDB RDS for PostgreSQL instances that run PostgreSQL 17.
If the major engine version of the RDS instance meets the requirement but the extension is still not supported, update the minor engine version of the RDS instance. For more information, see Update the minor engine version.
Background information
The smlar extension provides multiple functions to calculate the similarity between two arrays of the same data type. It also provides parameters to control the similarity calculation methods. All built-in data types are supported.
Function description
float4 smlar(anyarray, anyarray)
Calculates the similarity between two arrays of the same data type.
float4 smlar(anyarray, anyarray, bool useIntersect)
Calculates the similarity between two arrays of composite data types. The composite data type is defined as follows:
CREATE TYPE type_name AS (element_name anytype, weight_name FLOAT4);
When the useIntersect parameter is set to true, only the elements that appear in both arrays are used for the calculation. When the useIntersect parameter is set to false, all elements are used for the calculation.
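The following is a minimal sketch of how the composite-type variant might be called. The type name weighted_tag, its field names, and the sample values are hypothetical and chosen only for illustration. No expected output is shown because the result depends on the weights and on the configured similarity type.

```sql
-- Hypothetical composite type: an element plus its float4 weight.
CREATE TYPE weighted_tag AS (tag text, weight float4);

-- Compare two arrays of weighted elements.
-- The third argument is useIntersect.
SELECT smlar(
    ARRAY[ROW('red', 0.9)::weighted_tag, ROW('green', 0.4)::weighted_tag],
    ARRAY[ROW('red', 0.7)::weighted_tag, ROW('blue', 0.2)::weighted_tag],
    true
);

-- Clean up the illustrative type.
DROP TYPE weighted_tag;
```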
float4 smlar( anyarray a, anyarray b, text formula )
Calculates the similarity between two arrays of the same data type by using the formula that is specified by the formula parameter.
The predefined variables for formula are described as follows:
N.i: The number of common elements in the two arrays.
N.a: The number of distinct elements in array a.
N.b: The number of distinct elements in array b.
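As an illustration of the predefined variables, the following sketch passes a Jaccard-style formula instead of the cosine-style formula. The choice of formula is an assumption made for this example and is not prescribed by the extension. For the sample arrays, N.i is 2 and N.a and N.b are both 3, so the formula evaluates to 2 / (3 + 3 - 2) = 0.5.

```sql
-- Jaccard-style similarity: common elements divided by the size of the union.
SELECT smlar(
    '{1,4,6}'::int[],
    '{5,4,6}'::int[],
    'N.i / (N.a + N.b - N.i)'
);
```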
float4 set_smlar_limit(float4)
Sets the smlar.threshold parameter.
float4 show_smlar_limit()
Displays the smlar.threshold parameter value.
anyarray % anyarray
Returns true if the similarity between the two arrays is greater than the smlar.threshold parameter value. Otherwise, the operator returns false.
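The threshold functions and the % operator can be combined as in the following sketch. It assumes the default cosine calculation method, under which the two sample arrays have a similarity of about 0.67. The threshold values 0.6 and 0.7 are arbitrary values chosen to fall on either side of that similarity.

```sql
-- Threshold below the similarity of the two arrays: % is expected to be true.
SELECT set_smlar_limit(0.6);
SELECT show_smlar_limit();
SELECT '{1,4,6}'::int[] % '{5,4,6}'::int[];

-- Threshold above the similarity of the two arrays: % is expected to be false.
SELECT set_smlar_limit(0.7);
SELECT '{1,4,6}'::int[] % '{5,4,6}'::int[];
```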
text[] tsvector2textarray(tsvector)
Converts a value of the tsvector type to an array of the text type (text[]).
anyarray array_unique(anyarray)
Sorts the elements in an array and removes duplicate elements.
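The two helper functions above can be tried as follows. This is a sketch with sample input chosen for illustration. The 'simple' text search configuration is used so that no stemming or stop-word removal affects the lexemes.

```sql
-- Turn the lexemes of a tsvector into a text array.
SELECT tsvector2textarray(to_tsvector('simple', 'red green red blue'));

-- Sort an integer array and drop its duplicate elements.
SELECT array_unique('{3,1,3,2,1}'::int[]);
```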
float4 inarray(anyarray, anyelement)
Returns 1 if the anyelement parameter value exists in the anyarray parameter value. Otherwise, the function returns 0.
float4 inarray(anyarray, anyelement, float4, float4)
Returns the third parameter value if anyelement exists in anyarray. Otherwise, the function returns the fourth parameter value.
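The two inarray variants can be compared with a sketch such as the following. The return values 1.0 and -1.0 passed to the four-argument variant are arbitrary values chosen for illustration, and they are cast to float4 explicitly to match the function signature.

```sql
-- Two-argument variant: 4 is present in the array, 5 is not.
SELECT inarray('{1,4,6}'::int[], 4);
SELECT inarray('{1,4,6}'::int[], 5);

-- Four-argument variant: returns the third argument on a match,
-- and the fourth argument otherwise.
SELECT inarray('{1,4,6}'::int[], 4, 1.0::float4, -1.0::float4);
SELECT inarray('{1,4,6}'::int[], 5, 1.0::float4, -1.0::float4);
```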
For more information about parameter descriptions and supported data types, visit smlar.
Use the extension
After you connect to an instance, execute the following statement to create the smlar extension:
testdb=> create extension smlar;
Execute the following statements to use basic functions of smlar:
testdb=> SELECT smlar('{1,4,6}'::int[], '{5,4,6}' );
  smlar
----------
 0.666667
(1 row)

testdb=> SELECT smlar('{1,4,6}'::int[], '{5,4,6}', 'N.i / sqrt(N.a * N.b)' );
  smlar
----------
 0.666667
(1 row)
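The calculation method can also be switched through a parameter. The following sketch assumes the smlar.type parameter described in the upstream smlar README, which lists cosine (default), tfidf, and overlap as values. This parameter name is not stated elsewhere in this topic, so verify it against your instance before you rely on it.

```sql
-- Switch the session to the overlap method, which is based on the
-- number of elements that the two arrays have in common.
SET smlar.type = 'overlap';
SELECT smlar('{1,4,6}'::int[], '{5,4,6}'::int[]);

-- Restore the default method for the session.
RESET smlar.type;
```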
Execute the following statement to remove smlar:
testdb=> drop extension smlar;