Samples all values that are read from column_name based on x and y, and filters out the rows that do not meet sampling conditions.
Syntax
boolean sample(<x>, <y>, [<column_name1>, <column_name2>[,...]])
Parameters
x and y: x is required. x and y are integer constants that are greater than 0. Their values are of the BIGINT type. The two parameters indicate that the values fall into x portions based on the hash function and the yth portion is used.
y is optional. If no value is specified for the y parameter, the first portion is used, and you do not need to specify column_name.
If x or y is of another data type, the value of x or y is less than or equal to 0, or y is greater than x, an error is returned. If the value of x or y is null, null is returned.
column_name: optional. The name of the column on which sampling is performed. If no value is specified for this parameter, random sampling is performed based on the values of x and y. The name of the column can be of any data type, and the column value can be null. Implicit conversions are not performed. If column_name is set to null, an error is returned.
NoteTo prevent data skew due to the null value, uniform hashing is performed on the null values in column_name in x portions. If you do not configure the column_name parameter, data skew may occur. We recommend that you configure the column_name parameter.
Random sampling can be performed only on columns of the following data types: BIGINT, DATETIME, BOOLEAN, DOUBLE, STRING, BINARY, CHAR, and VARCHAR.
Return value
A value of the BOOLEAN type is returned.
Examples:
In this example, the mf_sample
table is used. This table contains the following data:
+------------+------+------+------+------+
| id | col1 | col2 | col3 | col4 |
+------------+------+------+------+------+
| 3 | eee | rrr | tttt | ggggg |
| 4 | yyy | uuuu | iiiii | ccccccc |
| 1 | "abc" | "bcd" | "rthg" | "ahgjeog" |
| 2 | "a1bc" | "bc1d" | "rt1hg" | "ahgjeog" |
+------------+------+------+------+------+
Randomly hash the data in the table to two portions, and take the first portion.
select * from mf_sample where sample(2,1);
The following result is returned:
+------------+------+------+------+------+ | id | col1 | col2 | col3 | col4 | +------------+------+------+------+------+ | 3 | eee | rrr | tttt | ggggg | | 1 | "abc" | "bcd" | "rthg" | "ahgjeog" | +------------+------+------+------+------+
Randomly hash the values in the
id
column of the table to two portions, and take the first portion.select * from mf_sample where sample(2,1,id);
The following result is returned:
+------------+------+------+------+------+ | id | col1 | col2 | col3 | col4 | +------------+------+------+------+------+ | 4 | yyy | uuuu | iiiii | ccccccc | | 2 | "a1bc" | "bc1d" | "rt1hg" | "ahgjeog" | +------------+------+------+------+------+
Related functions
For more information, see Other functions.