This topic describes how to use the Faker connector.
Background information
The Faker connector is a built-in connector of Realtime Compute for Apache Flink. It generates test data based on the Java Faker expression that you specify for each field of a table. We recommend the Faker connector when you need test data to verify business logic while you develop or test a deployment.
The following table describes the capabilities supported by the Faker connector.
Item | Description |
Table type | Source table and dimension table |
Running mode | Batch mode and streaming mode |
Data format | N/A |
Metric | N/A |
API type | SQL API |
Data update or deletion in a sink table | N/A |
Prerequisites
N/A
Limits
Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 4.0.12 or later supports the Faker connector.
The Faker connector supports only the following data types: CHAR(n), VARCHAR(n), STRING, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, TIMESTAMP, ARRAY, MAP, MULTISET, and ROW.
If a Faker table is used as a dimension table in a JOIN operation, no query is actually performed on the dimension table. Instead, the output is generated based on the lookup key from the source table.
Syntax
CREATE TABLE faker_source (
`name` STRING,
`age` INT
) WITH (
'connector' = 'faker',
'fields.name.expression' = '#{superhero.name}',
'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);
Parameters in the WITH clause
Category | Parameter | Description | Data type | Required | Default value | Remarks |
Common parameters | connector | The connector type. | String | Yes | None | Set the value to faker. |
Common parameters | fields.<field>.expression | The Java Faker expression that generates the value of the field. | String | Yes | None | For more information, see the Field expression section of this topic. |
Common parameters | fields.<field>.null-rate | The fraction of generated values of the field that are null. | Float | No | 0.0 | N/A |
Common parameters | fields.<field>.length | The number of elements to generate for a field of the ARRAY, MAP, or MULTISET data type. | Integer | No | 1 | N/A |
Parameters only for source tables | number-of-rows | The number of rows to generate. | Integer | No | -1 | If you specify this parameter, the source table is bounded. If you do not specify this parameter, the source table is unbounded. |
Parameters only for source tables | rows-per-second | The rate at which data is generated. | Integer | No | 10000 | Unit: rows per second. |
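The following statement is a minimal sketch that combines the parameters in the preceding table. The table name faker_orders, its fields, and the chosen expressions are hypothetical and serve only as an illustration. Because number-of-rows is set, the source is bounded and stops after 100 rows. rows-per-second limits generation to 10 rows per second, null-rate makes roughly one in five buyer values null, and length produces three elements in each tags array.

```sql
-- Hypothetical bounded test source that uses every parameter in the table.
CREATE TEMPORARY TABLE faker_orders (
  `order_id` INT,
  `buyer` STRING,
  `tags` ARRAY<STRING>
) WITH (
  'connector' = 'faker',
  -- Random integer for each order_id.
  'fields.order_id.expression' = '#{number.numberBetween ''1'',''100000''}',
  -- About 20% of the buyer values are NULL.
  'fields.buyer.expression' = '#{name.fullName}',
  'fields.buyer.null-rate' = '0.2',
  -- Each tags array holds 3 generated book titles.
  'fields.tags.expression' = '#{book.title}',
  'fields.tags.length' = '3',
  -- Bounded source: 100 rows in total, generated at 10 rows per second.
  'number-of-rows' = '100',
  'rows-per-second' = '10'
);
```

Because the source is bounded, a query such as SELECT * FROM faker_orders; finishes after it reads all 100 rows. Remove number-of-rows to keep the source running indefinitely.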
Examples
Join a source table with a dimension table
CREATE TEMPORARY TABLE datagen_source (
  `character_id` INT,
  `location` STRING,
  `datagen_name` STRING,
  `user_fullname` ROW<first_name STRING, last_name STRING>,
  `user_data` ARRAY<STRING>,
  `user_score` MAP<STRING, INT>,
  `user_books` MULTISET<STRING>,
  `proctime` AS PROCTIME()
) WITH (
  'connector' = 'faker',
  'fields.character_id.expression' = '#{number.numberBetween ''0'',''10000''}',
  'fields.location.expression' = '#{harry_potter.location}',
  'fields.datagen_name.expression' = '#{superhero.name}',
  -- Nested ROW fields use fields.<field>.<nested_field>.expression.
  'fields.user_fullname.first_name.expression' = '#{superhero.prefix}',
  'fields.user_fullname.last_name.expression' = '#{superhero.suffix}',
  'fields.user_data.expression' = '#{harry_potter.character}',
  'fields.user_data.length' = '2',
  -- MAP fields use separate key and value expressions.
  'fields.user_score.key.expression' = '#{harry_potter.character}',
  'fields.user_score.value.expression' = '#{number.numberBetween ''10'',''100''}',
  'fields.user_score.length' = '2',
  'fields.user_books.expression' = '#{book.title}',
  'fields.user_books.length' = '2',
  'number-of-rows' = '5'
);

CREATE TEMPORARY TABLE faker_dim (
  `character_id` INT,
  `faker_name` STRING
) WITH (
  'connector' = 'faker',
  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.faker_name.expression' = '#{harry_potter.characters}'
);

-- Processing-time lookup join. No actual query is performed on faker_dim (see Limits).
SELECT l.character_id, l.location, l.datagen_name, l.user_fullname,
       l.user_data, l.user_score, l.user_books, c.faker_name
FROM datagen_source AS l
JOIN faker_dim FOR SYSTEM_TIME AS OF l.proctime AS c
ON l.character_id = c.character_id;
Field expression
Usage
When you use the Faker connector, you must define an expression in the WITH clause for each field in the DDL statement. The expression is in the 'fields.<field>.expression' = '#{className.methodName ''parameter'', ...}' format. The following table describes the parameters in the expression.
Parameter | Description |
field | The name of a field in the DDL statement. |
className | The name of a Faker class. Java Faker provides about 80 Faker classes that you can use in field expressions. Select the classes that meet your business requirements. Note: Faker class names are not case-sensitive. |
methodName | The name of a method of the Faker class. Note: Method names are not case-sensitive. |
parameter | The input parameters of the method. Note: Enclose each input parameter in single quotation marks (') and separate multiple input parameters with commas (,). Because the expression is itself a SQL string literal, each single quotation mark must be escaped as two single quotation marks (''), for example ''0''. |
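The following sketch shows how the expression format applies to methods that take zero, one, or two parameters. The table name faker_expressions and its fields are hypothetical. #{superhero.name} comes from the Syntax section. The bool.bool, number.digits, and date.past expressions assume the corresponding Java Faker methods (Bool.bool, Number.digits, and DateAndTime.past), so verify them against the Java Faker API documentation before you rely on them.

```sql
-- Hypothetical table showing expressions with different numbers of parameters.
CREATE TEMPORARY TABLE faker_expressions (
  `hero` STRING,
  `active` BOOLEAN,
  `code` STRING,
  `event_time` TIMESTAMP(3)
) WITH (
  'connector' = 'faker',
  -- No parameters: only className.methodName.
  'fields.hero.expression' = '#{superhero.name}',
  'fields.active.expression' = '#{bool.bool}',
  -- One parameter: a string of 6 random digits.
  'fields.code.expression' = '#{number.digits ''6''}',
  -- Two parameters separated by a comma. Each one is wrapped in '' because
  -- the whole expression is already inside a SQL string literal.
  'fields.event_time.expression' = '#{date.past ''15'',''SECONDS''}'
);
```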
Example
The following steps show how the expression 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' for the age field in the Syntax section is derived from the Java Faker API documentation:
Find the Number class in the Java Faker API documentation.
Find the numberBetween method in the Number class and view the method description.
The numberBetween method returns a random number between the two specified values. The lower bound is inclusive and the upper bound is exclusive.
Combine the Number class, the numberBetween method, and the bounds 0 and 1000 to obtain the SQL expression 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' for the age field.
This expression generates values of the age field that are greater than or equal to 0 and less than 1000.
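To check a derived expression, you can run it in a bounded source and aggregate the results. The following statements are a sketch. The table name faker_age_check and the row count of 1000 are hypothetical, and the statements reuse the expressions from the Syntax section.

```sql
-- Hypothetical bounded copy of the Syntax example for checking the age range.
CREATE TEMPORARY TABLE faker_age_check (
  `name` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}',
  -- Bounded source, so that the aggregate below produces a final result.
  'number-of-rows' = '1000'
);

-- Both values are expected to fall within the range passed to numberBetween.
SELECT MIN(`age`) AS min_age, MAX(`age`) AS max_age
FROM faker_age_check;
```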