
Realtime Compute for Apache Flink: Faker connector

Last Updated: Sep 09, 2024

This topic describes how to use the Faker connector.

Background information

The Faker connector is a built-in connector of Realtime Compute for Apache Flink. The connector generates test data based on the Java Faker expression that you specify for each field in a table. We recommend the Faker connector when you need test data to verify business logic while you develop or test a deployment.

The following list describes the capabilities supported by the Faker connector.

  • Table type: Source table and dimension table

  • Running mode: Batch mode and streaming mode

  • Data format: N/A

  • Metric: N/A

  • API type: SQL API

  • Data update or deletion in a sink table: N/A

Prerequisites

N/A

Limits

  • Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 4.0.12 or later supports the Faker connector.

  • The Faker connector supports only the following data types: CHAR(n), VARCHAR(n), STRING, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, TIMESTAMP, ARRAY, MAP, MULTISET, and ROW.

  • When a Faker table is used as a dimension table in a JOIN operation, the connector does not query any stored data. Instead, it generates the output row based on the lookup key from the source table.

Syntax

CREATE TABLE faker_source (
  `name` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);

Parameters in the WITH clause

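Common parameters

  • connector
    Description: The type of the table.
    Data type: String
    Required: Yes
    Default value: None
    Remarks: Set the value to faker.

  • fields.<field>.expression
    Description: The Java Faker expression that generates the value of the field.
    Data type: String
    Required: Yes
    Default value: None
    Remarks: For more information, see the Field expression section of this topic. For a ROW field, specify an expression for each nested field, such as fields.<field>.<nested_field>.expression. For a MAP field, specify fields.<field>.key.expression and fields.<field>.value.expression.

  • fields.<field>.null-rate
    Description: The probability that the value of the field is null. For example, 0.1 means that about 10% of the generated values are null.
    Data type: Float
    Required: No
    Default value: 0.0
    Remarks: N/A

  • fields.<field>.length
    Description: The number of elements in a field of the ARRAY, MAP, or MULTISET data type. For a MAP field, this is the number of key-value pairs.
    Data type: Integer
    Required: No
    Default value: 1
    Remarks: N/A

Parameters only for source tables

  • number-of-rows
    Description: The number of rows to generate.
    Data type: Integer
    Required: No
    Default value: -1
    Remarks: If you specify this parameter, the source table is bounded. If you do not specify this parameter, the source table is unbounded.

  • rows-per-second
    Description: The rate at which data is generated.
    Data type: Integer
    Required: No
    Default value: 10000
    Remarks: Unit: rows per second.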
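The following DDL is a minimal sketch that combines the parameters described above, assuming VVR 4.0.12 or later. The table name faker_users and its fields are hypothetical, chosen only for illustration. Because number-of-rows is set, the source is bounded and the query finishes after 100 rows. About 10% of the email values are null, and each tags value is an array of 3 elements, each generated by the field expression.

```sql
-- Hypothetical table and field names, for illustration only.
CREATE TEMPORARY TABLE faker_users (
  `user_id` INT,
  `email` STRING,
  `tags` ARRAY<STRING>
) WITH (
  'connector' = 'faker',
  -- Random user ID from the Number class.
  'fields.user_id.expression' = '#{number.numberBetween ''0'',''1000''}',
  'fields.email.expression' = '#{internet.emailAddress}',
  -- About 10% of the email values are NULL.
  'fields.email.null-rate' = '0.1',
  -- Each ARRAY value contains 3 elements, each generated by the expression.
  'fields.tags.expression' = '#{book.genre}',
  'fields.tags.length' = '3',
  -- Generate data at a rate of 100 rows per second...
  'rows-per-second' = '100',
  -- ...and stop after 100 rows, which makes the source bounded.
  'number-of-rows' = '100'
);

SELECT user_id, email, tags FROM faker_users;
```

If you remove number-of-rows, the same table becomes an unbounded source that keeps generating rows at the configured rate.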

Examples

  • Use a Faker table as a dimension table in a lookup join

    CREATE TEMPORARY TABLE datagen_source (
      `character_id` INT,
      `location` STRING,
      `datagen_name` STRING,
      `user_fullname` ROW<first_name STRING, last_name STRING>,
      `user_data` ARRAY<STRING>,
      `user_score` MAP<STRING, INT>,
      `user_books` MULTISET<STRING>,
      `proctime` AS PROCTIME()
    ) WITH (
      'connector' = 'faker',
      'fields.character_id.expression' = '#{number.numberBetween ''0'',''10000''}',
      'fields.location.expression' = '#{harry_potter.location}',
      'fields.datagen_name.expression' = '#{superhero.name}',
      'fields.user_fullname.first_name.expression' = '#{superhero.prefix}',
      'fields.user_fullname.last_name.expression' = '#{superhero.suffix}',
      'fields.user_data.expression' = '#{harry_potter.character}',
      'fields.user_data.length' = '2',
      'fields.user_score.key.expression' = '#{harry_potter.character}',
      'fields.user_score.value.expression' = '#{number.numberBetween ''10'',''100''}',
      'fields.user_score.length' = '2',
      'fields.user_books.expression' = '#{book.title}',
      'fields.user_books.length' = '2',
      'number-of-rows' = '5'
    );
    
    CREATE TEMPORARY TABLE faker_dim (
      `character_id` INT,
      `faker_name` STRING
    ) WITH (
      'connector' = 'faker',
      'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
      'fields.faker_name.expression' = '#{harry_potter.characters}'
    );
        
    SELECT
      l.character_id,
      l.location,
      l.datagen_name,
      l.user_fullname,
      l.user_data,
      l.user_score,
      l.user_books,
      c.faker_name
    FROM datagen_source AS l
    JOIN faker_dim FOR SYSTEM_TIME AS OF proctime AS c
    ON l.character_id = c.character_id;

Field expression

  • Operation

    When you use the Faker connector, you must define an expression in the WITH clause for each field in the DDL statement. The expression uses the 'fields.<field>.expression' = '#{className.methodName ''parameter'', ...}' format. The following list describes the parameters in the expression.

      • field: The name of a field in the DDL statement.

      • className: The name of a Faker class. Java Faker provides about 80 Faker classes that you can use in field expressions. Select the classes based on your business requirements. Faker class names are not case-sensitive.

      • methodName: The name of a method of the Faker class. Method names are not case-sensitive.

      • parameter: The input parameters of the method. Enclose each input parameter in single quotation marks (') and separate multiple input parameters with commas (,). Because the whole expression is itself a string enclosed in single quotation marks in the WITH clause, each of these quotation marks is escaped by doubling it (''), as in ''0'',''1000''.

  • Example

    The following steps show how the 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' expression for the age field in the Syntax section is derived from the Java Faker API documentation:

    1. Find the Number class in the Java Faker API documentation.

    2. Find the numberBetween method in the Number class and view the method description.

      The numberBetween method returns a random number between the two bounds that you pass as parameters.

    3. Combine the class name (number), the method name (numberBetween), and the bounds 0 and 1000 into the SQL expression 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' for the age field.

      This expression generates values for the age field that are greater than or equal to 0 and less than 1000, because Java Faker treats the upper bound of numberBetween as exclusive.
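The same three steps apply to other classes and methods. As a hedged sketch, the following DDL uses a method without parameters, a method with one parameter, and a method with three parameters. The table name faker_orders and its fields are hypothetical. The price expression is derived from the randomDouble(maxNumberOfDecimals, min, max) method of the Java Faker Number class, and digits(count) is assumed to return a string of the given number of digits.

```sql
-- Hypothetical table and field names, for illustration only.
CREATE TEMPORARY TABLE faker_orders (
  `customer` STRING,
  `zip_code` STRING,
  `price` DOUBLE
) WITH (
  'connector' = 'faker',
  -- No parameters: #{className.methodName}. Class names are not
  -- case-sensitive, so Name and name refer to the same Faker class.
  'fields.customer.expression' = '#{Name.fullName}',
  -- One parameter: a string of 5 digits.
  'fields.zip_code.expression' = '#{number.digits ''5''}',
  -- Three parameters, derived with the steps above:
  -- class Number, method randomDouble, up to 2 decimal places, between 1 and 150.
  'fields.price.expression' = '#{number.randomDouble ''2'',''1'',''150''}',
  'number-of-rows' = '10'
);

SELECT customer, zip_code, price FROM faker_orders;
```

Each parameter is wrapped in doubled single quotation marks because the expression itself is a quoted string in the WITH clause.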