This version is rolling out gradually via canary release across the network. The rollout is expected to complete within 3–4 weeks. To check the current upgrade plan, see the latest announcement on the right side of the Realtime Compute for Apache Flink console homepage. If the new features are not yet available in your account, submit a ticket to request early access.
Ververica Runtime (VVR) 8.0.6 was released on April 1, 2024. This version is built on Apache Flink 1.17.2 and brings improvements across real-time lakehouse integration, connectors, and SQL window functions.
After the canary release completes, upgrade your deployments to this version. For instructions, see Upgrade the engine version of deployments.
What's new
Real-time lakehouse
Apache Paimon writes to OSS-HDFS
Apache Paimon data can now be written to OSS-HDFS, giving you a cost-effective storage option for lakehouse workloads. When you run CREATE TABLE AS or CREATE DATABASE AS to write to Apache Paimon, the resulting table is created in dynamic bucket mode automatically. This release also incorporates all Apache Paimon community features and fixes from the master branch up to March 15, 2024. For more information, visit Apache Paimon.
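As a minimal sketch of the behavior described above (catalog, database, and table names are illustrative; the Paimon catalog is assumed to have been created with a warehouse path on OSS-HDFS, and a MySQL catalog named `mysql` is assumed as the source):

```sql
-- Synchronize a source table into a Paimon table stored on OSS-HDFS.
-- The resulting Paimon table is created in dynamic bucket mode automatically.
CREATE TABLE IF NOT EXISTS `paimon`.`lakehouse_db`.`orders`
AS TABLE `mysql`.`sales_db`.`orders`;
```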
Hive data backed by OSS-HDFS via Hive catalogs
After configuring a Hive catalog, you can write Hive data directly to OSS-HDFS. This lets you build a Hive data warehouse on OSS-HDFS without changing your catalog-based workflow.
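For example, once the Hive catalog's warehouse directory points to OSS-HDFS, a plain INSERT is enough to land Hive data there (catalog, database, and table names below are placeholders):

```sql
-- Assumes a Hive catalog named `my_hive` is already configured in the console
-- with its warehouse directory on OSS-HDFS; the table's files land there.
INSERT INTO `my_hive`.`dw`.`page_views`
SELECT user_id, url, ts FROM source_table;
```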
Non-Hive tables in DLF-based Hive catalogs
When Data Lake Formation (DLF) is your Hive catalog's metadata management center, you can now create non-Hive tables through the Hive catalog. This makes it easier to manage different table types from a single catalog.
Connectors
OceanBase CDC connector — Read from OceanBase source tables (public preview)
You can now read data from an OceanBase source table using the Change Data Capture (CDC) connector, ingesting OceanBase data directly into Flink pipelines and building tiered real-time data warehouses on OceanBase. For details, see OceanBase connector (public preview).
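A minimal source-table sketch, assuming the option names of the open-source Flink CDC oceanbase-cdc connector; hosts, credentials, and table names are placeholders, and the option set is abbreviated (for example, log-proxy settings needed for incremental reads are omitted):

```sql
CREATE TEMPORARY TABLE ob_orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'oceanbase-cdc',
  'hostname' = '<host>',
  'port' = '2881',
  'username' = '<user>',
  'password' = '<password>',
  'tenant-name' = '<tenant>',
  'database-name' = 'sales_db',
  'table-name' = 'orders'
);
```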
MongoDB CDC connector — CREATE TABLE AS and CREATE DATABASE AS (public preview)
Run CREATE TABLE AS or CREATE DATABASE AS to synchronize both data and schema changes from a MongoDB database to downstream tables in real time. For details, see Manage MongoDB catalogs (public preview), CREATE TABLE AS statement, CREATE DATABASE AS statement, and MongoDB connector (public preview).
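A hedged sketch of both statements, assuming a configured MongoDB catalog named `mongodb` and a target catalog named `target` (all names are illustrative):

```sql
-- Synchronize one collection, including subsequent schema changes:
CREATE TABLE IF NOT EXISTS `target`.`ods`.`orders`
AS TABLE `mongodb`.`sales_db`.`orders`;

-- Or synchronize an entire database:
CREATE DATABASE IF NOT EXISTS `target`.`ods`
AS DATABASE `mongodb`.`sales_db` INCLUDING ALL TABLES;
```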
PostgreSQL CDC connector — Concurrent full data reading (public preview)
Full data is now read concurrently from PostgreSQL CDC source tables, significantly reducing the time needed to complete initial full-data synchronization. For details, see PostgreSQL CDC connector (public preview).
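A source-table sketch under the assumption that concurrent full reading is driven by the incremental-snapshot framework of the open-source Flink CDC connector; the exact option name may differ in your VVR version, and all connection values are placeholders:

```sql
CREATE TEMPORARY TABLE pg_orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'postgres-cdc',
  'hostname' = '<host>',
  'port' = '5432',
  'username' = '<user>',
  'password' = '<password>',
  'database-name' = 'sales_db',
  'schema-name' = 'public',
  'table-name' = 'orders',
  -- splits the initial full scan into chunks read in parallel across subtasks
  'scan.incremental.snapshot.enabled' = 'true'
);
```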
Hologres connector — TIMESTAMP_LTZ data type support
The Hologres connector now supports the TIMESTAMP_LTZ data type, making it straightforward to process and analyze time-zone-aware data and improving data accuracy. This release also fixes the time-offset issue that occurred when data was synchronized from a MySQL CDC source table to Hologres. For details, see Hologres connector.
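A minimal sink sketch using the new type (endpoint, database, and credential values are placeholders):

```sql
-- event_time uses TIMESTAMP_LTZ, so the instant is preserved across
-- time zones when the row is written to Hologres.
CREATE TEMPORARY TABLE holo_sink (
  id         BIGINT,
  event_time TIMESTAMP_LTZ(3)
) WITH (
  'connector' = 'hologres',
  'endpoint' = '<endpoint>',
  'dbname' = '<db>',
  'tablename' = 'public.events',
  'username' = '<access-id>',
  'password' = '<access-key>'
);
```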
MaxCompute connector — Upsert Tunnel and schema support
Two enhancements are included in this release:
Write data to Transaction Table 2.0 tables using MaxCompute Upsert Tunnel.
Specify a schema so the connector can read from and write to tables in a MaxCompute project with the schema feature enabled.
For details, see MaxCompute connector.
Elasticsearch — Column-based routing keys
Specify any column as a routing key for real-time Elasticsearch indexing. This gives you finer control over how documents are distributed across shards. For details, see Elasticsearch.
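A sketch only: the routing option name below is an assumption, not a confirmed parameter; check the Elasticsearch connector documentation for the exact name in your VVR version. Hosts, index, and column names are placeholders.

```sql
CREATE TEMPORARY TABLE es_sink (
  doc_id  STRING,
  tenant  STRING,
  payload STRING
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = '<host>:9200',
  'index' = 'events',
  -- hypothetical option: route documents by the tenant column so that all
  -- documents for one tenant land on the same shard
  'routing.field-name' = 'tenant'
);
```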
Apache Kafka connector — Null handling and header-based filtering
Two improvements reduce unwanted data and improve distribution:
Empty column values are no longer written as null values to JSON strings, reducing unnecessary storage consumption.
Kafka data can be filtered based on headers during writing, making it easier to route data to the right destinations.
For details, see Apache Kafka connector and JSON.
OSS connector — Enhanced bucket authentication
After specifying a file system path, you must configure Object Storage Service (OSS) bucket authentication to read from or write to that path. This ensures your jobs have the right credentials before accessing OSS data. For details, see OSS connector.
StarRocks connector — JSON type support
The StarRocks connector can now write JSON type data to StarRocks, supporting workloads that involve semi-structured data.
Simple Log Service connector — Null values as empty strings
Null values are now written as empty strings to logs instead of being dropped, making it easier to handle fields that contain null values. For details, see Simple Log Service connector.
SQL enhancements
CUMULATE function — Update stream support for WindowAggregate
The WindowAggregate operator of the CUMULATE function now supports update streams. With this release, all four window functions — TUMBLE, HOP, CUMULATE, and SESSION — support window aggregation for update streams such as CDC data streams. Window functions in Apache Flink 1.18 and earlier do not support window aggregation for update streams. For details, see Queries.
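The cumulative window above can be sketched with the standard windowing table-valued function syntax; table and column names are illustrative, and the input table may be backed by a CDC source, so the aggregate consumes retractions as well as inserts:

```sql
SELECT
  window_start,
  window_end,
  SUM(amount) AS cumulative_amount
FROM TABLE(
  CUMULATE(TABLE orders, DESCRIPTOR(ts), INTERVAL '10' MINUTE, INTERVAL '1' HOUR)
)
GROUP BY window_start, window_end;
```

Each row belongs to a series of windows that share a start time and grow by the 10-minute step until the 1-hour maximum size is reached, which yields early, cumulative results within each hour.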
Bug fixes
The following issues are fixed in this release:
The shardWrite parameter configuration for ClickHouse result tables did not take effect.
Savepoints of deployments could not be generated in extreme cases.
All issues included in the Apache Flink 1.17.2 community release. For the full list, see Apache Flink 1.17.2 Release Announcement.