Welcome to Flink Forward Asia! This year's event is the first Flink Forward held in Jakarta, Indonesia, and a significant milestone: the first time the conference has been hosted in a Southeast Asian country. I'm Feng Wang from Alibaba Cloud, and I'm excited to meet so many local developers who are passionate about technology.
Apache Flink, launched in 2014, celebrates its 10th anniversary this year. This occasion provides a perfect opportunity to explore the evolution of Flink, its current status, and its future directions.
Let’s take a moment to reflect on the remarkable milestones that Apache Flink has achieved over the past decade. These milestones underscore Apache Flink's pivotal role in the evolution of data technology, facilitating the transition from batch processing to real-time processing.
Like Apache Spark, Apache Flink began as a university research project: "Stratosphere," started at the Technische Universität Berlin in 2009. In 2014, Stratosphere's core team donated the project to the Apache Software Foundation, where it was renamed Flink. The team then founded a company, originally named dataArtisans and now known as Ververica.
On the other side of the globe, Alibaba Group was growing rapidly, and the volume of data generated on its e-commerce platform was increasing exponentially. To find a next-generation data platform capable of processing data in real time, we conducted research, analysis, and experiments, and ultimately chose Apache Flink from among various open-source projects as our unified stream processing engine. Flink was first deployed in large-scale production at Alibaba in 2016.
In 2018, Alibaba organized the first Flink Forward Asia conference in Beijing to promote Flink's adoption in China. The following year, Alibaba made a significant investment by acquiring dataArtisans, further solidifying its commitment to Apache Flink. Alibaba also contributed its own production-proven version of Flink, named Blink, which comprised approximately 1.5 million lines of code. This was a vital contribution that made Apache Flink ready for production use worldwide.
In 2023, Apache Flink received the SIGMOD Systems Award, recognizing its value in both industry and academia.
Over the last decade, Flink has been widely adopted across industries worldwide. Starting in Europe, it spread to China, the United States, and other regions, and it has accumulated nearly 2,000 contributors globally. About half of these contributors are in China, largely due to the country's large population and Alibaba's substantial influence on the project since 2018.
The Flink Forward Asia conference has taken place seven times, consistently promoting Flink's capabilities even during the pandemic. This year's event is the first hosted outside China, hinting at a future filled with more Flink-focused conferences across Asia.
Apache Flink has earned recognition among developers primarily because it meets the growing demand for real-time data analytics. Most enterprise decision-makers want their business reports updated in real time, within seconds, rather than on a T+1 (next-day) basis, so that they can make decisions more quickly and effectively. Fintech companies need rapid risk detection to prevent losses caused by delays, and e-commerce businesses rely on real-time interactions to provide relevant product recommendations.
As a streaming data processing engine, Apache Flink has effectively addressed these challenges, demonstrating its market fit over the last decade.
Apache Flink has established itself as the de facto standard for real-time stream processing across numerous industries. It can process both bounded and unbounded data, functioning as a unified engine for streaming and batch workloads.
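To make the idea of one engine for bounded and unbounded data more concrete, here is a toy Python sketch. It is not Flink's API: the functions `update`, `run_batch`, and `run_streaming` and the click data are hypothetical. The same counting logic runs in two modes. Over a bounded input it returns one final result, as a batch job would; over a stream it emits an updated result after every event, and the last update for each key matches the batch answer.

```python
from typing import Dict, Iterable, Iterator, Tuple

State = Dict[str, int]


def update(state: State, key: str) -> int:
    """Shared business logic: increment and return the running count for a key."""
    state[key] = state.get(key, 0) + 1
    return state[key]


def run_batch(events: Iterable[str]) -> State:
    """Bounded input: consume every event, then return one final result."""
    state: State = {}
    for key in events:
        update(state, key)
    return state


def run_streaming(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Unbounded input: emit an updated (key, count) pair after every event."""
    state: State = {}
    for key in events:
        yield key, update(state, key)


if __name__ == "__main__":
    clicks = ["shoes", "phone", "shoes", "laptop", "shoes"]  # hypothetical click events
    print(run_batch(clicks))  # {'shoes': 3, 'phone': 1, 'laptop': 1}
    for change in run_streaming(clicks):
        print(change)  # one updated tuple per event, ending with ('shoes', 3)
```

In Flink itself, the choice between these two behaviors is made by the engine's runtime execution mode (the `execution.runtime-mode` setting) rather than by writing two functions; the sketch only models the outcome.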
Flink's rich set of connectors makes it easy to integrate with the existing big data ecosystem, including mainstream databases, data lakes, data warehouses, message queues, and search engines. This adaptability makes Apache Flink a backbone of modern data architectures.
A significant development in recent years is the shift from stream computing to streaming lakehouse architectures. The lakehouse has gained attention as a next-generation data architecture that combines the strengths of data lakes and data warehouses.
In 2022, the Apache Flink community launched the Flink Table Store project, built around a real-time data lake format. The project later evolved into Apache Paimon, an independent project designed for building a streaming lakehouse.
With the new streaming lakehouse approach, developers can use Flink CDC to ingest data from external sources in real time and use Flink SQL to process streaming and batch data in a unified manner. This architecture simplifies data pipelines and enables real-time integration without excessive copying of data across systems.
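Flink CDC captures inserts, updates, and deletes from a source database's change log, so a downstream table can stay current without repeatedly copying the full data set. The toy Python sketch below models only that merge step: it applies a sequence of change events to an in-memory table keyed by primary key. It is not the Flink CDC or Apache Paimon API; the event layout, the `apply_changelog` function, and the order data are hypothetical, and only the four operation codes borrow the notation of Flink's changelog row kinds.

```python
from typing import Dict, List, Tuple

# A change event: (operation, primary key, row value). The operation codes
# follow the notation of Flink's changelog row kinds:
# +I insert, -U update-before, +U update-after, -D delete.
ChangeEvent = Tuple[str, int, str]


def apply_changelog(table: Dict[int, str], events: List[ChangeEvent]) -> Dict[int, str]:
    """Merge change events into a primary-key table, in arrival order."""
    for op, key, value in events:
        if op in ("+I", "+U"):
            table[key] = value  # insert or overwrite the latest version of the row
        elif op in ("-U", "-D"):
            table.pop(key, None)  # retract the old version, or delete the row
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


if __name__ == "__main__":
    # Hypothetical order-status changes captured from an operational database.
    events = [
        ("+I", 1, "pending"),
        ("+I", 2, "pending"),
        ("-U", 1, "pending"),
        ("+U", 1, "paid"),
        ("-D", 2, "pending"),
    ]
    print(apply_changelog({}, events))  # {1: 'paid'}
```

An update arrives as a retraction (`-U`) followed by the new row (`+U`), which is why the table ends with order 1 marked `paid` and order 2 removed.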
This real-time architecture has been put into production at Alibaba, notably during this year's Double 11 shopping festival. With it, Alibaba's Business Intelligence (BI) team can use a single set of SQL statements to build stream and batch pipelines together: users write the SQL once to define both the real-time and the offline business pipelines. They can also keep all of the data in a single storage system instead of copying it between systems; for instance, they no longer need to transfer data from Kafka to Hive. This approach significantly simplifies the data architecture and reduces costs while preserving real-time capabilities.
The future of Apache Flink looks promising: Flink 2.0, which will mark the next generation of the engine, is expected to be released in early 2025.
Several key advancements are in store:
In conclusion, as we celebrate a decade of progress with Apache Flink, I am optimistic that developers in Indonesia and across Southeast Asia will engage with and contribute to the growth of the open-source community. Let's look forward to the innovations Apache Flink 2.0 will bring to the data landscape!
Thank you for your time. I hope to see more participation in the Flink community from this region in the years ahead.