
What's Coming in Apache Flink Agents 0.3

Version 0.3 aims to enhance capabilities with features such as Agent Skills integration, Mem0-based Long-Term Memory support, and a Durable Execution Reconciler.

Author: Wenjin Xie

If you've been keeping an eye on the Apache Flink ecosystem, you've probably noticed the buzz around Flink Agents. It's a brand-new sub-project aiming to provide an open-source framework for building event-driven streaming agents.

Recently, Flink Agents released version 0.2.1 and showcased an intelligent operations agent for Flink jobs built on the framework, demonstrating its potential in the event-driven agent domain. But the real excitement is happening right now: the community has kicked off discussions for version 0.3: Planning Flink Agents 0.3.

I've been digging through the GitHub discussions, issues, and recent PRs to get a clearer picture of what's next. I've put together a summary of the 0.3 roadmap to help fellow developers understand where things are heading and how we can jump in.

Roadmap

Based on the current community threads, the target feature freeze is set for May 31, 2026, with a release goal of June 15, 2026. While dates in open source can always shift, here are the key features currently in the pipeline:

  • Agent Skills Integration
  • Mem0 backend for Long-Term Memory
  • Per-event-type configurable log levels for the event log
  • Argument injection for tool calling
  • Cross-language actions and events
  • Quickstart experience enhancements
  • Optimized display of the event log
  • Async execution for cross-language resources
  • Durable execution enhancement
  • Support for Python 3.12

Some of these aren't just ideas anymore; work has already started on several of them.

Why I'm Excited About These Features

There's a lot on this list, but a few specific updates really stand out to me from a developer perspective. Here's my take on why they matter.

Agent Skills Integration

Agent Skills have emerged as a lightweight, open format for extending AI agent capabilities with specialized knowledge and workflows. They're gaining traction quickly and have already been adopted by a growing number of agent products.

Take OpenClaw, for instance. It's taken the community by storm recently, and I bet many of you have already tried it. In my opinion, a huge part of OpenClaw's popularity boils down to its support for agent skills. On one hand, skills make workflows more stable and efficient, significantly boosting agent performance. On the other hand, users can easily grab pre-written skills from within their organizations or off the internet, making it incredibly simple to expand an agent's capabilities.

If you caught the recently released Flink Operation Agent demo from Flink Agents, you might have noticed the underlying concept is strikingly similar to Agent Skills. It uses an LLM to generate a concise problem description, retrieves Standard Operating Procedures (SOPs) from a vector database based on that description, and then executes operations according to those SOPs. Essentially, this is no different from an LLM identifying relevant skills based on context and executing actions via those skills. However, compared to RAG, skill discovery is much more lightweight. Once Flink Agents 0.3 drops, I'd love to see interested developers refactor this demo to leverage native Agent Skills.
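To make the parallel concrete, the demo's flow can be sketched in a few lines. This is my own illustrative pseudocode, not the demo's actual implementation: `llm.summarize`, `vector_db.search`, and `executor.run` are hypothetical interfaces standing in for the real components.

```python
def handle_incident(raw_event: str, llm, vector_db, executor) -> str:
    """Sketch of the SOP-retrieval flow: summarize, retrieve, execute.

    llm, vector_db, and executor are assumed dependencies with
    hypothetical method names; the real demo's interfaces differ.
    """
    # Step 1: condense the raw alert/event into a concise problem description.
    problem = llm.summarize(raw_event)
    # Step 2: retrieve the closest-matching SOP from a vector database.
    sops = vector_db.search(problem, top_k=1)
    if not sops:
        return "no SOP found; escalate to a human operator"
    # Step 3: execute operations according to the retrieved SOP.
    return executor.run(sops[0])
```

Swap step 2 for lightweight skill discovery and step 3 for executing the skill's instructions, and you have essentially the Agent Skills pattern.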

The community has already published a design proposal for integrating Agent Skills: https://github.com/apache/flink-agents/discussions/565. From what I can see, the implementation of the progressive disclosure mechanism for Agent Skills is similar to other frameworks'. The real distinction here is that Flink Agents is a distributed agent framework built on Flink. This raises an interesting engineering challenge: how do we effectively provide skills to Flink Agents jobs running in clusters like YARN or Kubernetes? It's definitely a topic worthy of deeper consideration.

Mem0 backend for Long-Term Memory

Long-term memory is a critical component of agent context management, especially for agents designed to run over extended periods. This aligns perfectly with Flink Agents' core target scenario: agents that operate 24/7, continuously consuming events. Back in version 0.2, Flink Agents already provided native support for Long-Term Memory, including a manually implemented, rudimentary automatic compression mechanism.

Full disclosure: I built that feature myself. During the implementation process, I quickly realized that managing long-term memory—particularly memory compression—is incredibly complex. Building a mature, user-friendly long-term memory solution from scratch within Flink Agents presents significant engineering challenges. Furthermore, whether for streaming agents or conversational agents, the fundamental requirements and usage patterns for long-term memory don't differ substantially. With this in mind, I investigated how other conversational agent frameworks and specialized memory management systems handle this, which ultimately led me to Mem0.

Mem0 is a popular intelligent memory layer specifically designed for AI agents. By supporting Mem0 as the backend for Flink Agents' Long-Term Memory, we can leverage existing open-source expertise to provide more specialized, robust memory capabilities without reinventing the wheel.
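Conceptually, Mem0 exposes a memory layer you write observations into and later query semantically. The stand-in below illustrates the shape of interface a Flink Agents long-term-memory backend might wrap; it is not Mem0's actual API, and it replaces Mem0's fact extraction and semantic search with naive keyword overlap so the sketch stays self-contained.

```python
class LongTermMemoryBackend:
    """Stand-in for a Mem0-backed memory store (hypothetical interface)."""

    def __init__(self):
        self._entries: list[tuple[str, str]] = []  # (scope_id, text)

    def add(self, text: str, scope_id: str) -> None:
        # Mem0 would extract, deduplicate, and compress facts here;
        # the stand-in simply appends the raw text.
        self._entries.append((scope_id, text))

    def search(self, query: str, scope_id: str, limit: int = 3) -> list[str]:
        # Mem0 performs semantic search; the stand-in scores entries
        # by word overlap with the query.
        q = set(query.lower().split())
        scored = [
            (len(q & set(text.lower().split())), text)
            for sid, text in self._entries
            if sid == scope_id
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:limit] if score > 0]
```

The appeal of delegating to Mem0 is precisely that the hard parts elided here, compression, deduplication, and retrieval quality, are its specialty.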

Durable execution enhancement

Built on Flink, one of the standout advantages of Flink Agents is naturally its fault tolerance. If you're familiar with Flink, you know it implements a checkpointing mechanism based on the Chandy-Lamport algorithm. This allows Flink Agents to recover from checkpoints without needing to re-consume data from the beginning.

But here's the catch: for agents, checkpoint recovery alone isn't enough. Since agents frequently invoke external models and perform actions, recovering from a checkpoint could still lead to re-processing events that occurred after the last checkpoint. This results in duplicate model invocations and action executions. LLM calls are costly, and repeated actions can have unintended side effects. Therefore, we've been continuously improving durable execution within Flink Agents:

  • Flink Agents 0.1 introduced per-action consistency. By leveraging an action store, it avoids the replay of already-executed actions during job recovery, limiting inconsistency to the scope of a single action.
  • Flink Agents 0.2 provides a durable execution interface. Within a single action, users can submit code snippets via this interface, and their return results are recorded. Consequently, upon job recovery, if a code snippet has already been fully executed, it doesn't need to run again. This narrows the inconsistency scope further, affecting only those snippets submitted through the durable execution interface.

You might notice an unresolved issue remains: If a code snippet has started executing but has not completed and returned results before recovery, it will be re-executed after the job is resumed. Since these snippets often involve interactions with external systems—such as calling LLMs or accessing vector databases—Flink Agents alone cannot ensure exactly-once consistency. This situation mirrors Flink sinks: Flink's checkpointing guarantees exactly-once semantics only within the system itself, while end-to-end exactly-once consistency relies on downstream systems supporting idempotency or two-phase commit protocols.

How will Flink Agents address this challenge? It remains an open question, but one possible approach is to introduce an API that provides a hook or callback mechanism. This would empower users to customize processing logic based on their specific business scenarios. For instance, if an external service supports idempotent operations, users could configure the agent to retry directly. Alternatively, they might choose to query the service's status first before deciding whether to retry. By providing this flexibility, Flink Agents can better accommodate the diverse reliability requirements of real-world applications.
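Such a hook might look roughly like the sketch below. Everything here is speculative, since no such API exists yet: `InFlightPolicy`, `recover_in_flight`, and the callback signature are hypothetical names illustrating the two strategies described above.

```python
from enum import Enum
from typing import Any, Callable, Optional, Tuple

class InFlightPolicy(Enum):
    """What to do with a snippet that was in flight when the job failed."""
    RETRY = "retry"                  # safe when the external service is idempotent
    CHECK_THEN_RETRY = "check"       # query service status first, retry only if needed

def recover_in_flight(
    snippet_id: str,
    run: Callable[[], Any],
    policy: InFlightPolicy,
    status_check: Optional[Callable[[str], Tuple[bool, Any]]] = None,
) -> Any:
    # Flink Agents itself cannot know whether the external call completed,
    # so the decision is delegated to user-supplied logic.
    if policy is InFlightPolicy.CHECK_THEN_RETRY and status_check is not None:
        done, result = status_check(snippet_id)  # e.g. look up by request id
        if done:
            return result  # call completed before the crash; reuse its result
    return run()  # either idempotent retry, or the call never completed
```

The key design point is that the framework supplies the recovery machinery while the user supplies the knowledge about the external system's semantics.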

Event Log Enhancement

Observability is critical for any production-grade product. If you've ever troubleshot online incidents in distributed systems, you know exactly why. For agent frameworks, observability is even more crucial due to the inherent uncertainty introduced by large language models (LLMs).

Flink Agents leverages events to orchestrate agents and supports the generation and display of event logs. Through these logs, users can gain detailed insights into an agent's execution process. From my own experience debugging agents built with Flink Agents, I can confirm that event logs are incredibly helpful. In the recently released Flink Operation Agent demo, you can see how event logs allow us to clearly verify an agent's behavior.

However, to make Flink Agents truly production-ready, I believe we need to continue improving the usability of these logs. I've noticed that Flink Agents 0.3 has planned several key enhancements:

  • Human-Readable Formats: Currently, the output readability isn't always user-friendly. Version 0.3 will support configurable output formats to make logs easier for humans to parse.
  • Configurable Log Levels: For complex agents, users often care only about specific events. In version 0.3, Flink Agents will introduce per-event-type configurable log levels, enabling users to flexibly set log levels according to their specific requirements.
  • Structured Querying: As agents run continuously, event logs accumulate rapidly. Supporting structured queries will help users locate the specific information they need more efficiently.
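To see what per-event-type log levels buy you, here is a minimal sketch built on Python's standard `logging` module. The `EVENT_LOG_LEVELS` mapping and `log_event` helper are hypothetical; the actual 0.3 configuration surface may look quite different.

```python
import logging

# Hypothetical config: a log level per event type name. With the logger set
# to WARNING, chat-request/response chatter is filtered out while tool-call
# events still surface.
EVENT_LOG_LEVELS = {
    "ChatRequestEvent": logging.DEBUG,
    "ChatResponseEvent": logging.INFO,
    "ToolCallEvent": logging.WARNING,
}

def log_event(
    logger: logging.Logger,
    event_type: str,
    payload: str,
    default_level: int = logging.INFO,
) -> bool:
    """Log an event at its configured level; return True if it was emitted."""
    level = EVENT_LOG_LEVELS.get(event_type, default_level)
    if logger.isEnabledFor(level):
        logger.log(level, "%s %s", event_type, payload)
        return True
    return False
```

For a complex agent, dialing most event types down to DEBUG while keeping tool calls and errors at WARNING turns a firehose into a readable trace.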

I am genuinely excited about the new features coming in version 0.3. This is not just about adding new capabilities; it's about organically integrating them to create a truly production-ready, distributed, event-driven agent framework.

