
Alibaba Cloud AI Gateway FinOps Features Officially Launched | Making Every Token's Consumption "Visible and Controllable"

This article introduces Alibaba Cloud AI Gateway's new FinOps features for LLM cost governance and token quota management.

By Wenhao Zhang

As LLM API calls move from "early trial" to "large-scale production," cost governance is no longer optional; it is a question every team has to answer.

I. Why Does the AI Era Need FinOps?

As enterprise AI applications move deeper into production, more and more teams run into the same problems:

  • Who is using the models, and how many tokens are they consuming?
  • Which business line is burning money? Which consumer is running over its limit?
  • By the time the month-end bill shows that the budget has been blown, it is already too late to act.

The traditional after-the-fact reconciliation model cannot keep up with the characteristics of LLMs: token-based billing, high call frequency, and sharp cost fluctuations. FinOps (cloud financial operations) addresses exactly this problem by moving cost observability, allocation, and governance forward into the invocation path.

Alibaba Cloud AI Gateway has officially launched its FinOps capabilities, starting with "Consumer Quota," so that enterprises have visibility into every step of their LLM invocations.

[Illustration: Overview of the FinOps Primary Menu]

II. Capability Overview: A FinOps Primary Category, with Quota Governance as the First Stop

In this release, the AI Gateway instance adds a FinOps primary category and launches Consumer Quota as its first second-level category. The feature is built around two themes, "Rule Definition" and "Usage Monitoring," which together form a closed loop for quota governance.

Simply put, you can think of it as giving each "model caller" a quota card:

  • You decide how many tokens the card holds;
  • You can see at a glance where they are used, how much has been used, and how much is left.

[Illustration: Consumer Quota Feature Entry]

The figure below shows the overall system architecture. When a consumer's request passes through the AI Gateway, the gateway applies four core capabilities: identity authentication, quota management, rate limiting, and cost measurement. The FinOps dashboard then exposes two visual modules: quota rule management and usage monitoring.

[Illustration: System Architecture Diagram]
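
The request flow described above can be sketched as a small in-memory pipeline. This is a minimal illustration, not the AI Gateway's actual implementation: the `Gateway` class, its field names, the status strings, and the order in which the four checks run are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Toy in-memory stand-in for the four stages (hypothetical, for illustration)."""

    api_keys: dict[str, str]            # API key -> consumer name
    quota_limit: dict[str, int]         # consumer -> tokens allowed per cycle
    max_requests_per_window: int = 5    # crude rate limit; window reset is not modeled
    used_tokens: dict[str, int] = field(default_factory=dict)
    request_count: dict[str, int] = field(default_factory=dict)

    def handle(self, api_key: str, tokens_consumed: int) -> str:
        # 1. Identity authentication: map the credential to a consumer.
        consumer = self.api_keys.get(api_key)
        if consumer is None:
            return "401 unauthenticated"

        # 2. Quota management: reject once the cycle's token quota is spent.
        used = self.used_tokens.get(consumer, 0)
        if used >= self.quota_limit.get(consumer, 0):
            return "429 quota exhausted"

        # 3. Rate limiting: cap the number of requests in the current window.
        count = self.request_count.get(consumer, 0)
        if count >= self.max_requests_per_window:
            return "429 rate limited"
        self.request_count[consumer] = count + 1

        # 4. Cost measurement: record the tokens the model call reported.
        self.used_tokens[consumer] = used + tokens_consumed
        return "200 ok"
```

Because the token count of a call is only known after the model responds, this sketch checks the quota before the call and meters afterwards, so the last admitted request may overshoot the limit slightly. Whether the real gateway behaves the same way is not stated in this article.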

III. Consumer Quotas: Rule Management

3.1 Flexible Quota Rule Definition

On the "Quota Rules" page, you can quickly create a token quota rule for different consumers. The core fields are:

| Field | Description |
|---|---|
| Rule Name | Custom name, for easy searching and management later |
| Limit Type | Token Quota |
| Consumer Selection | Select the consumers to bind to the quota rule |
| Quota Type | Natural Cycle Quota |
| Time Zone Selection | Supports multiple time zones, so cross-regional teams can align precisely |
| Cycle Reset | Every natural (calendar) day / week / month; Beijing time zone by default |

[Illustration: Quota Rule Creation Form Page]
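
To make the "natural cycle" and time zone fields concrete, here is a minimal sketch of how the start of the current cycle could be computed. The function name `cycle_start`, the cycle strings, and the Monday week start are assumptions for illustration; the article does not describe how the gateway computes cycle boundaries.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Beijing time has no daylight saving, so a fixed UTC+8 offset is enough here.
BEIJING = timezone(timedelta(hours=8))


def cycle_start(now: datetime, cycle: str, tz: tzinfo = BEIJING) -> datetime:
    """Return the start of the natural day, week, or month containing `now`.

    `now` should be timezone-aware; it is converted into `tz` before truncating.
    """
    local = now.astimezone(tz)
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    if cycle == "day":
        return midnight
    if cycle == "week":
        # Assumption: a natural week starts on Monday (weekday() == 0).
        return midnight - timedelta(days=midnight.weekday())
    if cycle == "month":
        return midnight.replace(day=1)
    raise ValueError(f"unknown cycle: {cycle!r}")


now = datetime(2025, 3, 12, 18, 30, tzinfo=timezone.utc)
print(cycle_start(now, "day"))
```

The example shows why the time zone matters: 18:30 UTC on 12 March 2025 is already 02:30 on 13 March in Beijing, so under the default time zone the daily quota for 13 March is the one being consumed. For regions with daylight saving, a `zoneinfo.ZoneInfo` object can be passed as `tz` instead of a fixed offset.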

3.2 Full Lifecycle Rule Status Management

A rule is not set once and forgotten; it can be adjusted dynamically as business needs change:

  • Rule Status: Enabled / Disabled; status changes take effect in real time;
  • Operations: edit rules, reset quotas, enable/disable, and delete, covering the entire rule lifecycle.

If you need to temporarily raise the quota for a rule, you can edit the rule to change the quota size. One-click quota reset is also supported.

[Illustration: Quota Reset Page]
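
The lifecycle operations above (enable/disable, edit, reset) can be modeled with a small state object. This is a hedged sketch: the `QuotaRule` class and its method names are hypothetical, and treating a disabled rule as "no limit enforced" is an assumption rather than documented gateway behavior.

```python
from dataclasses import dataclass


@dataclass
class QuotaRule:
    """Hypothetical model of one quota rule and its lifecycle operations."""

    name: str
    consumer: str
    token_limit: int
    used_tokens: int = 0
    enabled: bool = True

    def allows(self, requested_tokens: int) -> bool:
        # Assumption: a disabled rule enforces no limit.
        if not self.enabled:
            return True
        return self.used_tokens + requested_tokens <= self.token_limit

    def record(self, tokens: int) -> None:
        # Usage accumulates within the current cycle.
        self.used_tokens += tokens

    def edit_limit(self, new_limit: int) -> None:
        # Editing changes the quota size but keeps the usage recorded so far.
        self.token_limit = new_limit

    def reset(self) -> None:
        # One-click reset: usage for the current cycle goes back to zero.
        self.used_tokens = 0


rule = QuotaRule(name="daily-cap", consumer="team-a", token_limit=1000)
rule.record(900)
rule.edit_limit(1500)   # temporarily raise the quota
```

In this model, raising the limit and resetting usage are separate operations: the first adds headroom while preserving the usage history, and the second clears the counter without changing the rule.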

IV. Checking Consumer Usage and Cost: Making Every Cent Traceable

Rules alone are not enough; the other half of FinOps is observability. The AI Gateway provides multi-dimensional statistics in the "Consumer Usage" module:

Supports Dimension Switching

  • View by Consumer: switch between consumers to drill down to the usage profile of a single caller;

Full Coverage of Core Indicators

  • Quota usage for the current cycle / fixed time period: Total Used Tokens, Remaining Tokens;
  • Token dimension statistics: Input Tokens, Output Tokens, Cached Tokens, Total Tokens.
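
The indicators listed above can be derived from per-call usage records. The following is a minimal sketch under stated assumptions: the `CallUsage` record and `summarize` function are hypothetical, total tokens are taken to be input plus output, and cached tokens are treated as a subset of input tokens. The article does not specify how the gateway defines these relationships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallUsage:
    """Hypothetical per-call usage record."""

    consumer: str
    input_tokens: int
    output_tokens: int
    cached_tokens: int = 0  # assumed to be a subset of input_tokens


def summarize(calls: list[CallUsage], consumer: str, quota_limit: int) -> dict[str, int]:
    """Aggregate one consumer's usage for the current cycle."""
    mine = [c for c in calls if c.consumer == consumer]
    input_total = sum(c.input_tokens for c in mine)
    output_total = sum(c.output_tokens for c in mine)
    cached_total = sum(c.cached_tokens for c in mine)
    total = input_total + output_total  # assumption: total = input + output
    return {
        "input_tokens": input_total,
        "output_tokens": output_total,
        "cached_tokens": cached_total,
        "total_tokens": total,
        "used_tokens": total,
        "remaining_tokens": max(quota_limit - total, 0),
    }


calls = [
    CallUsage("team-a", input_tokens=100, output_tokens=50, cached_tokens=20),
    CallUsage("team-a", input_tokens=200, output_tokens=80),
    CallUsage("team-b", input_tokens=999, output_tokens=1),
]
summary = summarize(calls, "team-a", quota_limit=1000)
```

Filtering by consumer before aggregating mirrors the "View by Consumer" dimension: the same records can be summarized for any single caller.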

V. Final Words: From "Affordable" to "Efficient Use"

In essence, LLM cost governance is the final hurdle in putting an enterprise's AI strategy into practice. The FinOps capability of Alibaba Cloud AI Gateway turns "cost," previously a delayed and vague metric, into a real-time, clear, and actionable engineering capability, so that every AI call is "clearly spent and comfortably used."

Try it now: Log in to the Alibaba Cloud AI Gateway Console and enter "FinOps - Consumer Quota" to start configuring your first quota rule.

https://apig.console.alibabacloud.com/#/ai-gateway-overview


Follow us to get the latest feature updates of Alibaba Cloud AI Gateway.
