
TLDR
- Google launched two new Gemini API inference tiers: Flex and Priority
- Flex offers a 50% discount for latency-tolerant, background workloads
- Priority runs 75–100% above standard pricing for high-reliability, real-time tasks
- Batch API also offers 50% off but with up to 24-hour latency
- Caching tier pricing is based on token count and storage duration
Google updated its Gemini API pricing on April 2, 2026, offering developers five distinct service tiers: Standard, Flex, Priority, Batch, and Caching. The move gives developers more control over how they balance cost, speed, and reliability depending on what they’re building.
“Balance cost & reliability with our new Flex & Priority inference tiers in the Gemini API! Flex: Pay 50% less for cost-sensitive & latency-tolerant workloads. Priority: Highest reliability for your most critical, interactive apps (with premium pricing). Together with the async…”
— Google AI Developers (@googleaidevs) April 2, 2026
The new Flex tier is designed for background tasks that don’t need a fast response. It costs 50% less than the standard rate by using off-peak compute capacity. Latency can range from 1 to 15 minutes and is not guaranteed. Use cases include CRM updates, research simulations, and agentic workflows.
Unlike the existing Batch API, Flex uses synchronous endpoints. That means developers don’t have to manage input/output files or poll for job completion. It’s a simpler interface for the same cost savings.
The Priority tier sits at the other end of the spectrum. It costs 75% to 100% more than the standard rate and is built for real-time, business-critical tasks. Response times run from milliseconds to seconds.
Google recommends Priority for live customer support bots, fraud detection, and content moderation pipelines. If a user’s Priority traffic exceeds set limits, overflow requests automatically drop to Standard tier rather than failing outright.

The Full Tier Breakdown
The existing Batch API remains available at 50% off standard pricing, with a latency window of up to 24 hours. It’s suited for heavy offline processing where speed isn’t a factor.
The Caching tier is priced based on token count and how long content is stored. Google suggests it for chatbots with lengthy system instructions, repeated analysis of large video files, or queries against big document sets.
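Because caching is billed along two dimensions, a rough cost estimate multiplies cached token count by a per-token rate and adds a storage charge that grows with duration. The sketch below illustrates that model only; the rate values are placeholders, not Google’s actual prices.

```python
# Sketch of the two-dimensional caching cost model: a one-time charge per
# cached token plus a storage charge per token per hour. The rates passed
# in are hypothetical placeholders, not published Gemini API prices.
def estimate_cache_cost(cached_tokens: int, storage_hours: float,
                        token_rate_per_million: float,
                        storage_rate_per_million_per_hour: float) -> float:
    millions = cached_tokens / 1_000_000
    token_cost = millions * token_rate_per_million
    storage_cost = millions * storage_hours * storage_rate_per_million_per_hour
    return token_cost + storage_cost

# Example: 2M tokens of system instructions cached for 6 hours
# at placeholder rates of $0.50/M tokens and $0.10/M tokens per hour.
cost = estimate_cache_cost(2_000_000, 6, 0.50, 0.10)
```

The takeaway for developers is that long-lived caches of large contexts accrue storage charges over time, so the tier pays off mainly when the same large context is reused often within its storage window.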
Both Flex and Priority tiers use the same service_tier parameter in API requests. Developers can toggle between tiers with a single config change, and the API response will confirm which tier actually handled the request.
Flex is available to all paid tier users across GenerateContent and Interactions API requests. Priority is limited to Tier 2 and Tier 3 paid projects on the same endpoints.
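The single-parameter toggle described above can be sketched as a request-body builder. The `service_tier` field name comes from the announcement; the tier value strings (`"flex"`, `"priority"`) and the body shape are assumptions to illustrate the idea, so check the official Gemini API reference before relying on them.

```python
from typing import Optional

# Sketch: switching tiers with one config change via the service_tier
# request field. Tier value strings ("flex", "priority") are assumed
# for illustration; consult the Gemini API docs for the exact names.
def build_generate_content_request(prompt: str,
                                   tier: Optional[str] = None) -> dict:
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if tier is not None:
        body["service_tier"] = tier  # omit for the Standard tier
    return body

# Background workload: tolerate minutes of latency for a 50% discount.
flex_req = build_generate_content_request("Summarize this CRM record.",
                                          tier="flex")

# Interactive workload: pay the premium for low-latency, high reliability.
priority_req = build_generate_content_request("Moderate this live comment.",
                                              tier="priority")
```

Per the announcement, the API response reports which tier actually served the request, so callers on Priority can detect when overflow traffic was handled at the Standard tier instead.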
What Developers Get
The unified interface is the headline feature here. Before this update, handling both background and interactive workloads required splitting architecture between synchronous and asynchronous systems. Now both can run through the same synchronous endpoints.
Google framed the update as part of its broader push to support AI agents, which often need to handle both low-urgency background processing and time-sensitive interactive tasks at the same time.
The announcement was made by Gemini API product manager Lucia Loher and engineering lead Hussein Hassan Harrirou on April 2, 2026.
