Cloudflare unified AI Gateway and Workers AI into a single inference API with multi-provider routing, failover, and cost controls built for agentic workloads.
Cloudflare announced a unified inference platform combining AI Gateway and Workers AI into one API that routes to any model from any provider. New features include automatic retries on upstream failures, granular cost monitoring across providers, and a refreshed dashboard with zero-setup default gateways. The company also acquired Replicate, migrating its hosted model catalog onto Workers AI infrastructure and exposing those models through AI Gateway. The platform is purpose-built for chained multi-model calls in agent workflows, where cascading failures and compounding latency hurt most at scale.
Chained inference calls in agents compound latency and failure rates multiplicatively — a 10-call agent with a flaky provider doesn't fail once, it fails in sequence. Cloudflare's unified layer adds automatic retries, cross-provider routing, and edge-distributed inference under a single API endpoint, eliminating the glue code most teams maintain manually. The Replicate acquisition also drops hundreds of open-source models into the same interface without a separate SDK.
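The multiplicative compounding above is worth making concrete. A minimal sketch (the numbers are illustrative, not from the article): with 99% per-call reliability, a single call almost never fails, but a 10-call chain fails roughly one time in ten.

```python
def chain_success(per_call_success: float, calls: int) -> float:
    """Probability that every call in an n-step agent chain succeeds,
    assuming independent per-call failures."""
    return per_call_success ** calls

# A provider with 99% per-call reliability looks solid in isolation,
# but across a 10-call chain overall success drops to ~90%.
single = chain_success(0.99, 1)
chained = chain_success(0.99, 10)
```

This is why retries help so much here: recovering even a fraction of per-call failures raises the whole chain's success rate exponentially, not linearly.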
Reroute your current direct OpenAI or Anthropic calls through the AI Gateway universal endpoint this week: benchmark the p95 latency difference on your highest-volume agent chain, and validate the automatic retry behavior by simulating a provider timeout.
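For the p95 comparison, a small timing harness is enough. A sketch (`measure_ms` and `p95` are illustrative helpers, not part of any Cloudflare SDK): wrap your direct-provider call and your gateway-routed call in `measure_ms`, then compare the two percentiles.

```python
import math
import time

def measure_ms(fn, runs: int = 20) -> list[float]:
    """Call fn repeatedly and return per-call wall-clock latencies in ms."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000)
    return samples

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a latency sample."""
    s = sorted(samples)
    return s[math.ceil(0.95 * len(s)) - 1]

# direct = p95(measure_ms(call_provider_directly))
# gated  = p95(measure_ms(call_via_gateway))
```

Use the same prompt and model for both paths so the only variable is the routing layer.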
Navigate to developers.cloudflare.com/ai-gateway and grab your Gateway URL from the dashboard under 'Default Gateway'.
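Once you have the gateway, rerouting is a base-URL change rather than a rewrite. A minimal sketch, assuming the documented AI Gateway URL shape `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}` — confirm the exact path for your gateway in the dashboard:

```python
def gateway_base_url(account_id: str, gateway_id: str,
                     provider: str = "openai") -> str:
    """Build the provider-scoped base URL for a Cloudflare AI Gateway."""
    return (f"https://gateway.ai.cloudflare.com/v1/"
            f"{account_id}/{gateway_id}/{provider}")

# With the OpenAI Python SDK, pointing at the gateway is then a one-line
# change at client construction (shown as a comment to stay offline):
# client = OpenAI(api_key=key,
#                 base_url=gateway_base_url("ACCOUNT_ID", "GATEWAY_ID"))
```

Your existing provider API key keeps working; the gateway proxies the request and layers in the retries, routing, and cost tracking.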