
TL;DR - Wafer Pass is a single subscription that gives you the fastest optimized open-source LLMs for any agentic coding harness. Plans start at $10/week. Every plan includes every model we host. Sign up at wafer.ai/pass.
Today we're launching Wafer Pass, a flat-rate API subscription to the fastest optimized open-source LLMs. One key, every model, drops into Claude Code, OpenClaw, Cline, Kilo Code, Roo Code, OpenHands, or Conductor.
If you use AI coding agents, you've felt the pain. Usage caps that hit at the worst moment. Per-token bills that swing between $5 and $50 a session. Juggling API keys across providers. Watching frontier closed models throttle you on Friday afternoons.
Open-source models (Qwen3.5-397B, GLM5.1, DeepSeek V4 Pro) have closed most of the gap on coding tasks. The problem is getting them served at the speed you actually want for an agent loop. Most providers run them on stock SGLang or vLLM, which leaves 50–80% of GPU performance on the table.
Wafer's core product is an AI performance engineer. You point it at an LLM and it optimizes it.
We have been pointing Wafer at the leading open-source LLMs and letting it rewrite the entire serving stack: kernels, batching, scheduling, memory layout. The results so far are impressive.
Take Qwen as an example: the same weights as anywhere else, just faster, because Wafer rewrote the kernels. We put the result behind a single OpenAI- and Anthropic-compatible API.
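For the OpenAI-compatible side, pointing an existing client at Wafer should be a one-line change. Here's a minimal sketch using the OpenAI Python SDK; the base URL and model identifier are placeholders we've assumed for illustration, not confirmed endpoints — grab the real values from your dashboard:

```python
from openai import OpenAI

# Hypothetical endpoint and model name -- substitute the values
# from your Wafer Pass dashboard.
client = OpenAI(
    base_url="https://api.wafer.ai/v1",  # assumed Wafer Pass endpoint
    api_key="YOUR_WAFER_API_KEY",
)

response = client.chat.completions.create(
    model="qwen3.5-397b-turbo",  # placeholder model identifier
    messages=[
        {"role": "user", "content": "Refactor this function to be iterative."}
    ],
)
print(response.choices[0].message.content)
```

Because the surface is standard, any tool that accepts a custom OpenAI base URL can switch over with just those two settings.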
One subscription. One API key. Every model we host.
Every plan includes every model, and every future Turbo model we ship. No price increase as the catalog grows. For context: that's 10x the requests of Claude Code Max, at half the price.
Sign up: wafer.ai/pass
If you use Claude Code, OpenClaw, or any other agent harness, we'd love your feedback. Drop your Wafer API key in and tell us what breaks.
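For Anthropic-style harnesses, the same key should work through the Anthropic-compatible surface. A sketch with the Anthropic Python SDK, again with an assumed base URL and placeholder model name:

```python
from anthropic import Anthropic

# Hypothetical values -- substitute your Wafer Pass endpoint and key.
client = Anthropic(
    base_url="https://api.wafer.ai",  # assumed Anthropic-compatible endpoint
    api_key="YOUR_WAFER_API_KEY",
)

message = client.messages.create(
    model="glm5.1-turbo",  # placeholder model identifier
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a unit test for an LRU cache."}
    ],
)
print(message.content[0].text)
```

Claude Code itself reads `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY` from the environment, so the same swap should work there with no code changes at all.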
If you're building an agent or coding tool and want a default model layer, let's talk.
Email: emilio@wafer.ai · Book a 15-min call: cal.com/wafer/quick-chat