Wafer

AI that makes AI fast

Wafer builds AI agents that work as autonomous performance engineers, optimizing GPU kernels for AI inference. Our customers are chip companies and cloud providers who need their AI models running at peak performance on any type of hardware. Our founding team includes engineers from Google (Spanner, Gemini), Two Sigma, AWS, and Argonne National Lab, with NeurIPS publications in ML.
Active Founders
Emilio Andere
Founder
prev @ argonne, uchicago sand lab, elicit. math at uchicago.
Steven Arellano
Founder
prev @ two sigma, google, sei labs, and axlab. cs + econ at uchicago.
Company Launches
Wafer Pass: flat-rate access to the fastest open-source LLMs
See original launch post

TL;DR - Wafer Pass is a single subscription that gives you the fastest optimized open-source LLMs for any agentic coding harness. Plans start at $10/week. Every plan includes every model we host. Sign up at wafer.ai/pass.


Hi YC! We're Emilio and Steven, founders of Wafer (S25).

Today we're launching Wafer Pass, a flat-rate API subscription to the fastest optimized open-source LLMs. One key, every model, and it drops straight into Claude Code, OpenClaw, Cline, Kilo Code, Roo Code, OpenHands, or Conductor.

The problem

If you use AI coding agents, you've felt the pain. Usage caps that hit at the worst moment. Per-token bills that swing between $5 and $50 a session. Juggling API keys across providers. Watching frontier closed models throttle you on Friday afternoons.

Open-source models (Qwen3.5-397B, GLM5.1, DeepSeek V4 Pro) have closed most of the gap on coding tasks. The problem is getting them served at the speed you actually want for an agent loop. Most providers run them on stock SGLang or vLLM, which leaves 50–80% of GPU performance on the table.

What we built

Wafer's core product is an AI performance engineer. You point it at an LLM and it optimizes it.

We pointed Wafer at the leading open-source LLMs and let it rewrite the entire serving stack: kernels, batching, scheduling, memory layout. The results so far:

  • Qwen3.5-397B-Turbo: 2.8x faster than stock SGLang
  • GLM5.1-Turbo: 2x faster than stock vLLM
  • DeepSeek-V4-Pro: 2x faster than stock vLLM

The Qwen example:

[Benchmark chart: Qwen3.5-397B-Turbo vs. stock SGLang]

Same weights as anywhere else. Just faster, because Wafer rewrote the kernels. We put the result behind a single OpenAI- and Anthropic-compatible API.
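
For illustration, here's what hitting that API might look like with the stock OpenAI Python SDK. The base URL and model slug below are assumptions for the sketch, not confirmed endpoint names; check your dashboard for the real values.

```python
# Sketch: calling a Wafer-hosted Turbo model through the
# OpenAI-compatible surface. The base_url and model slug are
# hypothetical; substitute the values from your Wafer dashboard.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.wafer.ai/v1",  # hypothetical endpoint
    api_key="YOUR_WAFER_API_KEY",
)

response = client.chat.completions.create(
    model="qwen3.5-397b-turbo",  # hypothetical model slug
    messages=[
        {"role": "user", "content": "Reverse a linked list in Python."},
    ],
)
print(response.choices[0].message.content)
```

Because the surface is OpenAI-compatible, any harness that accepts a custom base URL should work unchanged.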

Wafer Pass

One subscription. One API key. Every model we host.

  • $10/week (Starter): 1,000 requests every 5 hours. All models.
  • $25/week (Privacy): 2,000 requests every 5 hours. All models. Zero Data Retention.

Every plan includes every model, and every future Turbo model we ship. No price increase as the catalog grows. For context, that's 10x the requests of Claude Code Max at half the price.

Try it

Sign up: wafer.ai/pass

If you use Claude Code, OpenClaw, or any other agent harness, we'd love your feedback. Drop your Wafer API key in and tell us what breaks.
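
For harnesses that speak the Anthropic API (Claude Code, for example, reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN from the environment), here's a minimal sketch using the anthropic Python SDK. As above, the endpoint and model slug are assumptions for illustration.

```python
# Sketch: the Anthropic-compatible path. The base_url and model
# slug are hypothetical; substitute your real Wafer values.
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.wafer.ai",  # hypothetical endpoint
    api_key="YOUR_WAFER_API_KEY",
)

message = client.messages.create(
    model="glm5.1-turbo",  # hypothetical model slug
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain KV-cache paging in two sentences."},
    ],
)
print(message.content[0].text)
```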

If you're building an agent or coding tool and want a default model layer, let's talk.

Email: emilio@wafer.ai · Book a 15-min call: cal.com/wafer/quick-chat

Previous Launches
AI agent that turns your slow PyTorch into fast GPU code, automatically
Jobs at Wafer
  • San Francisco, CA, US · $150K - $250K · 1.00% - 2.00% equity · Any experience (new grads ok)
  • San Francisco, CA, US · $6K - $10K / month · Any experience
  • San Francisco, CA, US · $6K - $10K / month · Any experience
Wafer
Founded: 2025
Batch: Summer 2025
Team Size: 5
Status: Active
Location: San Francisco
Primary Partner: Jared Friedman