Wafer

AI that makes AI fast

Wafer builds AI agents that work as autonomous performance engineers, optimizing GPU kernels for AI inference. Our customers are chip companies and cloud providers who need their AI models running at peak performance on any type of hardware. Our founding team includes engineers from Google (Spanner, Gemini), Two Sigma, AWS, and Argonne National Lab, with NeurIPS publications in ML.
Active Founders
Emilio Andere
Founder
prev @ argonne, uchicago sand lab, elicit. math at uchicago.
Steven Arellano
Founder
prev @ two sigma, google, sei labs, and axlab. cs + econ at uchicago.
Company Launches
Wafer Pass: flat-rate access to the fastest open-source LLMs
See original launch post

TL;DR - Wafer Pass is a single subscription that gives you the fastest optimized open-source LLMs for any agentic coding harness. Plans start at $10/week. Every plan includes every model we host. Sign up at wafer.ai/pass.


Hi YC! We're Emilio and Steven, founders of Wafer (S25).

Today we're launching Wafer Pass, a flat-rate API subscription to the fastest optimized open-source LLMs. One key, every model, and it drops straight into Claude Code, OpenClaw, Cline, Kilo Code, Roo Code, OpenHands, or Conductor.

The problem

If you use AI coding agents, you've felt the pain. Usage caps that hit at the worst moment. Per-token bills that swing between $5 and $50 a session. Juggling API keys across providers. Watching frontier closed models throttle you on Friday afternoons.

Open-source models (Qwen3.5-397B, GLM5.1, DeepSeek V4 Pro) have closed most of the gap on coding tasks. The problem is getting them served at the speed you actually want for an agent loop. Most providers run them on stock SGLang or vLLM, which leaves 50–80% of GPU performance on the table.

What we built

Wafer's core product is an AI performance engineer. You point it at an LLM and it optimizes it.

We pointed Wafer at the leading open-source LLMs and let it rewrite the entire serving stack: kernels, batching, scheduling, memory layout. The results so far:

  • Qwen3.5-397B-Turbo: 2.8x faster than stock SGLang
  • GLM5.1-Turbo: 2x faster than stock vLLM
  • DeepSeek-V4-Pro: 2x faster than stock vLLM

The Qwen example:

[Benchmark chart: Qwen3.5-397B-Turbo vs. stock SGLang]

Same weights as anywhere else. Just faster, because Wafer rewrote the kernels. We put the result behind a single OpenAI- and Anthropic-compatible API.
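
For illustration, here's what hitting that API might look like with the stock OpenAI Python SDK. The base URL and model slug below are assumptions for the sketch, not confirmed endpoint names; check your dashboard for the real values.

```python
# Sketch: calling a Wafer-hosted Turbo model through the
# OpenAI-compatible surface. The base_url and model slug are
# hypothetical; substitute the values from your Wafer dashboard.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.wafer.ai/v1",  # hypothetical endpoint
    api_key="YOUR_WAFER_API_KEY",
)

response = client.chat.completions.create(
    model="qwen3.5-397b-turbo",  # hypothetical model slug
    messages=[
        {"role": "user", "content": "Reverse a linked list in Python."},
    ],
)
print(response.choices[0].message.content)
```

Because the surface is OpenAI-compatible, any harness that accepts a custom base URL should work unchanged.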

Wafer Pass

One subscription. One API key. Every model we host.

  • $10/week (Starter): 1,000 requests every 5 hours. All models.
  • $25/week (Privacy): 2,000 requests every 5 hours. All models. Zero Data Retention.

Every plan includes every model, and every future Turbo model we ship. No price increase as the catalog grows. For context, that's 10x the requests of Claude Code Max at half the price.

Try it

Sign up: wafer.ai/pass

If you use Claude Code, OpenClaw, or any other agent harness, we'd love your feedback. Drop your Wafer API key in and tell us what breaks.
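
For harnesses that speak the Anthropic API (Claude Code, for example, reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN from the environment), here's a minimal sketch using the anthropic Python SDK. As above, the endpoint and model slug are assumptions for illustration.

```python
# Sketch: the Anthropic-compatible path. The base_url and model
# slug are hypothetical; substitute your real Wafer values.
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.wafer.ai",  # hypothetical endpoint
    api_key="YOUR_WAFER_API_KEY",
)

message = client.messages.create(
    model="glm5.1-turbo",  # hypothetical model slug
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain KV-cache paging in two sentences."},
    ],
)
print(message.content[0].text)
```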

If you're building an agent or coding tool and want a default model layer, let's talk.

Email: emilio@wafer.ai · Book a 15-min call: cal.com/wafer/quick-chat

Previous Launches
AI agent that turns your slow PyTorch into fast GPU code, automatically
Jobs at Wafer
  • San Francisco, CA, US · $150K - $250K · 1.00% - 2.00% equity · Any experience (new grads ok)
  • San Francisco, CA, US · $6K - $10K / month · Any experience
  • San Francisco, CA, US · $6K - $10K / month · Any experience
Wafer
Founded: 2025
Batch: Summer 2025
Team Size: 5
Status: Active
Location: San Francisco
Primary Partner: Jared Friedman