
Instantly Repair LLM-Generated Code

Benchify handles the middle mile of codegen, ensuring that generated code just works and is instantly executable. It’s a one-line SDK call between LLM clients and sandboxes that delivers instant code repair, accelerated bundling, and observability.
Active Founders
Juan Castaño
Founder
Juan Castaño is the co-founder and CEO of Benchify. Prior to Benchify, Juan bought and sold a business for 5x MOIC, created MVPs for 9+ clients as a UI/UX designer, devised GTM strategy at Instawork (Series C), initiated a growth product team at Klaviyo (pre-IPO), and advised on due diligence transactions at McKinsey & Company. Juan studied Economics and Human-centered design at Dartmouth and holds an MBA from MIT Sloan.
Max von Hippel
Founder
Max von Hippel holds a PhD in Computer Science from Northeastern University, focused on security and interactive theorem proving. Before that, he completed a BS in Pure Mathematics at the University of Arizona. But most importantly, he holds a Blue Belt from 10th Planet Jiu Jitsu. Max grew up in Anchorage, Alaska and is deeply passionate about software assurance (who isn't?).
Company Launches
Benchify: Instant self-healing codegen
See original launch post

TL;DR

Benchify handles the middle mile of codegen, ensuring that generated code just works and is instantly executable. It’s a one-line SDK call between LLM clients and sandboxes that delivers instant code repair, accelerated bundling, and observability.

Who

Anyone depending on non-human-in-the-loop codegen: app builders, dynamic websites, agents, etc.

Problem

Generated code breaks — constantly.

On top of normal bugs like duplicate function calls, parse errors, or missing or extra parens, AI systems introduce new ones, such as stray tool calls, /* rest of code goes here */ placeholders, and malformed diff applications. Running that code inside sandboxes only compounds the pain: every piece has to be perfect for execution to succeed, and the sandbox boot time delays the inevitable. Because sandboxes are general-purpose Firecracker VMs designed to run anything, they’re not optimized for the common workflows builders actually care about. The result is slow setup, fragile execution, and painful feedback loops.

  • Missed errors: Users hit error screens as soon as code loads in the browser.
  • Delayed generations: LLM-based auto-healing stretches generation times, slowing iteration.
  • Token burn: Edge-case failures trigger endless retries that chew through tokens without progress; often the bugs are inherently out-of-distribution and thus hard to fix with AI.
  • Setup lag: Sandboxes take 30-120s to boot before code even runs, adding cold-boot lag to every iteration when there’s an error.
  • Lost data: Sandboxes don’t easily track errors, or only surface the first (breaking) bug, making it hard to detect all the errors in generated code at once and push improvements.
  • Templates: The use of templates speeds up sandboxes but leads to even more brittleness as the template has to be in perfect lockstep with the codegen.

Solution

Benchify combines non-AI techniques (static analysis + program synthesis) with highly optimized infrastructure to deliver turn-key code — fixed and bundled — in O(1 second).

It drops in as a one-line SDK call between your LLM client and the sandbox. If you’re only doing front-end work, you can skip the sandbox entirely and render directly from Benchify’s bundled output.
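
For concreteness, here’s a minimal sketch of where that call sits in a generation pipeline. The @benchify/sdk package name, the Benchify client, its fix method, and the runInSandbox helper are illustrative assumptions rather than Benchify’s actual SDK surface; only the OpenAI client usage is standard.

  // Hypothetical names throughout: this shows where the call sits in the
  // pipeline, not Benchify's actual SDK surface.
  import OpenAI from "openai";
  import { Benchify } from "@benchify/sdk";   // assumed package name
  import { runInSandbox } from "./sandbox";   // your existing sandbox runner

  const llm = new OpenAI();
  const benchify = new Benchify({ apiKey: process.env.BENCHIFY_API_KEY });

  async function generateAndRun(prompt: string) {
    // 1. Generate code as you already do.
    const completion = await llm.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });
    const generated = completion.choices[0].message.content ?? "";

    // 2. One call between the LLM client and the sandbox: repaired,
    //    bundled, ready-to-execute code comes back.
    const { files } = await benchify.fix({
      files: [{ path: "App.tsx", contents: generated }],
    });

    // 3. Hand the result to the sandbox, or render it directly for
    //    front-end-only work.
    return runInSandbox(files);
  }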


Code Repair: Sub-second fixes for parsing, dependency, CSS/Tailwind, type, and interaction errors (e.g. empty Select components), with more on the way. If there’s an issue you’re running into, let us know and we can add a fix!
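
To make those bug classes concrete, here’s an illustrative before/after in TypeScript: a raw generation with an unbalanced paren, a stray placeholder comment, and an empty Select, next to the kind of output the repair step aims to produce. The specific fixes shown are assumptions for illustration, not verbatim Benchify output.

  // Illustrative only: the kind of raw LLM output the repair layer targets.
  // This generation has an unbalanced paren, a stray placeholder comment,
  // and an empty <Select> (the interaction error mentioned above).
  const rawGeneration = `
  export function Filters() {
    const options = ["price", "rating"].map((o) => ({ label: o, value: o });
    /* rest of code goes here */
    return <Select options={[]} />;
  }
  `;

  // An assumed (not verbatim) repaired result: the paren is balanced, the
  // placeholder is dropped, and the Select is wired to real options, so the
  // component parses, type-checks, and renders on first load.
  const repairedGeneration = `
  export function Filters() {
    const options = ["price", "rating"].map((o) => ({ label: o, value: o }));
    return <Select options={options} />;
  }
  `;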


Bundling: Build and dependency resolution in 1-3s.

  • Front-end: Returns code that instantly renders on the client via our SDK (see the sketch after this list).
  • Full-stack: Bundles code that executes in any sandbox (skipping slow dependency & build steps) — the only delay left is sandbox cold boot.
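
As an example of the front-end path, here’s a sketch assuming the bundled artifact comes back as a self-contained HTML string; that shape and the Preview component are guesses for illustration, not a documented API, but they show how rendering directly from bundled output skips the sandbox entirely.

  // Hypothetical React preview: `bundle.html` is an assumed shape for the
  // bundled artifact returned by the fix/bundle call above.
  export function Preview({ bundle }: { bundle: { html: string } }) {
    // srcDoc renders the bundle inline: no fetch, no sandbox cold boot.
    return (
      <iframe
        srcDoc={bundle.html}
        sandbox="allow-scripts"
        style={{ width: "100%", height: "100%", border: "none" }}
      />
    );
  }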


Observability: Analytics on error patterns in generated code.

Product Demo

https://youtu.be/my7yzpp8AqY

How it works

Benchify’s analysis engine detects bugs and dispatches them to a growing library of static repair strategies in a fraction of a second. Strategies are optimized for different bug types and layered using an incremental parsing approach, since fixing one bug sometimes unlocks others. Each candidate fix is re-analyzed, and the best one is selected automatically, provided it yields a strict improvement in the code. The architecture builds on prior research in program synthesis and program repair: maintain a collection of strategies that may or may not fix a given bug type, paired with an analysis and execution engine that can efficiently determine whether a strategy succeeded.
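
A rough sketch of that dispatch-and-select loop, with placeholder types and stub implementations standing in for the real analysis engine and strategy library:

  // Placeholder types; the real engine's representations are not public.
  interface Diagnostic { kind: string; message: string }
  interface Strategy {
    handles(d: Diagnostic): boolean;
    apply(code: string, d: Diagnostic): string;
  }

  // Stand-ins for the growing strategy library and the static analyzer.
  const strategies: Strategy[] = [];
  function analyze(code: string): Diagnostic[] { return []; }

  export function repair(code: string): string {
    let current = code;
    let diagnostics = analyze(current);

    // Layered, incremental passes: fixing one bug can unlock others.
    for (let pass = 0; pass < 10 && diagnostics.length > 0; pass++) {
      let best: { code: string; remaining: number } | null = null;

      for (const d of diagnostics) {
        for (const s of strategies) {
          if (!s.handles(d)) continue;
          const candidate = s.apply(current, d);
          // Re-analyze every candidate fix before accepting it.
          const remaining = analyze(candidate).length;
          if (best === null || remaining < best.remaining) {
            best = { code: candidate, remaining };
          }
        }
      }

      // Accept only a strict improvement over the current code.
      if (best === null || best.remaining >= diagnostics.length) break;
      current = best.code;
      diagnostics = analyze(current);
    }
    return current;
  }

The real engine presumably does far more (true incremental parsing, richer scoring), but the loop shape of dispatching strategies, re-analyzing candidates, and keeping only strict improvements follows the description above.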

Story

We entered YC with a formal-methods-driven code review product. But unreliable LLM-generated test harnesses kept breaking it. Talking with builders made it clear: the real bottleneck was brittle codegen itself. We pivoted to focus entirely on making generated code self-healing.

Ask

We’re focused on app builders today, but our core tech generalizes: agents, self-updating sites, programmatic ads, and more.

If generated code is slowing you down, let’s talk.

Previous Launches
Better testing without writing tests.
Benchify
Founded: 2024
Batch: Summer 2024
Team Size: 4
Status: Active
Location: San Francisco
Primary Partner: Nicolas Dessaigne