
TL;DR: Saffron gives candidates a real codebase and lets them use any AI tooling they'd reach for on the job. We capture every prompt, diff, and decision in a sandboxed environment that gets scored automatically. Teams can now get structured metrics on how candidates actually build, not just whether the final output compiles.
Book an onboarding call here: https://trysaffron.ai
Our Launch Video
The Problem
AI coding tools have finally gotten good enough to actually change how engineers work. But the way companies evaluate SWEs hasn't changed at all.
Interviews now test the wrong thing: candidates pass every round of your hiring process, then show up on the job unable to build a single feature without AI doing the thinking for them.
On the flip side, companies are missing out on engineers who actually know how to use AI to be ten times more efficient.
We know because we've been on both sides. As candidates, we realized how much potential there was to become 10x engineers with AI. As founders, we’ve heard the same story from dozens of teams: "They crushed the interview. Two weeks in, they couldn't ship anything independently."
Our Approach
We’ve built an assessment that embraces AI and measures how effectively an engineer works with it. Candidates get a real codebase, the AI tools they’d actually use on the job, and a task that simulates real engineering work. We capture the full process and evaluate it automatically.
⚙️ How It Works
Step 1: Create an assessment in under 5 minutes. Point us at your GitHub repo or choose a question from our bank, configure AI agents, and invite candidates.
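Concretely, a Step 1 setup boils down to a handful of choices. Here's a rough sketch in Python; the field names and structure below are illustrative assumptions, not our exact configuration format:

```python
# Illustrative sketch only: field names and structure are hypothetical,
# not our exact configuration format.
from dataclasses import dataclass, field

@dataclass
class AssessmentConfig:
    repo_url: str                    # your GitHub repo, or a question from our bank
    task_prompt: str                 # the feature the candidate is asked to ship
    allowed_ai_agents: list[str]     # which AI tools are enabled in the sandbox
    time_limit_minutes: int = 90     # assessment timer
    invited_candidates: list[str] = field(default_factory=list)

config = AssessmentConfig(
    repo_url="https://github.com/acme/payments-service",  # hypothetical repo
    task_prompt="Add idempotency keys to the refund endpoint.",
    allowed_ai_agents=["coding-agent", "chat-assistant"],
    invited_candidates=["candidate@example.com"],
)
```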
Step 2: Candidates build in a fully managed environment. They get an online IDE that they can customize, a timer, and a real task. No setup required on their end. After coding, they answer debrief questions about their own implementation — and we verify their answers.
Step 3: Review results without reading a single line of code. AI review agents score the submission across dimensions you configure, citing specific evidence from the codebase. We surface detailed metrics like prompt history, AI reliance percentage, and tool calls, and give a deterministic score on how well the candidate did.
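For a sense of what "deterministic" means here: once the rubric is fixed, the same per-dimension scores always produce the same final number. A minimal sketch, where the dimension names and weights are illustrative assumptions rather than our actual rubric:

```python
# Illustrative sketch: dimension names and weights are hypothetical,
# not our actual rubric. A fixed rubric over per-dimension scores
# yields the same final score on every run.

def deterministic_score(dimension_scores: dict[str, float],
                        weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each in 0-100."""
    total_weight = sum(weights.values())
    return sum(dimension_scores[d] * w for d, w in weights.items()) / total_weight

weights = {"code_quality": 0.4, "ai_reliance": 0.3, "debrief_accuracy": 0.3}
scores = {"code_quality": 82.0, "ai_reliance": 70.0, "debrief_accuracy": 90.0}

print(f"Final score: {deterministic_score(scores, weights):.1f}")  # -> 80.8
```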
Why Us
We’re Robert, Jerry, and Kazuma. We met in high school, started building together, and have been close friends ever since. Robert studied at MIT and worked at Jane Street. Jerry studied at Stanford and was also at Jane Street. Kazuma studied at Harvey Mudd and has published at NeurIPS and ICML.
All three of us were CS students who spent the entirety of our first semester of college recruiting for SWE jobs. After watching our own coding workflows change with each new generation of models, we knew assessments had to catch up.
Our Ask
If your team is hiring engineers, book an onboarding call at https://trysaffron.ai. Can't wait to hear your feedback!
-Robert, Jerry, and Kazuma