{"id":93136,"title":"ZeroEval - Build self-improving agents","tagline":"A tool to evaluate and optimize AI agents using human feedback.","body":"![uploaded image](/media/?type=post\u0026id=93136\u0026key=user_uploads/208921/328fd273-24d5-4307-9322-075a26467fe9)\n\nHey everyone, we're Sebastian and Jonathan - founders of ZeroEval.\n\n### **TL;DR**\n\nZeroEval is a tool that helps you build reliable AI agents through evaluations that learn from their mistakes and get better over time.\n\nhttps://www.youtube.com/watch?v=hSkpdHE7mCs\n\n### **The problem**\n\nEvaluating complex AI systems is hard and time consuming. The more complex your agents get, the harder this issue becomes. This is especially the case when building:\n\n1. Long-running, multi-turn agents with dozens of intermediate tool calls \n2. Agents where you want to measure the quality of images, video, generated UI, audio, personality, taste, etc\n\n**Current offline eval methods are high-friction**, a lot of work is needed to continuously curate labeled data and write experiments and evaluators. \n\nOn the other hand, **current LLM judges are static and often have terrible performance**, they lack context on how they fail and the nuances of the task at hand. \n\nYour AI agents are as good as your evals. Without them, surpassing the quality threshold your product needs will feel like a never-ending task.\n\n### **What we’re building**\n\nA way to create **calibrated LLM judges** that get better over time the more production data they see and the more incorrect samples are labeled. 
The more you teach it about where it's failing, the more reliable it becomes.\n\nOnce you have a judge that matches the human preference baseline, you can continue using it on production data or in offline experiments.\n\n![uploaded image](/media/?type=post\u0026id=93136\u0026key=user_uploads/208921/f815b7cf-a84e-4285-974d-1ffc943889ad)\n\nWe’re also introducing **Autotune**, a way to automatically evaluate dozens of models and optimize prompts based on a few human samples.\n\n![uploaded image](/media/?type=post\u0026id=93136\u0026key=user_uploads/208921/954461b0-5b4a-42b9-b464-0a698c3f6fd0)\n\nWe envision a future where AI software improves based on human feedback, where developers define the evaluation criteria as a starting point and errors back-propagate to find the optimal implementation.\n\n### **The team**\n\nWe met during our first year of college in Mexico over 7 years ago. Since then, we've worked on side projects together, joined a leading fintech startup as its first engineers, and most recently built [llm-stats.com](http://llm-stats.com), a leading LLM leaderboard website that has reached 60k MAU and ⅓ million unique users since its launch a few months ago.\n\n* **Sebastian** was a founding engineer at Micro, building the future of email (backed by a16z), and a founding engineer at Atrato (YC W21).\n* **Jonathan** was an early employee on the LLM observability team at Datadog. He did undergrad research on vision transformers for particle physics and on RL for robotics.\n\nFoundation models have transformed the world. We’re building the second line of offense to fill their capability gaps and create AI products that actually work. We are determined to build the engine behind self-improving software for the decades ahead.
\n\n![uploaded image](/media/?type=post\u0026id=93136\u0026key=user_uploads/208921/b912b213-762c-42ba-8fe5-dd39de8dd748)\n\n### **Our ask**\n\nIf you have AI agents in production and are struggling to measure their quality and/or achieve the reliability needed for your product's success, we’d love to chat!\n\nWe don't just deliver a tool: we'll sit with you to understand your pain points and help you build high-quality evals.\n\nFeel free to reach out at [founders@zeroeval.com](mailto:founders@zeroeval.com) or [book a demo](https://cal.com/team/zeroeval/demo).","slug":"OEC-zeroeval-build-self-improving-agents","created_at":"2025-08-19T15:36:48.670Z","updated_at":"2026-04-18T18:55:24.544Z","total_vote_count":35,"url":"https://www.ycombinator.com/launches/OEC-zeroeval-build-self-improving-agents","share_image_url":"https://www.ycombinator.com/media/?type=post\u0026id=93136\u0026key=user_uploads/208921/328fd273-24d5-4307-9322-075a26467fe9","company":{"id":30625,"name":"ZeroEval","slug":"zeroeval","url":"https://zeroeval.com","logo":"https://bookface-images.s3.amazonaws.com/small_logos/d7ecfb758177e9198eb8221c46198150c3a277ba.png","batch":"Summer 2025","industry":"B2B","tags":["AIOps","Developer Tools","Generative AI","SaaS","AI"],"search_path":"https://bookface.ycombinator.com/company/30625"}}