{"id":90835,"title":"Janus – Simulation Testing for AI Agents","tagline":"Battle-Test your AI Agents with Human Simulation","body":"Hey Everyone! We’re Shivum and Jet, the co-founders of Janus! 👋🏼\n\n**TL;DR:** Janus **battle-tests your AI agents** to surface hallucinations, rule violations, and tool-call/performance failures. We run **thousands of AI simulations** against your chat/voice agents and offer **custom evals** for further model improvement.\n\n[Launch Video](https://www.youtube.com/watch?v=vfaxbDH_N78\u0026feature=youtu.be)\n\n---\n\n**💸 Why this matters**\n\nA single broken AI conversation can mean:\n\n* A PR disaster (Air Canada’s chatbot inventing refund policies)\n* Users churning after one bad reply\n* Lawsuits or regulatory fines for compliance failures\n\nYet most teams still test agents manually by pasting prompts into playgrounds.\n\n---\n\n**🤕 The Problem**\n\nManual QA covers maybe **100** scenarios, while real users trigger **millions**. Generic testing platforms don’t understand your customers and can’t simulate nuanced back‑and‑forths at scale. 
This leaves companies with **no actionable insights** and blind spots that **only appear after you ship.**\n\n---\n\n**💡 Our Solution**\n\nJanus automatically:\n\n* Generates **thousands of hyper‑realistic user personas**—from angry customers to domain experts—to cover every possible edge case\n* Runs **full multi‑turn conversations** (text or voice) against your agent, APIs, and function calls\n* Lets you define **natural language rules** for what to **test your agent** against and **how you’d like it to perform**\n* Detects hallucinations, bias, tool‑call failures, and risky responses using **SOTA LLM‑as‑a‑Judge** and **black-box UQ** techniques\n* Pinpoints root causes and produces **actionable recommendations** you can plug straight into CI/CD.\n\nAll in **< 10 min**.\n\n---\n\n![uploaded image](/media/?type=post\u0026id=90835\u0026key=user_uploads/1090038/8d1055ac-9edd-453e-a819-0eb75919843d)\n\n**📜 Backstory**\n\nWe left incoming roles at Anduril and IBM, dropped out of Carnegie Mellon’s ML program, and moved to SF to build Janus full-time. We felt this pain first‑hand while building consumer-facing agents ourselves: every new model or prompt tweak broke something in prod. We built Janus to give ourselves the “crash‑test dummy” we wished existed from day one.\n\n**🚀 Our Ask**\n\nBuilding or piloting an AI agent? Skip manual QA and get started in 15 minutes to see how Janus makes agent eval effortless: [cal.com/team/janus/quick-chat](https://cal.com/team/janus/quick-chat).
\n\n— _Shivum \u0026 Jet_ (Founders of Janus)\n\nCheck us out at [withjanus.com](https://www.withjanus.com/).\n\nEmail us at [team@withjanus.com](mailto:team@withjanus.com)","slug":"Nd5-janus-simulation-testing-for-ai-agents","created_at":"2025-05-28T23:10:47.076Z","updated_at":"2026-05-01T00:57:27.582Z","total_vote_count":86,"url":"https://www.ycombinator.com/launches/Nd5-janus-simulation-testing-for-ai-agents","share_image_url":"https://www.ycombinator.com/media/?type=post\u0026id=90835\u0026key=user_uploads/1090038/8d1055ac-9edd-453e-a819-0eb75919843d","company":{"id":30521,"name":"Janus","slug":"janus","url":"https://www.withjanus.com/","logo":"https://bookface-images.s3.amazonaws.com/small_logos/3b44f7b050ecc3ee1451baeb4e0a733b19937955.png","batch":"Spring 2025","industry":"B2B","tags":["AIOps","Developer Tools","Reinforcement Learning","Monitoring","AI"],"search_path":"https://bookface.ycombinator.com/company/30521"}}