HomeCompaniesJanus

Battle-Test AI Agents with Human Simulation

Janus tests conversational AI Agents for hallucinations, rule violations, tool call failures, and performance breakdowns pre-launch. We do this by simulating thousands of realistic AI users to eliminate manual testing and low-confidence in AI deployment. Our platform generates personalized datasets to evaluate, benchmark, and improve models over time.
Active Founders
Shivum Pandove
Shivum Pandove
Founder
Prev. ML & CS at Carnegie Mellon. Scaled 3+ startups as a SWE/PM from 0 -> 1, conducted DL research at comp. bio labs, and always building.
Jet Wu
Jet Wu
Founder
dropped out of ml at carnegie mellon. prev. worked on evals for microsoft's tinytroupe framework, was a cerebras systems ai fellow, did osint research at bellingcat.
Janus
Founded:2025
Batch:Spring 2025
Team Size:2
Status:
Active
Location:San Francisco
Primary Partner:Andrew Miklas
Company Launches
Janus – Simulation Testing for AI Agents
See original launch post

Hey Everyone! We’re Shivum and Jet, the co-founders of Janus! 👋🏼

TLDR; Janus battle-tests your AI agents to surface hallucinations, rule violations, and tool-call/performance failures. We run thousands of AI simulations against your chat/voice agents and offer custom evals for further model improvement.

Launch Video


💸 Why this matters

A single broken AI conversation can mean:

  • A PR disaster (Air Canada chatbot inventing refund policies)
  • Users churning after one bad reply
  • Lawsuits or regulatory fines for poor compliance

Yet most teams still test agents manually by pasting prompts into playgrounds.


🤕 The Problem

Manual QA covers maybe 100 scenarios, while real users trigger millions. Generic testing platforms don’t understand your customers and can’t simulate nuanced back‑and‑forths at scale. This leaves companies with no actionable insights and blind spots that only appear after you ship.


💡 Our Solution

Janus automatically:

  • Generates thousands of hyper‑realistic user personas—from angry customers to domain experts—to cover every possible edge case
  • Runs full multi‑turn conversations (text or voice) against your agent, APIs, and function calls
  • Allows you to input natural language rules on what to test your agent against and how you’d like it to perform
  • Detects hallucinations, bias, tool‑call failures, and risky responses using SOTA LLM‑as‑a‑Judge + black-box UQ techniques
  • Pinpoints root causes and produces actionable recommendations you can plug straight into CI/CD.

All in < 10 min.


📜 Backstory

Shivum and Jet left incoming roles at Anduril and IBM, dropped out of Carnegie Mellon ML, and moved to SF to build Janus full-time. We felt this pain first‑hand while building consumer-facing agents ourselves: every new model or prompt tweak broke something in prod. We built Janus to give ourselves the “crash‑test dummy” we wished existed from day-1.

🚀 Our Ask

Building or piloting an AI agent? Skip manual QA and get started in 15 minutes to see how Janus makes agent eval effortless: cal.com/team/janus/quick-chat.


Shivum & Jet (Founders of Janus)

Check us out at withjanus.com.

Email us at team@withjanus.com