IncidentFox

AI SRE agent that triages, coordinates, and fixes production incidents

AI SRE agents that automatically learn each customer’s system so they work just like an in-house engineer.
Active Founders
Jimmy Wei
Founder
Ex-SWE @Roblox, Building AI SRE
Long Yi
Founder
Ex-SRE @Roblox, Building AI SRE
Company Launches
IncidentFox. The AI SRE who never drops context.

Hey everyone 👋

We're Long and Jimmy, co-founders of IncidentFox.

TL;DR

AI SRE tools fail without deep context about your systems, and that context lives in integrations nobody has time to build. IncidentFox auto-discovers what each team needs, generates the integrations, and ships with 300+ tools built in. Setup takes less than a day, not months.

👉 Try it: https://incidentfox.ai

👉 Open source (Apache 2.0): https://github.com/incidentfox/incidentfox

👉 Launch video: https://youtu.be/TaTpN0JwNYE

❌ The Problem: Without the right integrations, your AI has no context

Every AI SRE tool connects to Slack, reads your Confluence, queries your Datadog. That part is solved.

Here's what isn't: when the AI actually needs to debug something, it doesn't have the right tools.

Your payments team runs a custom Kafka pipeline with internal dashboards. Your infra team uses a homegrown deployment system. Your ML team has proprietary model serving. Each team's stack is different, and the AI has no way to query any of it.

The traditional fix? Hand-build integrations (MCP servers) for every team.

But this creates a new problem:

  • Who decides what integrations each team needs? You might know your own team's stack, but when you're debugging another team's service at 3 AM, you don't know theirs.
  • So every team needs to sacrifice an engineer to build and maintain their own integrations. That's expensive, slow, and doesn't scale.
  • Most teams never get around to it. The AI stays half-useful.

Integration is the bottleneck. Not the AI model. Not the monitoring data. The integrations.

🦊 IncidentFox: We auto-build the integrations for you

IncidentFox is an AI SRE that lives where your team already works: Slack, Microsoft Teams, or Google Chat. It doesn't just connect to your existing tools; it figures out what tools each team needs and builds them automatically.


1. Auto-discovers and generates integrations

IncidentFox analyzes your codebase, infrastructure, and incident history to identify gaps, then auto-generates the tools to fill them. No engineer needs to hand-build MCP servers. No team needs to sacrifice headcount on integration work.

A new team onboards? IncidentFox studies their stack and proposes tools specific to their services, with human approval before anything goes live.

2. Per-team configuration, because every team is different

Your payments team and your ML team don't use the same stack. Why would they use the same AI SRE config?

Each team gets their own:

  • Tools: enable only what's relevant; disable what isn't
  • Prompts: fully open source and exposed to engineers. Inject your domain knowledge directly
  • Knowledge base: learned from that team's incidents, runbooks, and services

One team's "config drift" is another team's "model drift." IncidentFox understands the difference.
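
To make the per-team idea concrete, here is a minimal Python sketch. The `TeamConfig` class, its field names, and the tool names are hypothetical illustrations chosen for this example; they are not IncidentFox's actual schema, which lives in the open-source repo.

```python
from dataclasses import dataclass, field


@dataclass
class TeamConfig:
    """Hypothetical per-team settings (illustrative only, not the real schema)."""
    name: str
    # Tools this team has switched on; everything else is treated as disabled.
    enabled_tools: set[str] = field(default_factory=set)
    # Domain knowledge that engineers inject into the team's prompts.
    prompt_notes: list[str] = field(default_factory=list)
    # Where the team's knowledge base is learned from (runbooks, incidents).
    knowledge_sources: list[str] = field(default_factory=list)

    def allowed(self, tool: str) -> bool:
        """A tool is usable by this team only if it was explicitly enabled."""
        return tool in self.enabled_tools


payments = TeamConfig(
    name="payments",
    enabled_tools={"kafka_lag", "datadog_query"},
    prompt_notes=["'Drift' means config drift in the Kafka pipeline."],
    knowledge_sources=["runbooks/payments"],
)
ml = TeamConfig(
    name="ml",
    enabled_tools={"model_serving_status", "datadog_query"},
    prompt_notes=["'Drift' means model drift in served predictions."],
    knowledge_sources=["runbooks/ml"],
)

print(payments.allowed("kafka_lag"))  # True
print(ml.allowed("kafka_lag"))        # False
```

Keeping tools, prompt notes, and knowledge sources on one per-team object is just one way to model it; the point is that the same word ("drift") can resolve differently depending on which team's config is in play.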

3. Continuously evaluates and self-improves

After every incident, IncidentFox:

  • Detects gaps: "I couldn't query service X's health endpoint"
  • Auto-generates the missing tool
  • Evaluates its own investigation quality against the actual resolution
  • Updates prompts and knowledge, with human review

It gets measurably better every week. Not because you tuned it, but because it tuned itself.
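
The post-incident loop can be sketched roughly as follows. This is an illustration under stated assumptions, not IncidentFox's implementation: `ToolProposal`, `detect_gaps`, `score_investigation`, and `apply_approved` are hypothetical names, and the reciprocal-rank score is a stand-in for whatever evaluation the product actually runs.

```python
from dataclasses import dataclass


@dataclass
class ToolProposal:
    """Hypothetical record for a tool the agent wants to add."""
    tool_name: str
    reason: str
    approved: bool = False  # stays False until a human reviews it


def detect_gaps(attempted: list[str], available: set[str]) -> list[ToolProposal]:
    """Propose one tool per capability the investigation needed but lacked."""
    proposals, seen = [], set()
    for tool in attempted:
        if tool not in available and tool not in seen:
            seen.add(tool)
            proposals.append(ToolProposal(tool, f"needed '{tool}' but no such tool exists"))
    return proposals


def score_investigation(hypotheses: list[str], actual_root_cause: str) -> float:
    """Assumed metric: 1/rank of the real root cause among the agent's guesses."""
    for rank, hypothesis in enumerate(hypotheses, start=1):
        if hypothesis == actual_root_cause:
            return 1.0 / rank
    return 0.0  # the agent never named the real cause


def apply_approved(proposals: list[ToolProposal], available: set[str]) -> set[str]:
    """Only human-approved proposals become live tools."""
    return available | {p.tool_name for p in proposals if p.approved}


available = {"datadog_query", "k8s_pod_logs"}
proposals = detect_gaps(["datadog_query", "service_x_health"], available)
score = score_investigation(["bad deploy", "db failover"], "db failover")

before_review = apply_approved(proposals, available)  # nothing approved yet
proposals[0].approved = True                          # a human signs off
after_review = apply_approved(proposals, available)

print(score)                 # 0.5
print(sorted(after_review))  # ['datadog_query', 'k8s_pod_logs', 'service_x_health']
```

The design choice worth noting is the approval gate: a detected gap only produces a proposal, and the live tool set changes after a human flips `approved`.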

4. 300+ integrations included on day one

While it auto-builds what's missing, you're not starting from zero. Kubernetes, AWS, Grafana, Prometheus, Datadog, Elasticsearch, PagerDuty, and GitHub are all built in. Integration time is under a day, not months.

🧠 Why this matters

| | Traditional | IncidentFox |
|---|---|---|
| Integrations | Hand-built per team. Each team sacrifices an engineer. | Auto-discovered and auto-generated. Human approves. |
| Multi-team scaling | Breaks: you can't know every team's stack. | Per-team config. Each team's AI knows their stack. |
| Domain knowledge | Black box prompts, hope it works. | Open source prompts. Engineers inject and edit freely. |
| Over time | Stagnates unless manually updated. | Self-evaluates, finds gaps, improves continuously. |
| Setup | Months of custom integration work. | < 1 day. 300+ tools out of the box. |

📊 Results

  • 85–95% reduction in alert noise through intelligent correlation
  • Hours → minutes for incident investigation
  • Zero-config onboarding: Docker in 5 min, production K8s in 30 min
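
For readers wondering what alert correlation means in practice, here is a toy Python sketch of one common approach: folding alerts from the same service that fire close together into a single group. The `correlate` function, the 300-second window, and the sample alerts are all assumptions made for illustration; this is not IncidentFox's correlation logic, and the toy data says nothing about the figures above.

```python
from collections import defaultdict

Alert = tuple[str, float]  # (service name, timestamp in seconds)


def correlate(alerts: list[Alert], window_s: float = 300.0) -> list[list[Alert]]:
    """Group alerts per service when they fire within window_s of the group's first alert."""
    by_service: dict[str, list[list[Alert]]] = defaultdict(list)
    for service, ts in sorted(alerts, key=lambda a: a[1]):
        groups = by_service[service]
        if groups and ts - groups[-1][0][1] <= window_s:
            groups[-1].append((service, ts))  # same burst: fold into the open group
        else:
            groups.append([(service, ts)])    # new burst: start a new group
    return [g for groups in by_service.values() for g in groups]


alerts = [("payments", 0), ("payments", 40), ("payments", 90), ("ml", 100), ("payments", 1000)]
groups = correlate(alerts)

# Five raw alerts collapse into three groups an on-call engineer has to look at.
print(len(alerts), len(groups))  # 5 3
```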


🔒 Enterprise-ready, open-source

  • Open source: Apache 2.0, no vendor lock-in
  • SOC 2 compliant, SSO/OIDC, RBAC, audit logs
  • Self-hosted, on-prem, or managed SaaS
  • Bring your own LLM keys (OpenAI, Claude, Gemini, etc.)

👬 The Team

Jimmy (CEO): Previously at Roblox, where he built social communication features (in-experience calling for 100M+ DAU). Before that, worked at Meta FAIR on multiparty conversational AI, with published research. Cornell CS. Serial founder; previously CTO at a startup in Outlier Ventures' DeFi accelerator.

Long (CTO): Previously at Roblox, where he built database infrastructure supporting 100M+ daily active users on the Stateful Infra team. Experienced the chaos of on-call firsthand, which is why we're building this. Brandeis CS + Neuroscience + Business.

We've lived on both sides: Jimmy built the AI systems, Long was the SRE drowning in incidents. IncidentFox is what happens when you combine both.


πŸ™ Our Asks

Thanks for reading ❤️

IncidentFox
Founded: 2025
Batch: Winter 2026
Team Size: 2
Status: Active
Location: San Francisco
Primary Partner: Diana Hu