
Applied Intuition for AI agents

We’re heading towards a future where AI agents will be able to perform useful work over long horizons, with little or no human supervision. To become more reliable, performant, and safe, autonomous agents must be trained in simulation environments that reflect the real world. Polymath builds simulated worlds for agents to practice and learn through experience. We’re a team of researchers and engineers from UC Berkeley, Hume AI, Plaid, and Amazon. We have years of experience post-training frontier models in industry and building large-scale data systems. Polymath is backed by Y Combinator.
Active Founders
Dylan Ma
Founder
Co-Founder / CEO @ Polymath. Previously @ Hume AI, AWS, UC Berkeley
Naren Yenuganti
Founder
Co-Founder / CTO @ Polymath. Previously @ Plaid, Amazon, UC Berkeley
Company Launches
Polymath: Applied Intuition for AI agents

Problem

We’re at the very beginning of the agent era. The demand for AI is shifting from models that simply answer questions to agents that can operate autonomously over days and weeks. Models have become incredibly proficient at short tasks, but fail when asked to perform long-horizon work that requires proficiency with a diverse set of tools.

[Figure; source: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/]

Static datasets are no longer sufficient. To improve the performance of autonomous agents, they must be trained inside environments that reflect the real world.

Today, RL environment generation is bottlenecked by human labor. Companies hire contractors to hand-build tasks and artifacts one by one. This approach is expensive and doesn’t scale. Moreover, human data alone will never lead to superintelligence.

On the other hand, purely synthetic environments are not aligned with the real world and cannot be trusted.

Solution

We believe that the future of RL environments will be an advanced software simulation product, as opposed to just a human labor problem.

We’re developing world generation models and environment factories to increasingly automate and align the creation of RL environments, with humans in the loop. This allows for more complex and realistic worlds, and for tasks of higher quality, scale, and diversity. We believe this will be essential to unlocking RL scaling.

Horizon-SWE

We recently launched Horizon-SWE, a benchmark that drops frontier models into a simulated software company.

It consists of a running application, real tools, and long-horizon tasks covering the entire software development lifecycle (planning, coding, testing, deployment, monitoring).

The benchmark measures the ability of AI agents to perform end-to-end SWE tasks, as opposed to code generation alone. Leading models score around 25% on the benchmark.

Read more about our methodology here: https://www.polymathlabs.ai/blog/horizon-swe

Team

Polymath is a team of researchers and engineers from UC Berkeley, Hume AI, Plaid, and Amazon. We have years of experience post-training frontier models in industry, and building large-scale data systems. Now we’re building the foundation that will enable the next generation of autonomous agents.


If you work at a frontier lab and are interested in acquiring environments, or know someone who is, we’d love to chat! (founders@polymathlabs.ai)

https://youtu.be/uqc3TCWJCto

Polymath
Founded: 2026
Batch: Winter 2026
Team Size: 2
Status: Active
Location: San Francisco
Primary Partner: Ankit Gupta