Aquarium Learning

We help ML teams improve their models by improving their datasets

ML models are only as good as the datasets they're trained on, and that means that most improvement to model performance comes from improvement to the quality and diversity of their datasets. Our tooling makes it easy for ML teams to find anomalies + failure patterns in their datasets and fix these problems by editing / adding the right data. So the next time you retrain your model, it just gets better.

Team Size: 12
Location: San Francisco
Group Partner: Nicolas Dessaigne

Active Founders

Peter Gao

Peter was an early employee (#18) at Cruise, where he built a large part of a self-driving car from scratch. Before that, Peter did research on deep learning at UC Berkeley. Before that, he interned at Pinterest and Khan Academy, doing a mix of ML and web work. Now cofounder at Aquarium!

Quinn Johnson

Quinn is an engineer/manager who picked a *fantastic* time to co-found a company making deep learning pipelines that improve themselves. Before that he was at Ouster (leading data engineering / data viz), Cruise Automation (leading ML data engineering + labeling), and Graphistry (1st engineering hire, so a bit of everything). Working on self-driving cars has given him an irrational hatred for trees and shrubbery.

Company Launches

TLDR: Excited to announce our new product, Tidepool! Tidepool does product analytics for AI text interfaces. With Tidepool, product teams can find patterns in their user text interactions to help make better product decisions.

Check out our demo video here!

The Problem:

LLM apps have introduced a new paradigm for interacting with software, where users can work iteratively with the software via a natural language interface, generating user inputs and model responses consisting of unstructured text.

Traditional product analytics techniques don't deal well with large amounts of unstructured text - it's hard to summarize, it's hard to aggregate, and it's hard to effectively sample. AI developers resort to digging through a pile of hundreds to hundreds of millions of datapoints of unstructured text to understand how users interact with their product.

The Solution:

Tidepool is a product analytics platform that solves these problems using neural network embeddings. After you upload user text interaction events, Tidepool will:

  1. Automatically group your data by similarity. Tidepool runs embedding clustering on your users’ text interactions to surface interesting attributes: things like prompt topics, prompt languages, and common usage patterns that can be turned into shortcuts.
  2. Summarize common attributes in your data, using LLMs to determine what each cluster “contains.” For example, understanding that the most common topics that users discuss are business, education, and art.
  3. Track attributes in production traffic, allowing you to uncover how a specific attribute might be correlated to good / bad product outcomes. We utilize lightweight models running on foundation model embeddings to scalably extract these attributes from hundreds of millions of production interaction events.
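As a rough illustration of the first two steps, here is a minimal Python sketch. It is not Tidepool's implementation: the bag-of-words `toy_embed` stands in for neural network embeddings, the greedy threshold clustering stands in for whatever clustering Tidepool actually runs, and `label` (most frequent words) stands in for LLM summarization. All function names, the stopword list, and the threshold value are assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical stopword list, only so the toy vectors focus on content words.
STOPWORDS = {"a", "an", "the", "for", "this", "into", "to", "of"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def toy_embed(text):
    """Stand-in for a neural embedding: a sparse word-count vector."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(texts, threshold=0.3):
    """Greedy grouping: a text joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, member_texts)
    for text in texts:
        vec = toy_embed(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]

def label(members, top_n=2):
    """Stand-in for LLM summarization: the cluster's most frequent words."""
    counts = Counter(w for text in members for w in tokenize(text))
    return [w for w, _ in counts.most_common(top_n)]

prompts = [
    "write a business plan for a bakery",
    "translate this poem into french",
    "write a business plan for a startup",
    "translate this sentence into french",
]
groups = cluster(prompts)
for members in groups:
    print(label(members), members)
```

In a real pipeline, the toy pieces would be swapped for model-generated embeddings and a proper clustering algorithm. The third step (tracking attributes in production traffic) would then amount to training a small classifier on those embeddings for each attribute of interest.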

About Us:

Over the last few years, Aquarium has worked a lot with computer vision companies to help them curate labeled datasets and improve their fine-tuned models using our core embedding technology.

Our mission has always been to make it easier for people to build and improve production ML systems that solve real-world problems. When we saw the Cambrian explosion of LLM apps earlier this year, we realized that our core embedding technology and expertise were very useful for getting these new apps to product-market fit even faster!

Our Ask: