CambioML - the "Private ML Scientists" for Large Enterprises

Transform messy multi-modal data into MoE (mixture of experts) LLMs/LVMs

TL;DR: CambioML enables ML practitioners to build AI agents on enterprises’ confidential data with two open-source libraries:

  • Uniflow provides hands-on code to transform unstructured data from different modalities into structured, trainable data;
  • Pykoi provides code for fine-tuning, evaluation, and UI functionality for feedback data collection.

Our end-to-end SaaS solutions enable Fortune 500 companies to build self-owned AI agents based on their massive, multi-modal and confidential data. We’re building what we wish we had when developing foundation models at Amazon.

Hi everyone, we’re Jared and Rachel and our mission is to make building foundation models as easy as possible. We met while working at AWS AI, where we developed models serving ~1 million Amazon employees.

The problem

During our 8 years of building models at Amazon, we were constantly frustrated by the lack of quality tools for developing and deploying large machine learning models.

Taking a foundation model from development to production in big tech requires multiple specialized job roles: Scientists develop the model. Data annotators label data for model improvements. Research engineers manage the labeled data for fine-tuning. Software engineers deploy the model to scalable resources. Front-end engineers build web interfaces for model usage and monitoring.

This resource-intensive separation of concerns exists because current tools don't solve common pain points in the end-to-end process:

  • Cleaning and transforming messy, unstructured data takes over 80% of ML practitioners’ time.
  • RLHF fine-tuning tools are still immature and require managing multiple models, datasets, and resource stacks at the same time.
  • Evaluating foundation models is difficult beyond academic metrics.
  • Annotators often deliver data in formats that don't match what model training expects and, because they are not the model's end users, produce less-than-ideal annotations.
  • Deployment poses many questions: environment selection, resource management, availability, etc.
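To make the format-mismatch pain point concrete, here is a minimal sketch in plain Python of normalizing annotations that arrive under different field names and label spellings. It is not Uniflow or Pykoi code; the target schema, the accepted field names, and `LABEL_MAP` are all assumptions made up for illustration.

```python
from typing import Any

# Hypothetical label spellings mapped onto a binary training label.
LABEL_MAP = {"good": 1, "bad": 0, "yes": 1, "no": 0, "1": 1, "0": 0}


def normalize_annotation(raw: dict[str, Any]) -> dict[str, Any]:
    """Map one loosely formatted annotation onto an assumed target schema:
    {"prompt": str, "response": str, "label": int}."""
    # Annotators may use different names for the same field.
    prompt = raw.get("prompt") or raw.get("question") or raw.get("input")
    response = raw.get("response") or raw.get("answer") or raw.get("output")
    label = raw.get("label", raw.get("rating"))

    if prompt is None or response is None or label is None:
        raise ValueError(f"missing field in annotation: {raw!r}")

    # Compare labels case-insensitively and ignore stray whitespace.
    key = str(label).strip().lower()
    if key not in LABEL_MAP:
        raise ValueError(f"unrecognized label {label!r} in annotation: {raw!r}")

    return {
        "prompt": prompt.strip(),
        "response": response.strip(),
        "label": LABEL_MAP[key],
    }
```

Rejecting unknown labels instead of guessing keeps bad rows out of the training set, at the cost of sending them back to the annotator.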

The solution

We're building the tools we wish we had. Our open-source libraries, Uniflow and Pykoi, make it easy for ML practitioners to iterate on, evaluate, and eventually deploy foundation models to production:

  • Easy data transformation: Say goodbye to tedious work like cleaning messy, unstructured, multi-modal data.
  • Closing the loop on active learning: Once you've annotated enough data, you can run our RLHF functionality directly on your model, with just a few lines of code. Other fine-tuning & augmentation options are also provided.
  • Clear evaluation and benchmarking: Compare and understand your models before and after fine-tuning with our compare interface and visuals.
  • Simple data annotation & collection: Create your own custom annotation tasks or share a UI directly on your model. Gone are the days of moving annotated data from Excel sheets to models.
  • Deployment: We're adding deployment options for managed execution and cost-saving batch jobs on any cloud.
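As an illustration of the "closing the loop" step above, the sketch below shows one common way collected feedback can be turned into RLHF training data: a ranked list of model responses is expanded into (chosen, rejected) preference pairs of the kind typically used to train a reward model. This is plain Python, not the Pykoi API; the function name `to_preference_pairs` and the record fields are hypothetical.

```python
from itertools import combinations


def to_preference_pairs(prompt: str, ranked_responses: list[str]) -> list[dict[str, str]]:
    """Expand responses ranked best-first into pairwise preference records.

    Assumes `ranked_responses` is ordered from most to least preferred,
    for example by an annotator using a ranking UI.
    """
    # combinations() preserves input order, so the first element of each
    # pair is always the higher-ranked (preferred) response.
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_responses, 2)
    ]


# Example usage with made-up feedback data.
pairs = to_preference_pairs(
    "Summarize our refund policy.",
    ["concise, accurate summary", "accurate but verbose summary", "off-topic answer"],
)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

A ranking of n responses yields n(n-1)/2 pairs, so a handful of ranked answers per prompt can already produce a useful amount of preference data.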

A Quick Demo

Here's a quick video to showcase what we've developed in the last few weeks!

We ship quickly and release new features weekly. Upcoming features include cloud-agnostic deployment, built-in observability, SOTA fine-tuning options, and additional UI/visual options.

Our Ask

We're open-source and building out our feature set in public. Check out our project on GitHub (we'd appreciate a star!) and feel free to contribute!