The AI Product Development Platform

Prompt Engineering: Superpowers for prompt engineers

  • Compare prompts, models, and even LLM providers side-by-side
  • Curate a library of test cases to evaluate prompts against
  • Quantitatively evaluate the output of your prompts using industry-standard ML metrics (BLEU, METEOR, Levenshtein distance, semantic similarity)

Deployments: Confidently iterate on models in production

  • Simple API interface that proxies requests to any model provider
  • Back-testing & version control
  • Observability of all your inputs and outputs; UI & API to submit explicit or implicit user feedback

Documents: Use your proprietary data in LLM applications

  • Robust API endpoint to submit documents (“corpus of text”) for querying against
  • Configurable chunking and semantic search strategies
  • Ability to query against corpus of text at run time

Continuous Improvement: Continuously fine-tune to improve quality and lower cost

  • Passively accumulate training data to fine-tune your own proprietary models
  • Swap model providers or parameters under the hood – no code changes required

We’re a team of MIT engineers and McKinsey consultants who’ve been building apps on GPT-3 for 3 years since it first came out. We’ve built similar tools in MLOps for 4 years and have closely experienced the pain we’re solving for our customers today.

We believe that AI is the greatest technological leap since the internet. Our mission is to help companies adopt AI by taking their prototypes to production. If you have an AI use-case in mind, please reach out!

Jobs at Vellum

New York, NY, US / San Francisco, CA, US / Remote (US)
$140K - $200K
3+ years
Team Size: 3
Group Partner: Brad Flora

Active Founders

Akash Sharma

Sidd Seethepalli

Vellum (W23); Founding Eng at Dover (S19)

Noa Flaherty

Company Launches

Hello hello! We’re Noa Flaherty, Akash Sharma, and Sidd Seethepalli from Vellum.

TL;DR: "Workflows" is a new product in Vellum's LLM dev platform that helps you quickly prototype, deploy, and manage complex chains of LLM calls and the business logic that ties them together. We solve the "whack-a-mole" problem encountered by companies that use popular open source frameworks to build AI applications, but are scared to make changes for fear of introducing regressions in production.

The Problem 😰: Many AI use-cases require chains of prompts, but experimenting with complex chains and taking them to production are both hard.

We have helped dozens of customers take their AI prototypes to production by delivering tools for efficient prompt engineering, tightly integrated semantic search, prompt versioning, and performance monitoring. However, as the AI industry matures, we’ve found that more and more real-world use-cases require multi-step flows across actions like semantic search, multiple prompts/LLM calls, and bespoke business logic.

For example, if building a customer-support chatbot, you may want to:

  1. Use a fast, low-cost model to categorize an incoming user question
  2. Depending on the categorization, query against a different index in a vector store to return relevant context about how to answer the question
  3. Feed that context into a prompt that’s been tuned to answer accurately about that topic
  4. Feed the output of that prompt into another that rephrases using your brand voice
  5. Finally, return the answer to your end user
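
To make the chain concrete, here is a minimal Python sketch of those five steps. Every name in it (`SupportChain`, `toy_classifier`, and the lambdas standing in for model calls and vector-store lookups) is hypothetical and for illustration only; none of it is Vellum's API, and a real chain would call actual models and indexes at each step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type aliases: an "LLM" is any text-in, text-out callable,
# and a "Retriever" returns context snippets for a question.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


@dataclass
class SupportChain:
    classify: LLM                      # step 1: fast, low-cost categorizer
    retrievers: Dict[str, Retriever]   # step 2: one index per category
    answerers: Dict[str, LLM]          # step 3: one tuned prompt per category
    rephrase: LLM                      # step 4: brand-voice rewriter

    def run(self, question: str) -> Dict[str, object]:
        category = self.classify(question)
        context = self.retrievers[category](question)
        prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
        draft = self.answerers[category](prompt)
        answer = self.rephrase(draft)
        # Step 5: return the answer, plus intermediate outputs for inspection.
        return {"category": category, "context": context,
                "draft": draft, "answer": answer}


def toy_classifier(question: str) -> str:
    """Keyword rule standing in for a real classification model."""
    return "billing" if "invoice" in question.lower() else "general"


chain = SupportChain(
    classify=toy_classifier,
    retrievers={
        "billing": lambda q: ["Invoices are emailed on the 1st of each month."],
        "general": lambda q: ["Support is available on weekdays."],
    },
    answerers={
        # Each toy "prompt" just echoes the first context line it was given.
        "billing": lambda p: "Billing answer based on: " + p.splitlines()[1],
        "general": lambda p: "General answer based on: " + p.splitlines()[1],
    },
    rephrase=lambda draft: "Thanks for reaching out! " + draft,
)

result = chain.run("Where is my invoice?")
print(result["category"])
print(result["answer"])
```

Returning the intermediate values alongside the final answer is a deliberate choice in this sketch: it is what makes it possible to check each step on its own as well as the chain end-to-end.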

Unfortunately, existing tools and frameworks don’t make it easy to:

  1. Rapidly experiment with these chains both step-by-step and end-to-end – especially if you’re non-technical
  2. Make changes with confidence once in production and avoid regressions
  3. Gain visibility into the performance of the system both as a whole, and at each step in the chain

The Solution 🤤: A fully managed platform for experimenting with, deploying, and managing AI workflows that power your app

Vellum Workflows provides a low-code UI for experimenting with and deploying LLM workflows to power features in your app.

You can construct a workflow using different “Nodes,” define “Input Variables” for the workflow, set their values across different “Scenarios,” and run it with a single click to see the output at each step along the way.

Miri Health, a customer of ours, uses one of these workflows in production to power their health & wellness AI chatbot.

You get immediate feedback on whether your chain/prompts perform the way you expect without having to edit code, inspect console logs, or hop between browser tabs. You can validate that your workflow does what it should across a variety of scenarios / test cases.

Once you’re happy, you can deploy the Workflow directly in Vellum and invoke it through an API via Vellum’s Python/Node SDKs. Events for nodes that you subscribe to are streamed back using Server-Sent Events.

Invoke a workflow via a simple API. Use our officially supported Python and Node SDKs, or roll your own.
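
If you do roll your own client, the streaming side mostly comes down to parsing the standard Server-Sent Events wire format. Below is a minimal sketch of such a parser in Python. The `parse_sse` helper and the sample event names and JSON payloads are assumptions made for illustration; the actual event schema is not described here, so treat the payload shape as a placeholder.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; dispatch it if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)  # multiple data lines are joined with newlines


# Hypothetical stream: the node names and "state" values are made up.
sample_stream = [
    ": keep-alive",
    "event: node",
    'data: {"node": "classify", "state": "FULFILLED"}',
    "",
    'data: {"node": "answer", "state": "FULFILLED"}',
    "",
]

events = [(name, json.loads(payload)) for name, payload in parse_sse(sample_stream)]
for name, payload in events:
    print(name, payload["node"], payload["state"])
```

In practice the lines would come from a streaming HTTP response rather than a list, but the parsing logic stays the same.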

By deploying your Workflow through Vellum, you can:

  • Mix and match models from different providers without having to integrate with each. Use the best prompt/model for the job!
  • Have a production-ready backend in minutes without having to write, maintain, and host complex code and orchestration logic
  • Version your Workflow, see changes over time, and revert with one click
  • Get full observability into the production system, viewing inputs, outputs, timestamps, and more for the workflow as a whole, as well as each Node along the way
  • Use role-based access control to determine which team members are allowed to experiment vs update production deployments

Monitor how your workflows are performing in production, with the ability to inspect the inputs/outputs of the workflow as a whole, as well as each step in the chain.

Looking Ahead

This is just the beginning! Our beta customers are already asking for things like:

  • A/B testing workflows for live experimentation
  • Test suites for evaluating that workflows are doing what they should and don’t break after an “improvement” is made
  • Composability via nested workflows
  • More node types for executing code, making calls to 3rd party APIs, etc.

Why Vellum?

Our focus to date has been to provide robust building blocks for creating production-ready AI applications. We’ve seen our customers assemble Vellum-powered Prompts and Semantic Search to create incredible products, version control and debug them using Vellum Deployments, and validate them when making changes using Vellum Test Suites.

Now that we have the building blocks, we’re well-positioned to help you assemble them. Workflows has been in closed beta for a few weeks now and we already have customers using it to power their entire AI backend in production.

Vellum Workflows give us the opportunity to really tailor different parts of our product to the end users’ needs without having to invest in tons of custom development, which has dramatically decreased our time to market. As a technical, but non-engineering stakeholder, I’m able to truly participate in the development of the product experience and help deliver personalized AI-powered experiences to customers faster than I could have ever imagined.

Adam Daigian, Product Lead at Miri Health

We firmly believe that the best AI-powered products out there will be the result of close collaboration between technical and non-technical team members. We’ve repeatedly seen engineers set up the initial scaffolding, integrations, and guard-rails, while non-technical folks run experiments and tweak prompts/chains. No other platform facilitates this collaboration as well as Vellum.

Ask: How you can help

  • Sign up for a free 14-day trial if the problems we aim to solve resonate with you
  • Share your thoughts and feedback on our direction in the comments below 👇
  • Spread the word with others you think this may help

Other Company Launches

Vellum - Build production-worthy LLM applications

The developer platform for building production-worthy Large Language Model applications

Hear from the founders

How did your company get started? (i.e., How did the founders meet? How did you come up with the idea? How did you decide to be a founder?)

We worked together at Dover (YC S19) for 2+ years, where we built production use-cases of LLMs. Noa and Sidd are MIT engineers who worked on DataRobot’s MLOps team and Quora’s ML Platform team, respectively. Akash spent 5 years at McKinsey’s Silicon Valley Office. While working with GPT-3 and Cohere to build user-facing LLM apps, we found ourselves building complex internal tooling to compare models, fine-tune them, measure performance, and improve quality over time. This took time away from building our user-facing product. We’ve worked on MLOps tooling for traditional ML and wished we had the same tooling when we later worked with LLMs, so we’re building it.