Vellum Playground: Superpowers for prompt engineers
- Compare prompts, models, and even LLM providers side-by-side
- Curate a library of test cases to evaluate prompts against
- Quantitatively evaluate the output of your prompts using industry-standard ML metrics (BLEU, METEOR, Levenshtein distance, semantic similarity); the first sketch below shows what these compute

Vellum Manage: Confidently iterate on models in production
- Simple API interface that proxies requests to any model provider
- Back-testing & version control
- Observability of all your inputs and outputs; UI & API to submit explicit or implicit user feedback

Vellum Search: Use your proprietary data in LLM applications
- Robust API endpoint to submit documents (“corpus of text”) for querying against
- Configurable chunking and semantic search strategies (the second sketch below walks through a toy version)
- Ability to query against the corpus of text at run time

Vellum Optimize: Continuously fine-tune to improve quality and lower cost
- Passively accumulate training data to fine-tune your own proprietary models
- Swap model providers or parameters under the hood; no code changes required
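To make those evaluation metrics concrete, here is a minimal sketch of what each one measures. It is not Vellum's implementation: the example strings are invented, and it assumes the nltk and sentence-transformers packages (with the all-MiniLM-L6-v2 embedding model) are available.

```python
# A minimal sketch of the metrics named above, not Vellum's implementation.
# Assumes nltk and sentence-transformers are installed; the strings and the
# all-MiniLM-L6-v2 model are illustrative choices.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sentence_transformers import SentenceTransformer, util


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca -> cb
            ))
        prev = curr
    return prev[-1]


reference = "The capital of France is Paris."
output = "Paris is the capital of France."

# BLEU scores n-gram overlap against a reference completion.
# (METEOR is similar in spirit; nltk provides it as
# nltk.translate.meteor_score.meteor_score, given nltk's wordnet data.)
bleu = sentence_bleu(
    [reference.split()], output.split(),
    smoothing_function=SmoothingFunction().method1,
)

# Semantic similarity: cosine similarity of sentence embeddings, so a
# paraphrase scores high even when the exact wording differs.
model = SentenceTransformer("all-MiniLM-L6-v2")
ref_vec, out_vec = model.encode([reference, output])
semantic = util.cos_sim(ref_vec, out_vec).item()

print(f"BLEU={bleu:.3f}  Levenshtein={levenshtein(reference, output)}  "
      f"semantic={semantic:.3f}")
```

BLEU and Levenshtein reward exact wording, while embedding similarity scores the paraphrase highly; that gap is exactly why comparing several metrics side-by-side is useful.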
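Similarly, here is a toy version of the chunk-then-search flow described under Vellum Search, under the same assumptions. The corpus, the fixed-size character chunker, and the window sizes are hypothetical stand-ins for what the product makes configurable.

```python
# A toy version of the chunk-then-search flow: not Vellum's API, just the
# underlying idea. Assumes sentence-transformers is installed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")


def chunk(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    # Fixed-size character windows with overlap; one of many possible
    # chunking strategies (by sentence, by token count, etc.).
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


# "Submit" a corpus of text: chunk it and embed each chunk ahead of time.
corpus = (
    "Refunds are available within 30 days of purchase. "
    "Standard shipping takes 5 to 7 business days. "
    "Support is available around the clock via chat. "
    "Enterprise plans include a dedicated account manager."
)
chunks = chunk(corpus)
chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

# Query against the corpus at run time: embed the question and return the
# most semantically similar chunks.
query = "What is the refund policy?"
query_embedding = model.encode(query, convert_to_tensor=True)
for hit in util.semantic_search(query_embedding, chunk_embeddings, top_k=2)[0]:
    print(f"{hit['score']:.3f}  {chunks[hit['corpus_id']]!r}")
```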
We’re a team of MIT engineers and McKinsey consultants who’ve been building apps on GPT-3 since it first came out three years ago. We’ve spent the past 4 years building similar tooling in MLOps and have experienced firsthand the pain we’re solving for our customers today. We believe that AI is the greatest technological leap since the internet. Our mission is to help companies adopt AI by taking their prototypes to production. If you have an AI use-case in mind, please reach out!

We worked together at Dover (YC S19) for 2+ years, where we built production use-cases of LLMs. Noa and Sidd are MIT engineers who worked on DataRobot’s MLOps team and Quora’s ML Platform team, respectively. Akash spent 5 years at McKinsey’s Silicon Valley office. While building user-facing LLM apps with GPT-3 and Cohere, we found ourselves writing complex internal tooling to compare models, fine-tune them, measure performance, and improve quality over time, all of which took time away from building our user-facing product. We had built this kind of tooling for traditional ML and wished we had the same when we later worked with LLMs, so now we’re building it.