🔮 Haven - Run LLMs on Your Own Cloud

Haven makes it easy to deploy LLMs on your own infrastructure

Justus Mattern

10 months ago

#open_source #developer_tools

TL;DR: Haven makes it easy to deploy LLMs in your GCP / AWS environment. We provide the user experience of serverless solutions like Replicate or Hugging Face Inference Endpoints, but instead of running your models on Haven’s servers, we manage the infrastructure in your own cloud account. This gives you full control over your data, and you get to use your precious cloud credits!

🤕 The Problem

Many companies want to use large language models but do not want to rely on third-party providers, because doing so makes them dependent on external services and comes with significant privacy risks.

However, deploying models in-house is just not easy: engineers have to figure out CUDA environments, write and containerize efficient serving code, and expose it as an API server. And let’s not even talk about scalability…

Haven solves this problem. We offer the simplicity, reliability, and scalability of third-party providers while allowing our users to securely run models on their own infrastructure.

🤯 How It Works

Haven is built on top of Kubernetes, which makes deployments easy to scale and reliable in production settings. Deploying a model with Haven consists of just three steps:

1. Upload a Service Account Key

Uploading a service account key gives Haven permission to manage your cloud resources for you. Under the hood, Haven then sets up a Kubernetes cluster running in your cloud environment.

2. Configure Your Deployment

You can now choose your model, select the GPUs you want it to run on, and configure the scaling behavior of your deployment. Haven then sets up the model in your Kubernetes cluster according to these settings.
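
To make the three choices in this step concrete, here is a minimal sketch of what such a deployment configuration could look like as a data structure. It is not Haven's actual API: the class `DeploymentConfig`, its field names, the model identifier, and the GPU label are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical deployment settings; not Haven's real schema."""

    model: str             # which model to serve (placeholder identifier)
    gpu_type: str          # which GPU to run on (placeholder label)
    gpu_count: int = 1     # GPUs per replica
    min_replicas: int = 0  # lower bound for autoscaling
    max_replicas: int = 1  # upper bound for autoscaling

    def __post_init__(self) -> None:
        # Reject settings that could never describe a valid deployment.
        if self.gpu_count < 1:
            raise ValueError("gpu_count must be at least 1")
        if self.max_replicas < 1:
            raise ValueError("max_replicas must be at least 1")
        if not 0 <= self.min_replicas <= self.max_replicas:
            raise ValueError("need 0 <= min_replicas <= max_replicas")


# Example values are assumptions, not recommendations.
config = DeploymentConfig(
    model="example-org/example-llm",
    gpu_type="A100",
    gpu_count=1,
    min_replicas=1,
    max_replicas=4,
)
config_dict = asdict(config)  # plain dict, e.g. for serializing to JSON
```

Validating the scaling bounds up front means a bad configuration fails before any cloud resources are requested.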

3. Enjoy Your Model

Your LLM is up and running in minutes! Our deployments offer super-fast inference and advanced features like input batching for efficient serving. This way, you can immediately integrate them into your application.
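
As a rough illustration of the input batching mentioned above, the sketch below groups pending prompts into fixed-size batches and calls the model once per batch rather than once per prompt. This is simple static batching under stated assumptions, not a description of how Haven implements it; `batch_prompts`, `generate_batched`, and the stand-in `fake_model` are hypothetical names.

```python
from typing import Callable, List, Sequence


def batch_prompts(prompts: Sequence[str], max_batch_size: int) -> List[List[str]]:
    """Split prompts into consecutive batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        list(prompts[i:i + max_batch_size])
        for i in range(0, len(prompts), max_batch_size)
    ]


def generate_batched(
    prompts: Sequence[str],
    model_fn: Callable[[List[str]], List[str]],
    max_batch_size: int = 8,
) -> List[str]:
    """Run model_fn once per batch and return outputs in input order."""
    outputs: List[str] = []
    for batch in batch_prompts(prompts, max_batch_size):
        results = model_fn(batch)
        if len(results) != len(batch):
            raise ValueError("model_fn must return one output per prompt")
        outputs.extend(results)
    return outputs


def fake_model(batch: List[str]) -> List[str]:
    # Stand-in for a real LLM forward pass; it just upper-cases each prompt.
    return [prompt.upper() for prompt in batch]


replies = generate_batched(["hello", "how are you", "bye"], fake_model, max_batch_size=2)
```

With three prompts and a batch size of two, the model function is invoked twice instead of three times, which is where the efficiency gain on a GPU comes from.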

💪 The Team

We’re Konstantin and Justus. The two of us met as freshmen in college and have since built multiple projects together.

Justus is an AI researcher and has been working on LLMs since 2020. As a member of AI labs at UC Berkeley and ETH Zürich, he has published multiple papers at leading research conferences.

Konstantin was already a full-time software engineer before starting his university studies. At Germany’s biggest social networking company, he built infrastructure handling requests from millions of daily users.

💜 Our Ask

Sign up here to work with us!

We are working closely with our users to fully understand the needs of companies aiming to integrate LLMs into their products. Beyond giving access to our deployment tool, we partner with early customers to evaluate their use cases and fine-tune models for their needs. If this sounds interesting, sign up for early access or contact us at hello@haven.run!

Refer us to companies looking to use LLMs!

If you have connections at companies looking to use LLMs, we’d greatly appreciate introductions! You can reach us at hello@haven.run.