nCompass Technologies

nCompass Technologies

Deploy hardware accelerated AI models with only one line of code

nCompass is a platform for acceleration and hosting of open-source and custom AI models. We provide low-latency AI deployment without rate-limiting you. All with just one line of code.

nCompass Technologies
Team Size:2
Group Partner:Dalton Caldwell

Active Founders

Aditya Rajagopal

I am a recent PhD graduate from Imperial College London with experience in machine learning algorithms, compilers and hardware architectures. I've worked in compiler teams at Qualcomm and Huawei as well as served as a reviewer for ICML. My co-founder and I are building nCompass which is a platform for accelerating and hosting both open-source and custom large AI models. Our focus is on providing rate unlimited and low latency large AI inference with only one line of code.

Diederik Vink

I'm a recent Imperial College London PhD Graduate where I specialized in reconfigurable hardware architectures for accelerated machine learning and reduced precision training algorithms. I have worked as an AI feasibility consultant prototyping and evaluating AI spin-outs. We are building nCompass, a platform for accelerating and hosting both open-source and custom large AI models. Our focus is on providing rate-unlimited and low latency large AI inference with only one line of code.

Company Launches

tl;dr If unpredictable response times and rate limits of OpenAI are causing your tool’s user experience to suffer, nCompass allows you to effortlessly tap into the world of open-source AI models while ensuring that the served models meet your target budget and performance requirements.

Hey all, we are Diederik and Aditya, the co-founders of nCompass, a platform for simplified hosting and acceleration of open-source and custom LLMs.

The Problem

LLM-based products that use closed-source model providers like OpenAI suffer from slow response times and rate limits.

Open-source models are a great alternative, but hosting a model yourself is a lot of extra work and maintenance which distracts you from your core business.

Our solution

nCompass provides an API that allows you to integrate accelerated versions of any open-source or custom model of your choice into your AI pipeline. We support OpenAI style chat templates, work with all web frameworks, and have a time-based pricing model that results in a predictable compute cost for users.

How it works

We serve models to users with a simple 3-step process:

  1. Select your desired open-source / custom model
  2. Provide your performance requirements
  3. Set a budget you are not willing to exceed

We set up the deployment that meets these requirements and provide you with a single API Key that you can then use to integrate the model with a single line of code.

We support any model currently hosted on Hugging Face, with some highlights being:

  • Mistral-7B : 160ms Time-To-First-Token @ 86 tok/s
  • Mixtral-8x7B : 300ms Time-To-First-Token @ 64 tok/s



Also, check out our GitHub repository for code examples.

The team

Since we met in undergrad (9 years ago) through to our PhDs at Imperial College London, we’ve worked on every project together. Our PhDs focused on hardware acceleration of large-scale machine learning models covering all levels of the stack from algorithms and compilers down to digital hardware design.


  • Book a demo
  • Warm intros to anyone you know who requires accelerated and/or hosted versions of open-source models.

Our emails are aditya.rajagopal@ncompass.tech and diederik.vink@ncompass.tech