nCompass is a platform for accelerating and hosting open-source and custom AI models. We provide low-latency AI deployment without rate-limiting you, all with just one line of code.
I am a recent PhD graduate from Imperial College London with experience in machine learning algorithms, compilers, and hardware architectures. I've worked in compiler teams at Qualcomm and Huawei and served as a reviewer for ICML. My co-founder and I are building nCompass, a platform for accelerating and hosting both open-source and custom large AI models. Our focus is on providing rate-unlimited, low-latency inference for large AI models with only one line of code.
I'm a recent PhD graduate from Imperial College London, where I specialized in reconfigurable hardware architectures for accelerated machine learning and reduced-precision training algorithms. I have worked as an AI feasibility consultant, prototyping and evaluating AI spin-outs. We are building nCompass, a platform for accelerating and hosting both open-source and custom large AI models. Our focus is on providing rate-unlimited, low-latency inference for large AI models with only one line of code.
tl;dr: If OpenAI's unpredictable response times and rate limits are hurting your tool's user experience, nCompass lets you effortlessly tap into the world of open-source AI models while ensuring the served models meet your budget and performance requirements.
—
Hey all, we are Diederik and Aditya, the co-founders of nCompass, a platform for simplified hosting and acceleration of open-source and custom LLMs.
LLM-based products that use closed-source model providers like OpenAI suffer from slow response times and rate limits.
Open-source models are a great alternative, but hosting a model yourself means a lot of extra work and maintenance that distracts you from your core business.
nCompass provides an API that lets you integrate accelerated versions of any open-source or custom model of your choice into your AI pipeline. We support OpenAI-style chat templates, work with all web frameworks, and use time-based pricing, so compute costs stay predictable.
We serve models to users with a simple 3-step process:
We set up a deployment that meets your requirements and provide you with a single API key, which you can then use to integrate the model with a single line of code.
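As a minimal sketch of what an OpenAI-style integration could look like: the endpoint URL, model identifier, and exact request shape below are placeholders (not taken from nCompass documentation), but the payload follows the standard OpenAI chat-completion format the post says is supported.

```python
import json

# Hypothetical endpoint and model id; substitute the values you receive
# from nCompass after deployment.
NCOMPASS_URL = "https://api.ncompass.tech/v1/chat/completions"  # assumed URL
API_KEY = "YOUR_NCOMPASS_API_KEY"


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = build_chat_request("mistralai/Mistral-7B-Instruct-v0.2", "Hello!")
body = json.dumps(payload)
# A single requests.post(NCOMPASS_URL, headers=headers, data=body) would
# then return the completion; the network call is omitted to keep this
# sketch self-contained and offline.
```

Because the payload matches the OpenAI chat format, switching an existing OpenAI-based pipeline over would amount to changing the base URL and API key.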
We support any model currently hosted on Hugging Face, with some highlights being:
https://www.youtube.com/watch?v=sdHVji8QGOg
Also, check out our GitHub repository for code examples.
Since meeting in undergrad nine years ago, through our PhDs at Imperial College London, we've worked on every project together. Our PhDs focused on hardware acceleration of large-scale machine learning models, covering every level of the stack from algorithms and compilers down to digital hardware design.
Our emails are aditya.rajagopal@ncompass.tech and diederik.vink@ncompass.tech