LanceDB: Open-source, serverless vectordb for production-scale generative AI

LanceDB is a new open-source vector database that can support low-latency billion-scale vector search on a single node. Built around a new columnar data format, LanceDB makes it incredibly easy to build applications for generative AI, recsys, search engines, content moderation, and more.

Active Founders

Chang She

Founder

My passion is building tools to make teams more productive working with data. I was VP of Eng at TubiTV where I designed the ML stack and the experiment platform, in addition to growing the team by more than 3x. Previously I was CTO/co-founder of DataPad and the second major contributor to the pandas library. In a former life, I was a financial quant with stints at AQR and Barclays.

❓ Why we built LanceDB

As we spoke to builders of ML/AI applications, a common refrain was that they struggled to get services like Pinecone even running. After a while, we realized that the retrieve-filter-hydrate workflow was often a big bottleneck for both productivity and app latency.

So we put our heads together. I was one of the original co-authors of the pandas library. Lei was a core contributor to HDFS and led ML infrastructure at Cruise. Using our experience building data/ML tooling, we’ve built something totally new.

LanceDB ❤️ builders

We’ve reimagined vector search from the ground up for better developer productivity, scalability, and performance. LanceDB is backed by the Lance format, a modern columnar data format that is an alternative to Parquet. It’s optimized for high-speed random access and management of AI datasets like vectors, documents, and images.

We then added our own Rust implementations of a number of SOTA ANN-index algorithms to support low-latency vector search. These indices are SSD-based and can easily scale wayyyy beyond memory.
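
To give a feel for how an ANN index can avoid scanning every vector, here is a minimal pure-Python sketch of an inverted-file (IVF) style index. It is a generic illustration, not LanceDB's Rust implementation and not necessarily the algorithm LanceDB uses; the function names (build_ivf, ivf_search), the nprobe parameter, and the hand-picked centroids are all assumptions made for this example.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to the partition of its nearest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        partitions[nearest].append(vec_id)
    return partitions

def ivf_search(query, vectors, centroids, partitions, k, nprobe):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vec_id for c in probe for vec_id in partitions[c]]
    candidates.sort(key=lambda vec_id: l2(query, vectors[vec_id]))
    return candidates[:k]

# Toy data: three well-separated clusters of 2-d vectors.
vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.4, 4.8], [9.9, 0.1], [10.0, 0.3]]
# Hypothetical hand-picked centroids; a real index would typically learn them, e.g. with k-means.
centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]

partitions = build_ivf(vectors, centroids)
query = [5.1, 5.0]
top = ivf_search(query, vectors, centroids, partitions, k=2, nprobe=1)
exhaustive = ivf_search(query, vectors, centroids, partitions, k=2, nprobe=len(centroids))
```

With nprobe=1 the search touches only one of the three partitions, and on this toy data it returns the same ids as probing all of them. Reading only the probed partitions is one reason this family of indexes can be kept on disk rather than fully in memory.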

What’s more, LanceDB allows you to store and filter other features along with vectors. Our users have been able to replace 3–4 different data stores with LanceDB alone and achieve a speedup at the same time.
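
As a rough illustration of what keeping features next to vectors buys you, the pure-Python sketch below filters on an ordinary column and ranks by vector distance in a single pass over one table. It is a conceptual sketch only and does not use LanceDB's API; the rows, the column names, and the filtered_search helper are hypothetical.

```python
import math

# Hypothetical in-memory "table": each row keeps its vector next to ordinary columns.
rows = [
    {"id": 1, "vector": [0.9, 0.1], "category": "shoes", "price": 40.0},
    {"id": 2, "vector": [0.8, 0.2], "category": "shoes", "price": 120.0},
    {"id": 3, "vector": [0.1, 0.9], "category": "hats", "price": 15.0},
    {"id": 4, "vector": [0.7, 0.3], "category": "hats", "price": 25.0},
]

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_search(table, query, predicate, k):
    """Keep rows that pass the predicate, then return the k nearest by vector distance."""
    candidates = [row for row in table if predicate(row)]
    candidates.sort(key=lambda row: l2_distance(row["vector"], query))
    return candidates[:k]

# Nearest neighbours of the query among items cheaper than 100.
results = filtered_search(rows, query=[1.0, 0.0], predicate=lambda r: r["price"] < 100, k=2)
result_ids = [row["id"] for row in results]
```

Because each result row already carries its other columns, there is no separate lookup in a second store to filter or hydrate the matches.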

🛣️ Roadmap

Since our launch, the community has added LangChain support. Our integration for LlamaIndex is also under review. Our current focus is building a TypeScript implementation with a native-level experience.

Currently, we provide a Python package called lancedb, which is pip-installable and delivers a great local workflow. Beyond what we’re working on right now, here’s what you can expect:

Ecosystem integrations with OpenAI plugins, AutoGPT, etc.
Richer set of embedding functions and document processing conveniences
Gallery of generative AI apps powered by LanceDB
Solutions for cloud deployment

🙏 How to get started

To get started with LanceDB, head over to our repo. If you have questions, feedback, or want help using LanceDB in your app, don’t hesitate to drop us a line at contact@lancedb.com.