Open-source, serverless vectordb for production-scale generative AI

LanceDB is a new open-source vector database that can support low-latency billion-scale vector search on a single node. Built around a new columnar data format, LanceDB makes it incredibly easy to build applications for generative AI, recsys, search engines, content moderation, and more.

Team Size:10
Location:San Francisco
Group Partner:Jared Friedman

Active Founders

Chang She

My passion is building tools to make teams more productive working with data. I was VP of Eng at TubiTV where I designed the ML stack and the experiment platform, in addition to growing the team by more than 3x. Previously I was CTO/co-founder of DataPad and the second major contributor to the pandas library. In a former life, I was a financial quant with stints at AQR and Barclays.

Chang She
Chang She

Lei Xu

Lead in Machine Learning Platform in Cruise. Apache Hadoop HDFS PMC member and Senior Software Engineer at Cloudera. PhD in distributed storage system.

Company Launches

🙋 Are you building a generative AI app using an LLM or API? Are you working on a new recsys, a modern search engine, or a new analytical tool for unstructured data? Good news: you no longer have to struggle with Pinecone’s high cost, over the top complexity, or data privacy concerns.

🚀 LanceDB is a free and open-source vector database that you can run locally or on your own server. It’s lightning fast and is easy to embed into your backend server. Check out our github repo or pip install lancedb to get started.

For a quick demo, you can take a look at our sample youtube transcript search app. We’ll be adding a lot more soon. For more details on the API, take a look at LanceDB docs.

❓ Why we built LanceDB

As we spoke to builders of ML/AI applications, a common refrain from users was struggling to get services like Pinecone even running. After a while, we realized that the retrieve-filter-hydrate workflow was often time a big bottleneck in productivity and app latency.

So we put our heads together. I was one of the original co-authors of the pandas library. Lei was a core-contributor to HDFS and led ML infrastructure at Cruise. Using our experience building data/ML tooling, we’ve built something totally new.

LanceDB ❤️ builders

We’ve reimagined vector search from the ground up for better developer productivity, scalability, and performance. LanceDB is backed by Lance format — a modern columnar data format that is an alternative to parquet. It’s optimized for high speed random access and management of AI datasets like vectors, documents, and images.

We then added our own Rust implementations of a number of SOTA ANN-index algorithms to support low-latency vector search. These indices are SSD-based and can easily scale wayyyy beyond memory.

What’s more, LanceDB allows you to store and filter other features along with vectors. Our users have been able to replace 3–4 different data stores with LanceDB alone and achieve a speedup at the same time.

🛣️ Roadmap

Since our launch, the community has added LangChain support. Our integration for LlamaIndex is also under review. Our current focus is building a TypeScript implementation with a native-level experience.

Currently, we provide a python package called lancedb which is pip installable and delivers a great local workflow. Beyond what we’re working on right now, here’s what you can expect:

  • Ecosystem integrations into OpenAI plugin / AutoGPT etc
  • Richer set of embedding functions and document processing conveniences
  • Gallery of generative AI apps powered by LanceDB
  • Solutions for cloud deployment

🙏 How to get started

To get started with LanceDB head over to our repo. If you have questions, feedback, or want help using LanceDB in your app, don’t hesitate to drop us a line at contact@lancedb.com.