Data Engineering Startups funded by Y Combinator (YC) in the San Francisco Bay Area 2024

July 2024

Browse 25 of the top Data Engineering startups funded by Y Combinator. Headquartered in the San Francisco Bay Area, these are some of the hottest and fastest-growing startups.

We also have a Startup Directory where you can search through over 5,000 companies.

Fivetran (w2013) • Active • 1,200 employees • Oakland, CA, USA
Fivetran automates data movement out of, into and across cloud data platforms. We automate the most time-consuming parts of the ELT process from extracts to schema drift handling to transformations, so data engineers can focus on higher-impact projects with total pipeline peace of mind. With 99.9% uptime and self-healing pipelines, Fivetran enables hundreds of leading brands across the globe, including Autodesk, Conagra Brands, JetBlue, Lionsgate, Morgan Stanley, and Ziff Davis, to accelerate data-driven decisions and drive business growth. Fivetran is headquartered in Oakland, California, with offices around the world.
saas
b2b
analytics
data-engineering

TRM Labs (s2019) • Active • 180 employees • San Francisco, CA, USA
At TRM, we're on a mission to build trust in digital assets, because the promise of crypto is too valuable to be impeded by bad actors. We provide a blockchain intelligence platform to law enforcement, financial institutions, and crypto firms to assist in the detection and prevention of cryptocurrency fraud and financial crime. Our vision is to build a company that can sustainably deliver on our mission for decades to come, enabling consumers to transact safely and securely on the blockchain. Join our mission ➔ www.trmlabs.com/careers
fintech
machine-learning
crypto-web3
data-engineering

Airbyte (w2020) • Active • 110 employees • San Francisco, CA, USA
Airbyte is the leading open-source ELT platform that replicates data from applications, APIs & databases to data warehouses, data lakes, and other destinations. https://github.com/airbytehq/airbyte
developer-tools
open-source
data-engineering

Mozart Data (s2020) • Active • 24 employees • San Francisco, CA, USA
Mozart Data provides an out-of-the-box modern data stack that empowers anyone to easily consolidate, organize, and prepare their data for analysis. Spin up a data stack that’s built on a best-in-class data warehouse and ETL tool in hours, without any engineering. You can finally spend more time on generating insights and less time wrangling your data.
saas
b2b
data-engineering

Jitsu (s2020) • Active • 4 employees • San Francisco, CA, USA
Jitsu is the fastest, most durable way to collect event data from every source - web, app, email, chatbot, CRM - into your data warehouse. 100% open-source. Purpose-built, secure and ready in minutes.
saas
b2b
open-source
data-engineering

Imbue (formerly Generally Intelligent) (s2017) • Active • 35 employees • San Francisco, CA, USA
Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. We train our own foundation models optimized for reasoning and prototype agents on top of these models. By using these agents extensively, we gain insights into improving both the capabilities of the underlying models and the interaction design for agents. We aim to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.
artificial-intelligence
machine-learning
data-engineering

HomeRoom (w2022) • Active • 25 employees • San Jose, CA, USA
HomeRoom helps investors provide affordable housing while making a 22% ROI. We do this by sourcing properties, arranging capital, managing construction, vetting tenants and collecting rent by the room. To date, HomeRoom has brought on 85 property investors, is growing 6X annually, and is bringing in 420K in annualized net revenue. How it works: We help investors buy homes in cities that are attractive to young people but lack affordable housing options. We then renovate, and after about 20 days the home is ready and we find qualified renters by the room. We launched in 2018 in Kansas City with 1 home. We now have 105 homes in 31 cities. In 2021, we grew rental GMV to $1.8M (300% YoY growth). Our average rent across every property is $458, which is about 50% lower than market comps, and our investors see returns up to 50% higher. We are HomeRoom. Johnny is the financial analyst/domain expert, Thomas is a serial entrepreneur with a PhD in ML, and Mike hacked growth for Airbnb and Facebook.
machine-learning
real-estate
proptech
nlp
data-engineering

Polytomic (w2020) • Active • 7 employees • San Francisco, CA, USA
Polytomic is a no-code web app to sync data between your internal databases, business systems (e.g. Stripe, Salesforce), data warehouses, spreadsheets, and even HTTP APIs.
saas
b2b
data-engineering

Outerbase (w2023) • Active • 4 employees • Pittsburgh, PA, USA
Outerbase is the interface for your database. Companies use Outerbase to view, edit, and modify their data and even generate beautiful visual dashboards without having to write a single line of SQL.
developer-tools
generative-ai
analytics
data-engineering
ai

Tarsal (s2021) • Active • 10 employees • New York, NY, USA
Tarsal is a data pipeline custom built for security teams. As security data grows 25% year over year, security teams desperately need access to best-in-class data infrastructure. Tarsal bridges the gap between the modern data stack and security teams, pioneering the modern security data stack.
b2b
cybersecurity
big-data
data-engineering

Waydev (w2021) • Active • 15 employees • San Francisco, CA, USA
Leverage insights from your engineering stack to accelerate velocity, align engineering work to business priorities, and increase visibility into your team’s DORA Metrics and SPACE Framework Metrics.
b2b
analytics
enterprise
data-engineering
ai-assistant

Pipekit (s2021) • Active • 7 employees • San Francisco, CA, USA
Our app manages Argo Workflows for data teams, enabling complex data & CI pipelines in half the time while saving companies hundreds of thousands of dollars annually. Argo Workflows is an open-source pipeline framework for Kubernetes that’s used in production by Bloomberg, Intuit, Adobe, New Relic, NVIDIA, and many other open-source early adopters.
developer-tools
open-source
data-engineering
devops

Dynamo AI (w2022) • Active • 40 employees • San Francisco, CA, USA
End-to-end privacy, security, and compliance solutions to prepare your organization for emerging AI regulations.
machine-learning
privacy
data-engineering

Hydra (w2022) • Active • 6 employees • San Francisco, CA, USA
Open source Snowflake alternative. Query billions of rows instantly on column-oriented Postgres. Hydra can be used as open source, managed cloud, or deployable in customer cloud infrastructure. Get parallelized analytics in minutes with no code changes.
developer-tools
analytics
open-source
data-engineering

OneSchema (s2021) • Active • 10 employees • San Francisco, CA, USA
Product and engineering teams use OneSchema to save the months of development time it takes to build a CSV importer. OneSchema improves customer activation / import completion rates by automatically correcting customer data.
developer-tools
saas
b2b
data-engineering

Mezmo (w2015) • Active • 172 employees • San Jose, CA, USA
Mezmo, formerly LogDNA, is an observability platform to manage and take action on your data. It ingests, processes, and routes log data to fuel enterprise-level application development and delivery, security, and compliance use cases. Mezmo was brought to life by three-time co-founders Chris Nguyen and Lee Liu and included in the Winter 2015 batch of Y Combinator. In 2018 the company partnered with tech giant IBM to become the sole logging provider for IBM Cloud. Mezmo is on a mission to empower people who build solutions that shape the world. We’re doing this by delivering a platform that enables enterprises to get more value from their observability data in real time, regardless of source, destination, use case, or scale. We’re not the only ones working on this problem, but we have a few things the others don’t. We’re cloud-native and know how to make the most of modern technology like Kubernetes. We have scaled a solution from zero to petabyte scale in a short amount of time, while supporting thousands of active users across multiple environments. We are hungry for change and are surrounded by enterprises telling us they’re hungry, too. We have a kick-ass group of people who are thinking about the problem analytically and are excited to change the observability world for the better. Mezmo has helped some of the world’s most innovative companies transform how they manage their systems and applications. Still, we know that we can help them get more value from their observability data by providing more flexibility and control over how they use it. This will enable teams to spend less time switching between data silos so they can focus on shipping better, more resilient, and secure products. We have momentum on our side. Last year we saw triple-digit revenue growth and added 800 new customers to our roster. Recent accolades include being named to YC’s Top Companies, CRN’s 10 Hottest DevOps Startups, and EMA’s Top 3 Observability Platforms.
developer-tools
devsecops
saas
kubernetes
data-engineering

Platypus (w2021) • Active • 3 employees • San Francisco, CA, USA
For Business Operators: Connect & automate processes on top of any data, crazy fast. For Engineering Teams: Connect any data, across any stack, in any format, crazy fast.
b2b
workflow-automation
data-engineering
ai-assistant

Patterns (s2021) • Active • 2 employees • San Francisco, CA, USA
Patterns revolutionizes financial analysis by making it easy and accessible through natural language. We are seeking passionate individuals excited about simplifying financial analytics and transforming business intelligence. If you're interested in joining an innovative team in the finance space, explore our job openings and become part of our mission. Our advanced AI transforms financial data workflows and reporting, surpassing traditional spreadsheets and inflexible SaaS solutions. By integrating state-of-the-art LLMs with autonomous querying and financial reasoning, Patterns empowers practitioners to perform complex analyses effortlessly via a natural language interface.
analytics
data-science
data-engineering
data-visualization

FlowDeploy (w2022) • Active • 3 employees • Mountain View, CA, USA
FlowDeploy helps bioinformaticians manage their data analysis pipelines. We provide everything they need to try, run, develop, and share their pipelines. That includes integrations with AWS, Snakemake, Nextflow, GitHub, Slack, SSO, and more, as well as a clean API and web app for launching and monitoring pipelines and managing their data. FlowDeploy is built for bioinformaticians: it doesn't restrict how pipelines are built and managed, as long as a bioinformatics workflow manager like Nextflow or Snakemake is used. But it does eliminate several footguns like idle spend and accidental data egress, and it reduces the potential for users accidentally sharing credentials. FlowDeploy runs the pipelines in either our managed cloud or the customer's cloud – eliminating the need to transfer data externally. Non-computational biologists can use FlowDeploy, too: features like pipeline templates reduce the complexity of launching a new pipeline, which reduces user error and decreases the need for advanced cloud training for non-computational users.
developer-tools
drug-discovery
data-engineering

Etleap (w2013) • Active • 11 employees • San Francisco, CA, USA
Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike other enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own.
data-engineering

Chaos Genius (w2020) • Active • 10 employees • San Francisco, CA, USA
Chaos Genius is a DataOps Observability platform for Snowflake. Enable Snowflake Observability to reduce Snowflake costs and optimize query performance.
cloud-workload-protection
machine-learning
analytics
open-source
data-engineering

Logarithm Labs (w2020) • Active • 2 employees • Foster City, CA, USA
Easy button to use data for your daily operations. Power your business workflows with quality data. Logarithm Labs helps you turn manual data wrangling and ad-hoc scripts into repeatable pipelines for your operational workflows. Our product and team of experts do the heavy lifting so that you can focus on the business logic that drives your organization. To learn more, contact us at hello@logarithmlabs.com.
developer-tools
data-engineering

LanceDB (w2022) • Active • 10 employees • San Francisco, CA, USA
LanceDB is a new open-source vector database that can support low-latency billion-scale vector search on a single node. Built around a new columnar data format, LanceDB makes it incredibly easy to build applications for generative AI, recsys, search engines, content moderation, and more.
aiops
machine-learning
data-engineering

BackType (s2008) • Acquired • San Francisco, CA, USA
saas
data-engineering

Satsuma (s2021) • Acquired • 5 employees • San Francisco, CA, USA
Satsuma is a developer tool for building applications on top of real-time blockchain data. Our product lets developers take decoded data from multiple chains, customize it for their use cases, and access it through API endpoints. Blockchains serve as distributed databases for these products, holding their most important data. However, it’s difficult to access and query that data. We believe this friction is an enormous blocker for web3 developers and that better tooling will enable mass adoption for web3. We’re a founding team of engineers, having built data infrastructure and product as early employees at Airtable, Heap, and Y Combinator.
developer-tools
saas
crypto-web3
data-engineering
