Data Engineering Startups funded by Y Combinator (YC) in the San Francisco Bay Area 2024

December 2024

Browse 22 of the top Data Engineering startups funded by Y Combinator. Headquartered in the San Francisco Bay Area, these are some of the hottest and fastest-growing startups.

We also have a Startup Directory where you can search through over 5,000 companies.

  • Fivetran
    Fivetran (w2013) • Active • 1,200 employees • Oakland, CA, USA
    Fivetran automates data movement out of, into and across cloud data platforms. We automate the most time-consuming parts of the ELT process from extracts to schema drift handling to transformations, so data engineers can focus on higher-impact projects with total pipeline peace of mind. With 99.9% uptime and self-healing pipelines, Fivetran enables hundreds of leading brands across the globe, including Autodesk, Conagra Brands, JetBlue, Lionsgate, Morgan Stanley, and Ziff Davis, to accelerate data-driven decisions and drive business growth. Fivetran is headquartered in Oakland, California, with offices around the world. 
    saas
    b2b
    analytics
    data-engineering
  • TRM Labs
    TRM Labs (s2019) • Active • 180 employees • San Francisco, CA, USA
    At TRM, we're on a mission to build trust in digital assets, because the promise of crypto is too valuable to be impeded by bad actors. We provide a blockchain intelligence platform to law enforcement, financial institutions, and crypto firms to assist in the detection and prevention of cryptocurrency fraud and financial crime. Our vision is to build a company that can sustainably deliver on our mission for decades to come, enabling consumers to transact safely and securely on the blockchain. Join our mission ➔ www.trmlabs.com/careers
    fintech
    machine-learning
    crypto-web3
    data-engineering
  • Mozart Data
    Mozart Data (s2020) • Active • 24 employees • San Francisco, CA, USA
    Mozart Data provides an out-of-the-box modern data stack that empowers anyone to easily consolidate, organize, and prepare their data for analysis. Spin up a data stack that’s built on a best-in-class data warehouse and ETL tool in hours, without any engineering. You can finally spend more time on generating insights and less time wrangling your data.
    saas
    b2b
    data-engineering
  • Waydev
    Waydev (w2021) • Active • 15 employees • San Francisco, CA, USA
    AI Agents for Engineering Leadership. Designed to boost productivity, predict outcomes, and support team well-being, these agents will provide insights into team mood and challenges and offer practical improvement suggestions.
    b2b
    analytics
    enterprise
    data-engineering
    ai-assistant
  • Hydra
    Hydra (w2022) • Active • 6 employees • San Francisco, CA, USA
    Hydra is the easiest way to build real-time apps and analytics with Postgres. Every project with Hydra uses pg_duckdb, an open source (MIT licensed) program that embeds DuckDB's state-of-the-art analytics engine and features within Postgres. We developed pg_duckdb, the official Postgres extension for DuckDB, in collaboration with the DuckDB Foundation. https://github.com/duckdb/pg_duckdb
    developer-tools
    analytics
    open-source
    data-engineering
  • Airbyte
    Airbyte (w2020) • Active • 90 employees • San Francisco, CA, USA
    Airbyte is the leading open data movement platform that empowers data teams in the AI era by transforming raw data into actionable intelligence. With the largest catalog of over 350 connectors, it offers low-code, no-code, and AI-powered connector development, and provides flexible deployment options across self-hosted, cloud, and hybrid environments. https://github.com/airbytehq/airbyte
    developer-tools
    open-source
    data-engineering
  • Jitsu
    Jitsu (s2020) • Active • 4 employees • San Francisco, CA, USA
    Jitsu is the fastest, most durable way to collect event data from every source - web, app, email, chatbot, CRM - into your data warehouse. 100% open-source. Purpose-built, secure, and ready in minutes.
    saas
    b2b
    open-source
    data-engineering
  • Imbue
    Imbue (s2017) • Active • 35 employees • San Francisco, CA, USA
    Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. We train our own foundation models optimized for reasoning and prototype agents on top of these models. By using these agents extensively, we gain insights into improving both the capabilities of the underlying models and the interaction design for agents. We aim to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.
    machine-learning
    data-engineering
    ai
  • HomeRoom
    HomeRoom (w2022) • Active • 25 employees • San Jose, CA, USA
    Homeroom helps investors provide affordable housing while making a 22% ROI. We do this by sourcing properties, arranging capital, managing construction, vetting tenants, and collecting rent by the room. To date, Homeroom has brought on 85 property investors, is growing 6X annually, and is bringing in 420K in annualized net revenue. How it works: We help investors buy homes in cities that are attractive to young people but lack affordable housing options. We then renovate, and after about 20 days the home is ready and we find qualified renters by the room. We launched in 2018 in Kansas City with 1 home. We now have 105 homes in 31 cities. In 2021, we grew rental GMV to $1.8M (300% YoY growth). Our average rent across every property is $458, which is about 50% lower than market comps, and our investors see returns up to 50% higher. We are HomeRoom. Johnny is the financial analyst/domain expert, Thomas is a serial entrepreneur with a PhD in ML, and Mike hacked growth for Airbnb and Facebook.
    machine-learning
    real-estate
    proptech
    nlp
    data-engineering
  • Polytomic
    Polytomic (w2020) • Active • 7 employees • San Francisco, CA, USA
    Polytomic is a no-code web app to sync data between your internal databases, business systems (e.g. Stripe, Salesforce, etc), data warehouses, spreadsheets, and even HTTP APIs.
    saas
    b2b
    data-engineering
  • Dynamo AI
    Dynamo AI (w2022) • Active • 40 employees • San Francisco, CA, USA
    End-to-end privacy, security, and compliance solutions to prepare your organization for emerging AI regulations.
    machine-learning
    privacy
    data-engineering
  • OneSchema
    OneSchema (s2021) • Active • 16 employees • San Francisco, CA, USA
    OneSchema is the embeddable CSV importer used by product and engineering teams to save months of development time and to automate CSV mapping, validation, and transformation.
    developer-tools
    saas
    b2b
    data-engineering
  • Patterns
    Patterns (s2021) • Active • 2 employees • San Francisco, CA, USA
    Patterns revolutionizes financial analysis by making it easy and accessible through natural language. We are seeking passionate individuals excited about simplifying financial analytics and transforming business intelligence. If you're interested in joining an innovative team in the finance space, explore our job openings and become part of our mission. Our advanced AI transforms financial data workflows and reporting, surpassing traditional spreadsheets and inflexible SaaS solutions. By integrating state-of-the-art LLMs with autonomous querying and financial reasoning, Patterns empowers practitioners to perform complex analyses effortlessly via a natural language interface.
    analytics
    data-science
    data-engineering
    data-visualization
  • Etleap
    Etleap (w2013) • Active • 11 employees • San Francisco, CA, USA
    Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike other enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own.
    data-engineering
  • Logarithm Labs
    Logarithm Labs (w2020) • Active • 2 employees • Foster City, CA, USA
    An easy button for using data in your daily operations. Power your business workflows with quality data. Logarithm Labs helps you turn manual data wrangling and ad-hoc scripts into repeatable pipelines for your operational workflows. Our product and team of experts do the heavy lifting so that you can focus on the business logic that drives your organization. To learn more, contact us at hello@logarithmlabs.com.
    developer-tools
    data-engineering
  • Mezmo
    Mezmo (w2015) • Active • 172 employees • San Jose, CA, USA
    Mezmo, formerly LogDNA, is an observability platform to manage and take action on your data. It ingests, processes, and routes log data to fuel enterprise-level application development and delivery, security, and compliance use cases. Mezmo was brought to life by three-time co-founders Chris Nguyen and Lee Liu and included in the Winter 2015 batch of Y Combinator. In 2018 the company partnered with tech giant IBM to become the sole logging provider for IBM Cloud. Mezmo is on a mission to empower people who build solutions that shape the world. We’re doing this by delivering a platform that enables enterprises to get more value from their observability data in real time, regardless of source, destination, use case, or scale. We’re not the only ones working on this problem, but we have a few things the others don’t. We’re cloud-native and know how to make the most of modern technology like Kubernetes. We have scaled a solution from zero to petabyte scale in a short amount of time, while supporting thousands of active users across multiple environments. We are hungry for change and are surrounded by enterprises telling us they’re hungry, too. We have a kick-ass group of people who are thinking about the problem analytically and are excited to change the observability world for the better. Mezmo has helped some of the world’s most innovative companies transform how they manage their systems and applications. Still, we know that we can help them get more value from their observability data by providing more flexibility and control over how they use it. This will enable teams to spend less time switching between data silos so they can focus on shipping better, more resilient, and more secure products. We have momentum on our side. Last year we saw triple-digit revenue growth and added 800 new customers to our roster. Recent accolades include being named to YC’s Top Companies, CRN’s 10 Hottest DevOps Startups, and EMA’s Top 3 Observability Platforms.
    developer-tools
    devsecops
    saas
    kubernetes
    data-engineering
  • FlowDeploy
    FlowDeploy (w2022) • Active • 3 employees • Mountain View, CA, USA
    FlowDeploy helps bioinformaticians manage their data analysis pipelines. We provide everything they need to try, run, develop, and share their pipelines. That includes integrations with AWS, Snakemake, Nextflow, GitHub, Slack, SSO, and more, as well as a clean API and web app for launching and monitoring pipelines and managing their data. FlowDeploy is built for bioinformaticians: it doesn't restrict how pipelines are built and managed, as long as a bioinformatics workflow manager like Nextflow or Snakemake is used. But it does eliminate several footguns like idle spend and accidental data egress, and it reduces the potential for users accidentally sharing credentials. FlowDeploy runs the pipelines in either our managed cloud or the customer's cloud, eliminating the need to transfer data externally. Non-computational biologists can use FlowDeploy, too: features like pipeline templates reduce the complexity of launching a new pipeline, which reduces user error and decreases the need for advanced cloud training for non-computational users.
    developer-tools
    drug-discovery
    data-engineering
  • Chaos Genius
    Chaos Genius (w2020) • Active • 10 employees • San Francisco, CA, USA
    Chaos Genius is a DataOps Observability platform for Snowflake. Enable Snowflake Observability to reduce Snowflake costs and optimize query performance.
    cloud-workload-protection
    machine-learning
    analytics
    open-source
    data-engineering
  • LanceDB
    LanceDB (w2022) • Active • 10 employees • San Francisco, CA, USA
    LanceDB is a new open-source vector database that can support low-latency billion-scale vector search on a single node. Built around a new columnar data format, LanceDB makes it incredibly easy to build applications for generative AI, recsys, search engines, content moderation, and more.
    aiops
    machine-learning
    data-engineering
  • Grai
    Grai (s2022) • Active • 3 employees • San Francisco, CA, USA
    Grai is open-source version control for metadata. We can determine how database changes will affect deployed machine learning models, APIs, and dashboards because we understand how data relates across systems that don’t otherwise talk to each other.
    developer-tools
    saas
    analytics
    open-source
    data-engineering
  • Satsuma
    Satsuma (s2021) • Acquired • 5 employees • San Francisco, CA, USA
    Satsuma is a developer tool for building applications on top of real-time blockchain data. Our product lets developers take decoded data from multiple chains, customize it for their use cases, and access it through API endpoints. Blockchains serve as distributed databases for these products, holding their most important data. However, it’s difficult to access and query that data. We believe this friction is an enormous blocker for web3 developers and that better tooling will enable mass adoption for web3. We’re a founding team of engineers, having built data infrastructure and product as early employees at Airtable, Heap, and Y Combinator.
    developer-tools
    saas
    crypto-web3
    data-engineering