Data Engineering Startups funded by Y Combinator (YC) 2024

February 2024

Browse 80 of the top Data Engineering startups funded by Y Combinator.

We also have a Startup Directory where you can search through over 4,000 companies.

  • Fivetran
    Fivetran (w2013) • Active • 1,200 employees • Oakland, CA, USA
    Fivetran automates data movement out of, into and across cloud data platforms. We automate the most time-consuming parts of the ELT process from extracts to schema drift handling to transformations, so data engineers can focus on higher-impact projects with total pipeline peace of mind. With 99.9% uptime and self-healing pipelines, Fivetran enables hundreds of leading brands across the globe, including Autodesk, Conagra Brands, JetBlue, Lionsgate, Morgan Stanley, and Ziff Davis, to accelerate data-driven decisions and drive business growth. Fivetran is headquartered in Oakland, California, with offices around the world. 
    saas
    b2b
    analytics
    data-engineering
  • Airbyte
    Airbyte (w2020) • Active • 110 employees • San Francisco, CA, USA
    Airbyte is the leading open-source ELT platform that replicates data from applications, APIs & databases to data warehouses, data lakes, and other destinations. https://github.com/airbytehq/airbyte
    developer-tools
    open-source
    data-engineering
  • Supabase
    Supabase (s2020) • Active • 70 employees
    Supabase is the easiest way to get started with Postgres. Each project within Supabase is an isolated Postgres cluster, allowing customers to scale independently, while still providing the features that you need to build: instant database setup, auth, row level security, realtime data streams, auto-generating APIs, and a simple to use web interface. We are 100% remote.
    developer-tools
    open-source
    big-data
    data-engineering
    databases
  • TRM Labs
    TRM Labs (s2019) • Active • 180 employees • San Francisco, CA, USA
    At TRM, we're on a mission to build trust in digital assets, because the promise of crypto is too valuable to be impeded by bad actors. We provide a blockchain intelligence platform to law enforcement, financial institutions, and crypto firms to assist in the detection and prevention of cryptocurrency fraud and financial crime. Our vision is to build a company that can sustainably deliver on our mission for decades to come, enabling consumers to transact safely and securely on the blockchain. Join our mission ➔ www.trmlabs.com/careers
    fintech
    machine-learning
    crypto-web3
    data-engineering
  • Gecko Robotics
    Gecko Robotics (w2016) • Active • 230 employees • Austin, TX, USA
    The mission of Gecko Robotics is to improve the state of the world by helping the most important institutions ensure the availability, reliability and sustainability of critical infrastructure. Gecko's combination of wall-climbing robots, industry-leading sensors, and an AI-powered data platform gives customers a unique window into the health of their physical assets, allowing real-time decisions that prevent power outages, ensure military missions succeed, and help reduce energy costs.
    robotics
    energy
    big-data
    data-engineering
    ai
  • Mezmo
    Mezmo (w2015) • Active • 172 employees • San Jose, CA, USA
    Mezmo, formerly LogDNA, is an observability platform to manage and take action on your data. It ingests, processes, and routes log data to fuel enterprise-level application development and delivery, security, and compliance use cases. Mezmo was brought to life by three-time co-founders Chris Nguyen and Lee Liu and included in the Winter 2015 batch of Y Combinator. In 2018 the company partnered with tech giant IBM to become the sole logging provider for IBM Cloud. Mezmo is on a mission to empower people who build solutions that shape the world. We’re doing this by delivering a platform that enables enterprises to get more value from their observability data in real time, regardless of source, destination, use case, or scale. We’re not the only ones working on this problem but we have a few things the others don’t. We’re cloud-native and know how to make the most of modern technology like Kubernetes. We have scaled a solution from zero to petabyte scale in a short amount of time, while supporting thousands of active users across multiple environments. We are hungry for change and are surrounded by enterprises telling us they’re hungry, too. We have a kick-ass group of people who are thinking about the problem analytically and are excited to change the observability world for the better. Mezmo has helped some of the world’s most innovative companies transform how they manage their systems and applications. Still, we know that we can help them get more value from their observability data by providing more flexibility and control over how they use it. This will enable teams to spend less time switching between data silos so they can focus on shipping better, more resilient, and secure products. We have momentum on our side. Last year we saw triple-digit revenue growth and added 800 new customers to our roster. Recent accolades include being named to YC’s Top Companies, CRN’s 10 Hottest DevOps Startups, and EMA’s Top 3 Observability Platforms.
    developer-tools
    devsecops
    saas
    kubernetes
    data-engineering
  • Spruce Systems
    Spruce Systems (w2021) • Active • 25 employees • New York, NY, USA
    Spruce lets users control their data across the web. We believe that the world is evolving toward one based on cryptography, networks, and digital economies that are user-controlled. Today, the dominant use case for user keys is the signing of blockchain transactions, but we think this barely scratches the surface of what is possible. Soon, the entirety of a user’s digital interactions will be based on their keypairs, and we’re unlocking this transition with our constellation of products. We are passionate about cultivating a thriving culture of diverse individuals who bring unique perspectives to our mission. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.
    crypto-web3
    identity
    open-source
    privacy
    data-engineering
  • Narrator
    Narrator (s2019) • Active • 8 employees • New York, NY, USA
    Narrator is an end-to-end platform built on top of the Activity Schema data standard, starting at $500/mo. Data analysts can define their user journey and use that journey to answer any question that comes up. From there, data can be visualized in a dashboard, used to build a story-like analysis, exported, and more. The biggest benefits of Narrator are speed and cost reduction. Small teams can move fast and answer questions in minutes, allowing them to perform the work of very large data teams, all while Narrator is optimized to minimize the compute cost of the warehouse.
    analytics
    big-data
    data-engineering
  • Stacksync
    Stacksync (w2024) • Active • 4 employees • San Francisco, CA, USA
    Stacksync powers real-time and bidirectional data synchronization between CRMs (e.g. Salesforce, HubSpot or SAP) and databases (e.g. Postgres, Google BigQuery, ...). Edits made in your CRM will instantly update in your database, and vice versa. To set up a sync, users simply connect the two chosen apps in one click and select the tables they want to sync, with no code. Stacksync reduces implementation delays from months to minutes for CRM integration projects and removes all the complexity behind CRM new feature development. We show a 90% improvement on delivery time and budget.
    b2b
    api
    crm
    data-engineering
    databases
  • Metaplane
    Metaplane (w2020) • Active • 12 employees • Boston, MA, USA
    Metaplane ensures everyone trusts the data that powers your business. Data teams at Bose, Ramp, and Klaviyo use our data observability platform to prevent and detect data issues — before the CEO pings them about weird revenue numbers. We do this with ML-based anomaly detection, end-to-end column-level lineage, and tools to help prevent incidents before they occur. You can monitor your entire data stack within 30 minutes. The company is backed by Khosla Ventures, Y Combinator, and the founders of Okta, HubSpot, and Vercel.
    developer-tools
    saas
    data-engineering
  • DataShare
    DataShare (s2023) • Active • 1 employee • Austin, TX, USA
    DataShare is a data-as-a-service platform that lets you embed charts, dashboards and exports directly into your product. For example, if you run an accounting startup, DataShare would enable you to embed a full profit and loss dashboard, with downloadable statements. DataShare is backed by an enterprise-grade data warehouse, and can be implemented in fewer than 20 lines of code.
    analytics
    data-engineering
    databases
  • Serra
    Serra (s2023) • Active • 2 employees • San Francisco, CA, USA
    Serra is Tableau for data infrastructure. Serra enables smaller, less-technical teams to build cloud data infrastructure—batch and real-time data pipelines, rapid SQL analytics, and scalable data science and ML—through a user-friendly dashboard.
    developer-tools
    big-data
    data-engineering
    ai
  • Briefer
    Briefer (s2023) • Active • 2 employees • São Paulo, State of São Paulo, Brazil
    Briefer helps data scientists and analysts build interactive visualizations and data apps using a Notion-like interface. Connect to your data sources, write SQL or Python, collaborate through comments and multiplayer editing, and run code in whichever compute environments you need.
    developer-tools
    b2b
    data-science
    data-engineering
    data-visualization
  • OneSchema
    OneSchema (s2021) • Active • 10 employees • San Francisco, CA, USA
    Product and engineering teams use OneSchema to save the months of development time it takes to build a CSV importer. OneSchema improves customer activation and import completion rates by automatically correcting customer data.
    developer-tools
    saas
    b2b
    data-engineering
  • Artie
    Artie (s2023) • Active • 4 employees • San Francisco, CA, USA
    Artie is software that streams data from databases to data warehouses in real time. Today, most companies run their ETL process every few hours or overnight, so their data warehouse is always out of date; with Artie, the warehouse always has live production data.
    developer-tools
    saas
    open-source
    data-engineering
    enterprise-software
  • Egress
    Egress (s2023) • Active • 2 employees • San Francisco, CA, USA
    Egress is the AI layer for company data. It allows anyone to transform and take action on data in their warehouse or database using natural language. For example, Egress has helped several companies identify high-propensity users from product data and convert them using personalized outreach campaigns.
    artificial-intelligence
    data-engineering
  • kater.ai
    kater.ai (w2024) • Active • 2 employees • Los Angeles, CA, USA
    Kater is the self-learning data analyst that organizes, understands, and remembers all the nuances of your company's data estate. The more you use Kater, the better it gets. With a simple chat-like interface and an innovative data discovery catalog, Kater saves data teams 10-20 hours a week on ad hoc requests, and allows stakeholders to receive answers in 10 seconds rather than 10 days. Yvonne was a data engineer and analyst who built the entire data stack at CREXi. Robin led engineering in Microsoft Teams' development. Data is the new oil. Kater is forging a future where decision-makers can uncover valuable insights that may have been previously limited by the scope of specialized data teams. This is the future of data.
    artificial-intelligence
    analytics
    data-engineering
    ai-assistant
  • authzed
    authzed (w2021) • Active • 12 employees • New York, NY, USA
    We build the tools companies need to provide performant and scalable authorization for their applications. We were founded by three successful entrepreneurs with expertise in enterprise software, most recently as leaders at Red Hat. Jake and Joey met on the APIs team at Google in 2010. They went on to found Quay, where Jimmy joined as their first hire. Over the past decade, they’ve changed the landscape for building and deploying software.
    developer-tools
    saas
    security
    open-source
    data-engineering
  • Versori
    Versori (w2023) • Active • 16 employees • Manchester, UK
    From operational automations to embeddable custom connectors. Save 10x on cost and time by building custom connectors and workflow automations in hours.
    saas
    b2b
    api
    no-code
    data-engineering
  • Lume
    Lume (w2023) • Active • 3 employees • New York, NY, USA
    Lume automates data mappings using AI. Lume uses AI to automatically generate mapping logic to move data between any two schemas.
    generative-ai
    saas
    b2b
    data-engineering
    ai
  • Converge
    Converge (s2023) • Active • 3 employees • San Francisco, CA, USA
    Tracking customer events (e.g. Add To Cart, Purchase, etc.) correctly is important, yet unattainable for most online stores due to the limitations of tracking in the browser and lack of in-house developers. Converge auto-tracks all important events – across the browser, store backend and subscription platforms. Once tracking is set up, Converge allows online stores to forward these events with the flip of a switch to their advertising platforms and analytics tools, leading to improved ad performance and better insights. Our larger vision is to go beyond data infrastructure and leverage our single customer data layer to build out a perfectly integrated set of applications that helps brands reduce their customer acquisition cost.
    saas
    analytics
    e-commerce
    data-engineering
    infrastructure
  • Acho
    Acho (w2020) • Active • 15 employees • Boston, MA, USA
    Acho is a Data App Development Platform, powered by AI. This platform enables teams to transform business data into mission-critical applications used for automation, business intelligence, data science, internal tools, and customer-facing products. Today, Acho plays a pivotal role in elevating operational efficiency, automating workflows, and turning data into products for over 100 businesses. Among our valued customers are supply chain divisions of major global corporations, IT departments of Online Travel Agencies, Finance & Accounting units of prestigious banking institutions, and other organizations that play a key role in our daily life.
    saas
    data-engineering
    enterprise-software
    cloud-computing
    infrastructure
  • Streamdal
    Streamdal (s2020) • Active • 9 employees • Portland, OR, USA
    SaaS data platform for observing, repairing and replaying data in streaming systems.
    developer-tools
    data-engineering
    devops
  • TetraScience
    TetraScience (s2015) • Active • 100 employees • Boston, MA, USA
    TetraScience provides the world’s first and only R&D Data Cloud, with a mission to transform life sciences R&D, accelerate discovery, and improve human life. Scientists at global pharma and biotech organizations rely on our innovative Tetra Data Platform for easy access to centralized, harmonized, and actionable scientific data to accelerate their digital lab transformation. With best-in-class SaaS performance, a team of industry innovators, and excellent product/market fit, Tetra is positioned to become an iconic life sciences software company.
    saas
    data-engineering
  • Mozart Data
    Mozart Data (s2020) • Active • 24 employees • San Francisco, CA, USA
    Mozart Data provides an out-of-the-box modern data stack that empowers anyone to easily consolidate, organize, and prepare their data for analysis. Spin up a data stack that’s built on a best-in-class data warehouse and ETL tool in hours, without any engineering. You can finally spend more time on generating insights and less time wrangling your data.
    saas
    b2b
    data-engineering
  • Patterns
    Patterns (s2021) • Active • 2 employees • San Francisco, CA, USA
    Patterns enables everyone to analyze data, no matter their technical ability. No more waiting for reports from your data team or fiddling around with dashboards: simply make an analytics request and get an AI-generated answer from a bot fine-tuned on your company’s data.
    analytics
    data-science
    data-engineering
    data-visualization
  • Jitsu
    Jitsu (s2020) • Active • 4 employees • San Francisco, CA, USA
    Jitsu is the fastest, most durable way to collect event data from every source - web, app, email, chatbot, CRM - into your data warehouse. 100% open-source. Purpose-built, secure and ready in minutes.
    saas
    b2b
    open-source
    data-engineering
  • Datafold
    Datafold (s2020) • Active • 24 employees • New York, NY, USA
    Datafold exists to make working with data more enjoyable and productive. We are all about empowering data and analytics engineers. We find the most tedious, error-prone, and repetitive tasks and create tools to automate them. We make the world better by giving superpowers to data professionals who solve hard problems in various domains with data.
    saas
    analytics
    data-engineering
  • Imbue (formerly Generally Intelligent)
    Imbue (formerly Generally Intelligent) (s2017) • Active • 15 employees • San Francisco, CA, USA
    Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. We train our own foundation models optimized for reasoning and prototype agents on top of these models. By using these agents extensively, we gain insights into improving both the capabilities of the underlying models and the interaction design for agents. We aim to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.
    machine-learning
    data-engineering
    ai
  • Dataland
    Dataland (s2020) • Active • 2 employees • New York, NY, USA
    Dataland lets internal teams search tables in Snowflake, BigQuery, and Postgres at extreme speed. Full-text searches on billion-row tables finish in under 1 second, if not under 0.5 seconds. It's 500x faster and cheaper than the status quo (e.g. Retool on Snowflake). Dataland comes with a beautifully designed UI. Any business user can get the answers they need from massive datasets. Data engineers no longer have to build one-off, slow tools just for database lookups.
    saas
    b2b
    data-engineering
  • Polytomic
    Polytomic (w2020) • Active • 7 employees • San Francisco, CA, USA
    Polytomic is a no-code web app to sync data between your internal databases, business systems (e.g. Stripe, Salesforce), data warehouses, spreadsheets, and even HTTP APIs.
    saas
    b2b
    data-engineering
  • Etleap
    Etleap (w2013) • Active • 11 employees • San Francisco, CA, USA
    Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike other enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own.
    data-engineering
  • Chaos Genius
    Chaos Genius (w2020) • Active • 10 employees • San Francisco, CA, USA
    Chaos Genius is a DataOps Observability platform for Snowflake. Enable Snowflake Observability to reduce Snowflake costs and optimize query performance.
    cloud-workload-protection
    machine-learning
    analytics
    open-source
    data-engineering
  • Secoda
    Secoda (s2021) • Active • 27 employees • Toronto, ON, Canada
    Secoda is a universal data discovery and documentation tool that makes finding metadata, queries, charts and documentation as easy as a Google search. Today, data teams are collecting tons of data, but most employees don't know what data exists, how to use it, and what data to trust. This confusion happens because different components of company data get collected in fragmented tools. Secoda helps teams find and understand data in one easy-to-use platform that's accessible to any employee.
    developer-tools
    saas
    b2b
    analytics
    data-engineering
  • Avenue
    Avenue (w2021) • Active • 8 employees • New York, NY, USA
    Avenue is a simple way for business teams to set up alerts from their database or data warehouse. Think Datadog / PagerDuty for operations teams. Operations teams create set-and-forget alerts on all their data, so they can be more proactive with their time (and monitor on more nuanced triggers than just what fits on their dashboard page). Avenue can improve response times to critical problems from several days to real-time by alerting directly on the data sources that customers already use.
    developer-tools
    saas
    data-engineering
  • Prequel
    Prequel (w2021) • Active • 9 employees • New York, NY, USA
    Prequel makes it easy for companies to share data with their customers. It helps you export data directly to your customer's Snowflake, Redshift, BigQuery, Databricks, or other data warehouse on an ongoing basis.
    saas
    analytics
    data-engineering
  • LaunchFlow
    LaunchFlow (w2023) • Active • 2 employees
    LaunchFlow is the fastest way to build and deploy Python applications on the cloud. Our platform provides developers with the framework, tools, and infrastructure needed to build scalable, more reliable Python applications.
    developer-tools
    machine-learning
    b2b
    data-engineering
    cloud-computing
  • communion
    communion (s2019) • Active • 8 employees • New York, NY, USA
    creative tools + powerful analytics
    artificial-intelligence
    marketing
    advertising
    data-engineering
    ai-assistant
  • Operator Labs
    Operator Labs (w2020) • Active • 6 employees • New York, NY, USA
    Toolkit for connecting AI agents to the decentralized web
    generative-ai
    crypto-web3
    data-engineering
  • Logarithm Labs
    Logarithm Labs (w2020) • Active • 2 employees • Foster City, CA, USA
    The easy button for using data in your daily operations. Power your business workflows with quality data. Logarithm Labs helps you turn manual data wrangling and ad-hoc scripts into repeatable pipelines for your operational workflows. Our product and team of experts do the heavy lifting so that you can focus on the business logic that drives your organization. To learn more, contact us at hello@logarithmlabs.com.
    developer-tools
    data-engineering
  • Outerbase
    Outerbase (w2023) • Active • 4 employees • Pittsburgh, PA, USA
    Outerbase is the interface for your database. Companies use Outerbase to view, edit, and modify their data and even generate beautiful visual dashboards without having to write a single line of SQL.
    developer-tools
    generative-ai
    analytics
    data-engineering
    ai
  • TableFlow
    TableFlow (w2023) • Active • 2 employees • San Francisco, CA, USA
    TableFlow is an open source data import platform for companies to collect and transform customer data. Instead of building an in-house file upload and processing service, businesses can embed or link to TableFlow's customizable importer to manage their data onboarding needs.
    artificial-intelligence
    developer-tools
    saas
    open-source
    data-engineering
  • Tarsal
    Tarsal (s2021) • Active • 7 employees • New York, NY, USA
    Tarsal is data infrastructure for security teams. As security data grows 25% year over year, security teams desperately need access to best-in-class data infrastructure. Tarsal bridges the gap between the modern data stack and security teams, pioneering the modern security data stack.
    b2b
    cybersecurity
    big-data
    data-engineering
  • Cedalio
    Cedalio (s2023) • Active • 6 employees • San Francisco, CA, USA
    With Cedalio, developers can easily store data with the same scalability and developer experience as the traditional cloud, but with built-in transparency, security and verifiability. Everything that happens on the database leaves an encrypted historical record of transactions on the blockchain that cannot be tampered with.
    developer-tools
    climate
    supply-chain
    data-engineering
  • Evidence
    Evidence (s2021) • Active • 6 employees • Toronto, ON, Canada
    Evidence is an open source, code-based alternative to drag-and-drop BI tools. Build polished data products with just SQL and markdown.
    developer-tools
    b2b
    analytics
    data-engineering
    data-visualization
  • Honeydew
    Honeydew (w2023) • Active • 6 employees • Tel Aviv-Yafo, Israel
    The way people use data is constantly changing. Data teams must support every new context without breaking the shared truth. Honeydew’s semantic layer does it automatically. We validate each change and update every data flow. Using Honeydew, data teams can support 10x more data users - without more engineers or compromising integrity.
    saas
    b2b
    analytics
    data-engineering
  • Cargo
    Cargo (s2023) • Active • 5 employees • San Francisco, CA, USA
    Cargo is the first revenue architecture built for modern teams. We help revenue teams to access their company data and automate their sales operations. We provide a headless interface to enable them to easily segment, score and route leads to turn pipeline into revenue.
    sales
    sales-enablement
    data-engineering
    infrastructure
    operations
  • Clear
    Clear (w2021) • Active • 2 employees • London, UK
    Clear is the free mobile app that helps you track and share your skincare routine. We are fuelling innovation and empowering consumers in the skincare industry via data, technology and community. We were also the 2022 L'Oréal Beauty Tech for Good winners, and were featured under "Best New Apps and Updates" on the App Store in 2023. The skincare industry is worth $200B and social commerce is going to drive the future growth of every brand in the industry. We're going to be fuelling that growth.
    marketplace
    consumer
    digital-health
    data-engineering
  • Baselit
    Baselit (w2023) • Active • 3 employees
    Govern and save up to 60% on Snowflake with zero effort, using AI agents.
    artificial-intelligence
    saas
    b2b
    big-data
    data-engineering
  • Taylor AI
    Taylor AI (s2023) • Active • 2 employees • San Francisco, CA, USA
    Taylor AI is the data warehouse for your unstructured text. Instead of building brittle text pipelines or hiring MLEs to wrangle text, Data & Eng teams generate embeddings, cluster data, and SQL query text as easily as querying tabular data.
    artificial-intelligence
    developer-tools
    data-science
    data-engineering
    databases
  • HomeRoom
    HomeRoom (w2022) • Active • 25 employees • San Jose, CA, USA
    Homeroom helps investors provide affordable housing while making a 22% ROI. We do this by sourcing properties, arranging capital, managing construction, vetting tenants and collecting rent by the room. To date, Homeroom has brought on 85 property investors, is growing 6X annually, and is bringing in 420K in annualized net revenue. How it works: We help investors buy homes in cities that are attractive to young people, but lack affordable housing options. We then renovate and after about 20 days, the home is ready and we find qualified renters by the room. We launched in 2018 in Kansas City with 1 home. We now have 105 homes in 31 cities. In 2021, we grew rental GMV to $1.8M (300% YoY growth). Our average rent across every property is $458, which is about 50% lower than market comps, and our investors see returns up to 50% higher. We are HomeRoom. Johnny is the financial analyst/domain expert. Thomas is a serial entrepreneur with a PhD in ML, and Mike hacked growth for Airbnb and Facebook.
    machine-learning
    real-estate
    proptech
    nlp
    data-engineering
  • DAGWorks Inc.
    DAGWorks Inc. (w2023) • Active • 2 employees • San Francisco, CA, USA
    At DAGWorks Inc. our goal is to change how data + ML + LLM teams are staffed and operate. We’re building an open core SaaS platform to streamline development and operation of data, ML, & LLM pipelines in a collaborative, self-service manner, utilizing a company's existing MLOps and data infrastructure. We believe self-service for Data Practitioners is the future because it gives domain modeling experts the velocity to iterate on pipelines & models without hand-off, which is key for businesses using ML/AI to differentiate themselves. Unless you’re a big tech company or someone like Stitch Fix that can afford a platform team, staffing teams with high ratios of engineers, or finding unicorn data scientists that can build pipelines is your only option; it not only slows time to value, it makes operating ML/AI expensive. We’re here to change that. Think simple Python that sets a low software engineering bar for describing what should happen, then, with some extra metadata, generates the workflow code and consolidates several MLOps tools into a single platform, all in a self-service manner. It’s functional and usable by junior and senior folks alike.
    developer-tools
    machine-learning
    b2b
    open-source
    data-engineering
  • Waydev
    Waydev (w2021) • Active • 15 employees • San Francisco, CA, USA
    Leverage insights from your engineering stack to accelerate velocity, align engineering work to business priorities, and increase visibility into your team’s DORA Metrics and SPACE Framework Metrics.
    analytics
    enterprise
    data-engineering
  • Lariat Data
    Lariat Data (s2021) • Active • 3 employees • New York, NY, USA
    Lariat is a Continuous Data Quality monitoring platform to discover data bugs before your consumers do. Ensure data products don’t break even as business logic, input data and infrastructure change. Use Lariat to define and then automatically extract, store and visualize data quality metrics on raw event-level data through to delivered data products.
    machine-learning
    big-data
    data-engineering
  • CustomerOS
    CustomerOS (s2022) • Active • 10 employees • London, UK
    The Top 10% of SaaS companies generate 87% of all market returns. CustomerOS gives you the data and tooling to compete with the top 10%. Specifically, we solve three major problems in B2B SaaS today: 1. CustomerOS is a system of record for all your customer data. We support 100+ integrations with any app or database that touches customer data. And there's no engineering required. 2. CustomerOS provides tooling for your in-life customer motion. We predict renewals (and churn), provide risk-weighted ARR forecasts, and manage all your Customer Success workflows, from onboarding to expansion to advocacy. 3. CustomerOS lead scores your pipeline against your ICP. We build data-driven profiles of your best customers and provide a real-time ICP-fit indicator on your sales and marketing pipeline. This ensures you're spending your CAC acquiring customers who are primed to renew year after year and expand as they grow.
    b2b
    customer-success
    open-source
    enterprise
    data-engineering
  • Whaly
    Whaly (s2021) • Active • 3 employees • Paris, France
    Whaly helps data teams save time on maintenance and analysis building while making business users more autonomous in the analyses they need to improve their decision making. We do this by providing a self-service data platform where both data and business teams can work together. We understood that most data teams were ending up as a bottleneck for the rest of the company and needed to give business teams more autonomy to back their decisions with data. Emilien, Florian and Pierre were the minds behind the data advertising platforms of the major media and e-commerce companies in France in their earlier positions as Product Manager and Head of Customer Success, giving them an edge in how to successfully execute a data project.
    data-engineering
  • Quary
    Quary (w2024)Active • 2 employees • London, UK
    artificial-intelligence
    analytics
    data-science
    data-engineering
    ai
  • PeerDB
    PeerDB (s2023)Active • 2 employees
    At PeerDB, we are building a fast and simple way, and the most cost-effective one, to stream data from Postgres to data warehouses, queues and storage engines. If you run Postgres at the heart of your data stack and move data at scale from Postgres to any of the above targets, PeerDB can provide value. We support different modes of streaming: log-based (CDC), cursor-based (timestamp or integer) and XMIN-based. Performance-wise, we are 10x faster than existing tools. Feature-wise, we support native Postgres features such as a comprehensive set of data types (including jsonb, arrays and PostGIS), efficient streaming of TOAST columns, schema changes and so on.
    developer-tools
    open-source
    data-engineering
    enterprise-software
    databases
  • Roe AI
    Roe AI (w2024)Active • 2 employees • San Francisco, CA, USA
    Roe AI is the next-generation data warehouse that unifies unstructured and structured data processing. Our mission is to help enterprises extract intelligence from all kinds of data at scale to power their business priorities. Data lies at the heart of strategic decision-making, steering enterprises toward their KPIs, and Roe AI accelerates these successes by providing intuitive and intelligent multi-modal data standardization, data classification & inferencing, multi-modal searching and data aggregation. Book a call with us at https://calendly.com/roe-ai/intro to discover how Roe AI can take your enterprise's data intelligence to the next level. The future of data science is here.
    data-science
    data-engineering
    ai
    databases
  • InQuery
    InQuery (w2024)Active • 2 employees
    InQuery is a flexible platform for centralizing and powering your data workloads. Built on top of open projects like Apache Iceberg, Spark, and Trino, InQuery provides an end-to-end lakehouse solution to simplify your data operations without breaking the bank. With InQuery, you can scale your data superpowers without scaling your data teams.
    team-collaboration
    big-data
    data-engineering
  • OmniAI
    OmniAI (w2024)Active • 2 employees • New York, NY, USA
    OmniAI provides a foundational data infrastructure layer for AI-driven applications. Search and derive instant benefits from unstructured data across your entire data architecture. • No-code connectors to ingest data from any source into a central warehouse (Postgres, MongoDB, Google Drive) • Transform unstructured data into organized, structured formats • Merge semantic search capabilities with conventional search and ranking methods to enhance RAG applications
    big-data
    data-engineering
    ai
  • Preloop
    Preloop (w2024)Active • 2 employees
    Preloop enables ML teams to deploy models in hours instead of weeks. We do this by scanning your existing training code and automatically provisioning and scaling pipelines and your inference endpoint using proprietary algorithms. We also provide rich observability to make it easy to monitor and manage models once they are deployed, and we offer versioning out of the box. With Preloop, science teams can move quickly and focus more on science and less on deployments.
    artificial-intelligence
    api
    data-science
    data-engineering
    enterprise-software
  • Toolchest
    Toolchest (w2022)Active • 3 employees • Mountain View, CA, USA
    Toolchest makes it easy for bioinformaticians to run popular computational biology software in the cloud. Drug discovery companies use Toolchest to get analysis results up to 100x faster. We have Python and R libraries that customers use to run popular open-source tools at scale in the cloud. Toolchest is used wherever their analysis currently exists – e.g. a Jupyter notebook on their laptop, an R script on an on-prem cluster, or a Python script in the cloud.
    developer-tools
    drug-discovery
    data-engineering
  • Sarus
    Sarus (w2022)Active • 16 employees • Paris, France
    Sarus solves the problem of accessing or sharing personal data for analytics or machine learning. The solution deploys natively in data infrastructures and lets practitioners work on data they cannot see. Every interaction with the sensitive data is protected with the highest privacy standard: differential privacy. Sarus makes traditional anonymization methods irrelevant, saving months in compliance and data engineering while preserving all of the value of data.
    analytics
    compliance
    data-engineering
  • DynamoFL
    DynamoFL (w2022)Active • 30 employees • San Francisco, CA, USA
    DynamoFL is the most private solution for enterprise AI. Achieve best-in-class and compliant AI at a fraction of the time and cost.
    machine-learning
    privacy
    data-engineering
  • Elementary
    Elementary (w2022)Active • 12 employees • Tel Aviv-Yafo, Israel
    Elementary enables data teams to detect problems in their data before their users do. An open-source solution that any data engineer can deploy in minutes without sharing sensitive data.
    developer-tools
    analytics
    open-source
    data-engineering
  • LanceDB
    LanceDB (w2022)Active • 4 employees • San Francisco, CA, USA
    LanceDB is a new open-source vector database that can support low-latency billion-scale vector search on a single node. Built around a new columnar data format, LanceDB makes it incredibly easy to build applications for generative AI, recsys, search engines, content moderation, and more.
    aiops
    machine-learning
    data-engineering
  • Hydra
    Hydra (w2022)Active • 6 employees • San Francisco, CA, USA
    Open source Snowflake alternative. Query billions of rows instantly on column-oriented Postgres. Hydra can be used as open source, managed cloud, or deployable in customer cloud infrastructure. Get parallelized analytics in minutes with no code changes.
    developer-tools
    analytics
    open-source
    data-engineering
  • Trackingplan
    Trackingplan (w2022)Active • 8 employees • Barcelona, Spain
    Trackingplan automatically discovers and monitors all the information your applications and websites are collecting, ensuring that you can trust your BI, analytics, marketing, and sales tools. You can think of us as Segment Protocols but totally transparent, where developers can keep using Google Analytics, Amplitude, Hubspot, Intercom, Braze, etc. as they are used to. Installed in minutes using your Tag Manager or by adding just one line of code to your website or apps, we model all the data being sent to third parties. Since Trackingplan understands what each piece of data means, it identifies patterns, detects anomalies, and automatically connects the dots to create value from data that was hidden in plain sight: - An always up-to-date single source of truth and data governance tool. To discover, understand and document your data and improve communication across teams. - Automated notifications when something breaks or changes. To make sure that integrations are always well implemented: Schema errors, traffic anomalies, rogue events... - Easy to understand, customizable, cross-service alerts. To detect trends, insights, and problems without using complex, engineer-oriented solutions.
    saas
    analytics
    data-engineering
  • Bracket
    Bracket (w2022)Active • 3 employees • New York, NY, USA
    Bracket is the two-way data pipeline between popular business tools and backend databases. When ops teams update data in Salesforce or Airtable, and engineers update data in the database, Bracket connects the two sources to reflect the same information.
    saas
    b2b
    data-engineering
  • Grai
    Grai (s2022)Active • 3 employees • San Francisco, CA, USA
    Grai is open source version control for metadata. We can determine how database changes will affect deployed machine learning models, APIs, and dashboards because we understand how data relates across systems that don’t otherwise talk to each other.
    developer-tools
    saas
    analytics
    open-source
    data-engineering
  • Lamin
    Lamin (s2022)Active • 4 employees • Munich, Germany
    Manage data & analyses with an open-source Python framework. Collaborate across dry and wet labs in a distributed data hub. Get started on your laptop and deploy anywhere.
    developer-tools
    machine-learning
    biotech
    open-source
    data-engineering
  • Findly
    Findly (s2022)Active • 6 employees • London, UK
    Findly.ai is the ChatGPT for Google Analytics that revolutionizes how businesses understand and interact with their data. By creating an engaging chat environment, it empowers decision-makers to gain insights, request reports, and generate visualizations based on their company's metrics. This seamless interaction is made possible by integrating a metric layer that comprehends all your company's metrics. The chat-based exploration simplifies complex data analysis, allowing users to generate comprehensive summaries with a single click, which can be exported to various formats. Furthermore, with the introduction of scheduled chats and action-triggered automations, Findly.ai enhances the autonomy and efficiency of decision-makers. It's more than a tool; it's a decision-making operational system aiming to facilitate decision-makers in achieving their KPIs while spending less time waiting for data.
    generative-ai
    b2b
    chatbot
    data-engineering
    ai
  • Sunpia
    Sunpia (s2022)Active • 3 employees • San Jose, CA, USA
    Sunpia lets developers easily experience the cost and speed benefits of serverless infrastructure, without having to rewrite their code. Developers annotate their code and Sunpia automatically designs a microservice version of it they can deploy on their own cloud.
    developer-tools
    kubernetes
    data-engineering
  • MovingLake
    MovingLake (s2022)Active • 3 employees • Mexico City, CDMX, Mexico
    MovingLake is Fivetran for event-driven architectures. Companies such as Casai use our product to obtain orders and price changes in real time.
    saas
    b2b
    analytics
    api
    data-engineering
  • Yhat
    Yhat (w2015)Acquired • 17 employees • Brooklyn, NY, USA
    Yhat (YC W15, pronounced y-hat) was an end-to-end data science platform. Acquired by Alteryx (NYSE:AYX).
    artificial-intelligence
    machine-learning
    enterprise
    data-engineering
  • Data Mechanics
    Data Mechanics (s2019)Acquired • 25 employees • Paris, France
    Data Mechanics was acquired by NetApp in 2021 and integrated into the Spot.io product portfolio. Our managed Spark-on-Kubernetes platform is live and running under the name Ocean for Apache Spark: https://spot.io/products/ocean-apache-spark/
    saas
    b2b
    open-source
    data-engineering
  • Stackshine
    Stackshine (w2022)Acquired • 7 employees • Portland, OR, USA
    Stackshine is creating mission control for enterprise IT teams. We discover all the software being used across their organization and then automate workflows related to onboarding/offboarding, cost savings, and security.
    robotic-process-automation
    productivity
    analytics
    enterprise
    data-engineering
  • Satsuma
    Satsuma (s2021)Acquired • 5 employees • San Francisco, CA, USA
    Satsuma is a developer tool for building applications on top of real-time blockchain data. Our product lets developers take decoded data from multiple chains, customize it for their use cases, and access it through API endpoints. Blockchains serve as distributed databases for these products, holding their most important data. However, it’s difficult to access and query that data. We believe this friction is an enormous blocker for web3 developers and that better tooling will enable mass adoption for web3. We’re a founding team of engineers, having built data infrastructure and product as early employees at Airtable, Heap, and Y Combinator.
    developer-tools
    saas
    crypto-web3
    data-engineering