Data Engineering Startups funded by Y Combinator (YC) 2024

April 2024

Browse 84 of the top Data Engineering startups funded by Y Combinator.

We also have a Startup Directory where you can search through over 5,000 companies.

  • Fivetran
    Fivetran (w2013)Active • 1,200 employees • Oakland, CA, USA
    Fivetran automates data movement out of, into and across cloud data platforms. We automate the most time-consuming parts of the ELT process from extracts to schema drift handling to transformations, so data engineers can focus on higher-impact projects with total pipeline peace of mind. With 99.9% uptime and self-healing pipelines, Fivetran enables hundreds of leading brands across the globe, including Autodesk, Conagra Brands, JetBlue, Lionsgate, Morgan Stanley, and Ziff Davis, to accelerate data-driven decisions and drive business growth. Fivetran is headquartered in Oakland, California, with offices around the world. 
    saas
    b2b
    analytics
    data-engineering
  • Airbyte
    Airbyte (w2020)Active • 110 employees • San Francisco, CA, USA
    Airbyte is the leading open-source ELT platform that replicates data from applications, APIs & databases to data warehouses, data lakes, and other destinations. https://github.com/airbytehq/airbyte
    developer-tools
    open-source
    data-engineering
  • Supabase
    Supabase (s2020)Active • 70 employees
    Supabase is the easiest way to get started with Postgres. Each project within Supabase is an isolated Postgres cluster, allowing customers to scale independently, while still providing the features that you need to build: instant database setup, auth, row level security, realtime data streams, auto-generating APIs, and a simple to use web interface. We are 100% remote.
    developer-tools
    open-source
    big-data
    data-engineering
    databases
  • TRM Labs
    TRM Labs (s2019)Active • 180 employees • San Francisco, CA, USA
    At TRM, we're on a mission to build trust in digital assets, because the promise of crypto is too valuable to be impeded by bad actors. We provide a blockchain intelligence platform to law enforcement, financial institutions, and crypto firms to assist in the detection and prevention of cryptocurrency fraud and financial crime. Our vision is to build a company that can sustainably deliver on our mission for decades to come, enabling consumers to transact safely and securely on the blockchain. Join our mission ➔ www.trmlabs.com/careers
    fintech
    machine-learning
    crypto-web3
    data-engineering
  • Gecko Robotics
    Gecko Robotics (w2016)Active • 230 employees • Austin, TX, USA
    The mission of Gecko Robotics is to improve the state of the world by helping the most important institutions ensure the availability, reliability and sustainability of critical infrastructure. Gecko's combination of wall-climbing robots, industry-leading sensors, and an AI-powered data platform gives customers a unique window into the health of their physical assets, allowing real-time decisions that prevent power outages, ensure military missions succeed, and help reduce energy costs.
    robotics
    energy
    big-data
    data-engineering
    ai
  • Mezmo
    Mezmo (w2015)Active • 172 employees • San Jose, CA, USA
    Mezmo, formerly LogDNA, is an observability platform to manage and take action on your data. It ingests, processes, and routes log data to fuel enterprise-level application development and delivery, security, and compliance use cases. Mezmo was brought to life by three-time co-founders Chris Nguyen and Lee Liu and included in the Winter 2015 batch of Y Combinator. In 2018 the company partnered with tech giant IBM to become the sole logging provider for IBM Cloud. Mezmo is on a mission to empower people who build solutions that shape the world. We’re doing this by delivering a platform that enables enterprises to get more value from their observability data in real time, regardless of source, destination, use case, or scale. We’re not the only ones working on this problem, but we have a few things the others don’t. We’re cloud-native and know how to make the most of modern technology like Kubernetes. We have scaled a solution from zero to petabyte scale in a short amount of time, while supporting thousands of active users across multiple environments. We are hungry for change and are surrounded by enterprises telling us they’re hungry, too. We have a kick-ass group of people who are thinking about the problem analytically and are excited to change the observability world for the better. Mezmo has helped some of the world’s most innovative companies transform how they manage their systems and applications. Still, we know that we can help them get more value from their observability data by providing more flexibility and control over how they use it. This will enable teams to spend less time switching between data silos so they can focus on shipping better, more resilient, and secure products. We have momentum on our side. Last year we saw triple-digit revenue growth and added 800 new customers to our roster. Recent accolades include being named to YC’s Top Companies, CRN’s 10 Hottest DevOps Startups, and EMA’s Top 3 Observability Platforms.
    developer-tools
    devsecops
    saas
    kubernetes
    data-engineering
  • Spruce Systems
    Spruce Systems (w2021)Active • 25 employees • New York, NY, USA
    Spruce lets users control their data across the web. We believe that the world is evolving toward one based on cryptography, networks, and digital economies that are user-controlled. Today, the dominant use case for user keys is the signing of blockchain transactions, but we think this barely scratches the surface of what is possible. Soon, the entirety of a user’s digital interactions will be based on their keypairs, and we’re unlocking this transition with our constellation of products. We are passionate about cultivating a thriving culture of diverse individuals who bring unique perspectives to our mission. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.
    crypto-web3
    identity
    open-source
    privacy
    data-engineering
  • Narrator
    Narrator (s2019)Active • 8 employees • New York, NY, USA
    Narrator is an end-to-end platform built on top of the data standard, the Activity Schema, starting at $500/mo. Data analysts build their own definitions of the user journey and use that journey to answer any question that comes up. From there, data can be visualized in a dashboard, used to build a story-like analysis, exported, and more. The biggest benefits of Narrator are speed and cost reduction. Small teams can move fast and answer questions in minutes, allowing them to perform the work of very large data teams, all while Narrator is optimized to minimize the compute cost of the warehouse.
    analytics
    big-data
    data-engineering
  • Stacksync
    Stacksync (w2024)Active • 4 employees • San Francisco, CA, USA
    Stacksync powers real-time and bidirectional data synchronization between CRMs (e.g. Salesforce, Hubspot or SAP) and databases (e.g. Postgres, Google BigQuery, ...). Edits made in your CRM will instantly update in your database, and vice versa. To set up a sync, users simply connect the two chosen apps in one click and select the tables they want to sync, with no code. Stacksync reduces implementation delays from months to minutes for CRM integration projects and removes all the complexity behind CRM new feature development. We show a 90% improvement on delivery time and budget.
    b2b
    api
    crm
    data-engineering
    databases
  • Titan
    Titan (w2024)Active • 1 employee • San Francisco, CA, USA
    Titan is an open-source toolkit for data compliance. Data engineering teams trust Titan to simplify access management, ensure compliance, and minimize risk.
    developer-tools
    open-source
    cybersecurity
    data-engineering
  • Metaplane
    Metaplane (w2020)Active • 12 employees • Boston, MA, USA
    Metaplane ensures everyone trusts the data that powers your business. Data teams at Bose, Ramp, and Klaviyo use our data observability platform to prevent and detect data issues — before the CEO pings them about weird revenue numbers. We do this with ML-based anomaly detection, end-to-end column-level lineage, and tools to help prevent incidents before they occur. You can monitor your entire data stack within 30 minutes. The company is backed by Khosla Ventures, Y Combinator, and the founders of Okta, HubSpot, and Vercel.
    developer-tools
    saas
    data-engineering
  • DataShare
    DataShare (s2023)Active • 1 employee • Austin, TX, USA
    DataShare is a data-as-a-service platform that lets you embed charts, dashboards and exports directly into your product. For example, if you run an accounting startup, DataShare would enable you to embed a full profit and loss dashboard, with downloadable statements. DataShare is backed by an enterprise-grade data warehouse, and can be implemented in fewer than 20 lines of code.
    analytics
    data-engineering
    databases
  • Serra
    Serra (s2023)Active • 2 employees • San Francisco, CA, USA
    Serra is Tableau for data infrastructure. Serra enables smaller, less-technical teams to build cloud data infrastructure—batch and real-time data pipelines, rapid SQL analytics, and scalable data science and ML—through a user-friendly dashboard.
    developer-tools
    big-data
    data-engineering
    ai
  • Briefer
    Briefer (s2023)Active • 2 employees • São Paulo, State of São Paulo, Brazil
    Briefer helps data scientists and analysts build interactive visualizations and data apps using a Notion-like interface. Connect to your data sources, write SQL or Python, collaborate through comments and multiplayer editing, and run code in whichever compute environments you need.
    developer-tools
    b2b
    data-science
    data-engineering
    data-visualization
  • authzed
    authzed (w2021)Active • 17 employees • New York, NY, USA
    We build the tools companies need to provide performant and scalable authorization for their applications. We were founded by three successful entrepreneurs with expertise in enterprise software, most recently as leaders at Red Hat. Jake and Joey met on the APIs team at Google in 2010. They went on to found Quay, where Jimmy joined as their first hire. Over the past decade, they’ve changed the landscape for building and deploying software.
    developer-tools
    saas
    security
    open-source
    data-engineering
  • OneSchema
    OneSchema (s2021)Active • 10 employees • San Francisco, CA, USA
    Product and engineering teams use OneSchema to save the months of development time it takes to build a CSV importer. OneSchema improves customer activation and import completion rates by automatically correcting customer data.
    developer-tools
    saas
    b2b
    data-engineering
  • Artie
    Artie (s2023)Active • 4 employees • San Francisco, CA, USA
    Artie is software that streams data from databases to data warehouses in real-time. Today, most companies run their ETL process every few hours or overnight, so their data warehouse is always out of date; with Artie, the warehouse always has live production data.
    developer-tools
    saas
    open-source
    data-engineering
    enterprise-software
  • Evidence
    Evidence (s2021)Active • 6 employees • Toronto, ON, Canada
    Evidence is an open source, code-based alternative to drag-and-drop BI tools. Build polished data products with just SQL and markdown.
    developer-tools
    b2b
    analytics
    data-engineering
    data-visualization
  • Egress
    Egress (s2023)Active • 2 employees • San Francisco, CA, USA
    Egress is the AI layer for company data. It allows anyone to transform and take action on data in their warehouse or database using natural language. For example, Egress has helped several companies identify high-propensity users from product data and convert them using personalized outreach campaigns.
    artificial-intelligence
    data-engineering
  • kater.ai
    kater.ai (w2024)Active • 2 employees • San Francisco, CA, USA
    Kater makes it possible for executives to understand why business outcomes occur in a couple of seconds. Kater generates informed hypotheses, validates those hypotheses through code, then surfaces the insights to the user, who makes the final decision. Yvonne was a data engineer and analyst who built the entire data stack at CREXi. Robin led engineering at Microsoft. Data is the new oil. Kater is forging a future where decision-makers can uncover valuable insights that may have been previously limited by the scope of specialized data teams. This is the future of data.
    artificial-intelligence
    analytics
    data-engineering
  • Versori
    Versori (w2023)Active • 16 employees • Manchester, UK
    From operational automations to embeddable custom connectors. Save 10x on cost and time by building custom connectors and workflow automations in hours.
    saas
    b2b
    api
    no-code
    data-engineering
  • Lume
    Lume (w2023)Active • 3 employees • New York, NY, USA
    Lume automates data mappings using AI. Lume uses AI to automatically generate mapping logic to move data between any two schemas.
    generative-ai
    saas
    b2b
    data-engineering
    ai
  • CambioML
    CambioML (s2023)Active • 3 employees • San Jose, CA, USA
    CambioML provides ML tools for extracting and reconstructing text and data from PDFs, HTML pages and forms. Join in mining enterprise data gold from your legacy docs.
    fintech
    saas
    open-source
    enterprise
    data-engineering
  • Converge
    Converge (s2023)Active • 3 employees • San Francisco, CA, USA
    Tracking customer events (e.g. Add To Cart, Purchase, etc.) correctly is important, yet unattainable for most online stores due to the limitations of tracking in the browser and a lack of in-house developers. Converge auto-tracks all important events across the browser, store backend and subscription platforms. Once tracking is set up, Converge allows online stores to forward these events with the flip of a switch to their advertising platforms and analytics tools, leading to improved ad performance and better insights. Our larger vision is to go beyond data infrastructure and leverage our single customer data layer to build out a perfectly integrated set of applications that helps brands reduce their customer acquisition cost.
    saas
    analytics
    e-commerce
    data-engineering
    infrastructure
  • Acho
    Acho (w2020)Active • 15 employees • Boston, MA, USA
    Acho is a Data App Development Platform, powered by AI. This platform enables teams to transform business data into mission-critical applications used for automation, business intelligence, data science, internal tools, and customer-facing products. Today, Acho plays a pivotal role in elevating operational efficiency, automating workflows, and turning data into products for over 100 businesses. Among our valued customers are supply chain divisions of major global corporations, IT departments of Online Travel Agencies, Finance & Accounting units of prestigious banking institutions, and other organizations that play a key role in our daily life.
    saas
    data-engineering
    enterprise-software
    cloud-computing
    infrastructure
  • Streamdal
    Streamdal (s2020)Active • 9 employees • Portland, OR, USA
    SaaS data platform for observing, repairing and replaying data in streaming systems.
    developer-tools
    data-engineering
    devops
  • TetraScience
    TetraScience (s2015)Active • 100 employees • Boston, MA, USA
    TetraScience provides the world’s first and only R&D Data Cloud, with a mission to transform life sciences R&D, accelerate discovery, and improve human life. Scientists at global pharma and biotech organizations rely on our innovative Tetra Data Platform for easy access to centralized, harmonized, and actionable scientific data to accelerate their digital lab transformation. With best-in-class SaaS performance, a team of industry innovators, and excellent product/market fit, Tetra is positioned to become an iconic life sciences software company.
    saas
    data-engineering
  • Platypus
    Platypus (w2021)Active • 3 employees • San Francisco, CA, USA
    For Business Operators: Connect & automate processes on top of any data, crazy fast. For Engineering Teams: Connect any data, across any stack, in any format, crazy fast.
    b2b
    workflow-automation
    data-engineering
    ai-assistant
  • Mozart Data
    Mozart Data (s2020)Active • 24 employees • San Francisco, CA, USA
    Mozart Data provides an out-of-the-box modern data stack that empowers anyone to easily consolidate, organize, and prepare their data for analysis. Spin up a data stack that’s built on a best-in-class data warehouse and ETL tool in hours, without any engineering. You can finally spend more time on generating insights and less time wrangling your data.
    saas
    b2b
    data-engineering
  • Patterns
    Patterns (s2021)Active • 2 employees • San Francisco, CA, USA
    Patterns enables everyone to analyze data, no matter their technical ability. No more waiting for reports from your data team or fiddling around with dashboards: simply make an analytics request and get an AI-generated answer from a bot fine-tuned on your company’s data.
    analytics
    data-science
    data-engineering
    data-visualization
  • Jitsu
    Jitsu (s2020)Active • 4 employees • San Francisco, CA, USA
    Jitsu is the fastest, most durable way to collect event data from every source - web, app, email, chatbot, CRM - into your data warehouse. 100% open-source. Purpose built, secure and ready in minutes.
    saas
    b2b
    open-source
    data-engineering
  • Datafold
    Datafold (s2020)Active • 24 employees • New York, NY, USA
    Datafold exists to make working with data more enjoyable and productive. We are all about empowering data and analytics engineers. We find the most tedious, error-prone, and repetitive tasks and create tools to automate them. We make the world better by giving superpowers to data professionals who solve hard problems in various domains with data.
    saas
    analytics
    data-engineering
  • Imbue (formerly Generally Intelligent)
    Imbue (formerly Generally Intelligent) (s2017)Active • 15 employees • San Francisco, CA, USA
    Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. We train our own foundation models optimized for reasoning and prototype agents on top of these models. By using these agents extensively, we gain insights into improving both the capabilities of the underlying models and the interaction design for agents. We aim to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.
    machine-learning
    data-engineering
    ai
  • Dataland
    Dataland (s2020)Active • 2 employees • New York, NY, USA
    Dataland is the easiest way to deliver high-quality internal tools to your business users. It's secure, easy-to-use, and sets up in minutes. Dataland uses GenAI to enable business users to construct their own internal tools without blocking on engineering.
    saas
    b2b
    data-engineering
  • HomeRoom
    HomeRoom (w2022)Active • 25 employees • San Jose, CA, USA
    Homeroom helps investors provide affordable housing while making a 22% ROI. We do this by sourcing properties, arranging capital, managing construction, vetting tenants and collecting rent by the room. To date, Homeroom has brought on 85 property investors, is growing 6X annually, and is bringing in 420K in annualized net revenue. How it works: We help investors buy homes in cities that are attractive to young people, but lack affordable housing options. We then renovate and after about 20 days, the home is ready and we find qualified renters by the room. We launched in 2018 in Kansas City with 1 home. We now have 105 homes in 31 cities. In 2021, we grew rental GMV to $1.8M (300% YoY growth). Our average rent across every property is $458, which is about 50% lower than market comps, and our investors see returns up to 50% higher. We are HomeRoom. Johnny is the financial analyst/domain expert. Thomas is a serial entrepreneur with a PhD in ML, and Mike hacked growth for Airbnb and Facebook.
    machine-learning
    real-estate
    proptech
    nlp
    data-engineering
  • Polytomic
    Polytomic (w2020)Active • 7 employees • San Francisco, CA, USA
    Polytomic is a no-code web app to sync data between your internal databases, business systems (e.g. Stripe, Salesforce, etc), data warehouses, spreadsheets, and even HTTP APIs.
    saas
    b2b
    data-engineering
  • FlowDeploy
    FlowDeploy (w2022)Active • 3 employees • Mountain View, CA, USA
    FlowDeploy helps bioinformaticians manage their data analysis pipelines. We provide everything they need to try, run, develop, and share their pipelines. That includes integrations with AWS, Snakemake, Nextflow, GitHub, Slack, SSO, and more, as well as a clean API and web app for launching and monitoring pipelines and managing their data. FlowDeploy is built for bioinformaticians: it doesn't restrict how pipelines are built and managed, as long as a bioinformatics workflow manager like Nextflow or Snakemake is used. But it does eliminate several footguns like idle spend and accidental data egress, and it reduces the potential for users accidentally sharing credentials. FlowDeploy runs the pipelines in either our managed cloud or the customer's cloud, eliminating the need to transfer data externally. Non-computational biologists can use FlowDeploy, too: features like pipeline templates decrease the complexity of launching a new pipeline, which reduces user error and decreases the need for advanced cloud training for non-computational users.
    developer-tools
    drug-discovery
    data-engineering
  • Etleap
    Etleap (w2013)Active • 11 employees • San Francisco, CA, USA
    Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike other enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own.
    data-engineering
  • Secoda
    Secoda (s2021)Active • 27 employees • Toronto, ON, Canada
    Secoda is a universal data discovery and documentation tool that makes finding metadata, queries, charts and documentation as easy as a Google search. Today, data teams are collecting tons of data, but most employees don't know what data exists, how to use it, and what data to trust. This confusion happens because different components of company data get collected in fragmented tools. Secoda helps teams find and understand data in one easy-to-use platform that's accessible to any employee.
    developer-tools
    saas
    b2b
    analytics
    data-engineering
  • Chaos Genius
    Chaos Genius (w2020)Active • 10 employees • San Francisco, CA, USA
    Chaos Genius is a DataOps Observability platform for Snowflake. Enable Snowflake Observability to reduce Snowflake costs and optimize query performance.
    cloud-workload-protection
    machine-learning
    analytics
    open-source
    data-engineering
  • Avenue
    Avenue (w2021)Active • 8 employees • New York, NY, USA
    Avenue is a simple way for business teams to set up alerts from their database or data warehouse. Think Datadog / PagerDuty for operations teams. Operations teams create set-and-forget alerts on all their data, so they can be more proactive with their time (and monitor on more nuanced triggers than just what fits on their dashboard page). Avenue can improve response times to critical problems from several days to real-time by alerting directly on the data sources that customers already use.
    developer-tools
    saas
    data-engineering
  • Prequel
    Prequel (w2021)Active • 9 employees • New York, NY, USA
    Prequel makes it easy for companies to share data with their customers. It helps you export data directly to your customer's Snowflake, Redshift, BigQuery, Databricks, or other data warehouse on an ongoing basis.
    saas
    analytics
    data-engineering
  • LaunchFlow
    LaunchFlow (w2023)Active • 3 employees
    LaunchFlow is the easiest way to build and deploy Python apps on GCP and AWS. LaunchFlow manages cloud environments in your own GCP / AWS account that are secure, scalable, and cost effective by default. Import Postgres, Redis, and other cloud resources in your Python code, then deploy everything to your cloud with a single command.
    developer-tools
    machine-learning
    b2b
    data-engineering
    cloud-computing
  • communion
    communion (s2019)Active • 8 employees • New York, NY, USA
    creative tools + powerful analytics
    artificial-intelligence
    marketing
    advertising
    data-engineering
    ai-assistant
  • Operator Labs
    Operator Labs (w2020)Active • 6 employees • New York, NY, USA
    Toolkit for connecting AI agents to the decentralized web
    generative-ai
    crypto-web3
    data-engineering
  • Logarithm Labs
    Logarithm Labs (w2020)Active • 2 employees • Foster City, CA, USA
    Easy button to use data for your daily operations. Power your business workflows with quality data. Logarithm Labs helps you turn manual data wrangling and ad-hoc scripts into repeatable pipelines for your operational workflows. Our product and team of experts do the heavy lifting so that you can focus on the business logic that drives your organization. To learn more, contact us at hello@logarithmlabs.com.
    developer-tools
    data-engineering
  • Outerbase
    Outerbase (w2023)Active • 4 employees • Pittsburgh, PA, USA
    Outerbase is the interface for your database. Companies use Outerbase to view, edit, and modify their data and even generate beautiful visual dashboards without having to write a single line of SQL.
    developer-tools
    generative-ai
    analytics
    data-engineering
    ai
  • TableFlow
    TableFlow (w2023)Active • 2 employees • San Francisco, CA, USA
    TableFlow is an open source data import platform for companies to collect and transform customer data. Instead of building an in-house file upload and processing service, businesses can embed or link to TableFlow's customizable importer to manage their data onboarding needs.
    artificial-intelligence
    developer-tools
    saas
    open-source
    data-engineering
  • Tarsal
    Tarsal (s2021)Active • 10 employees • New York, NY, USA
    Tarsal is a data pipeline custom built for security teams. As security data grows 25% year over year, security teams desperately need access to best-in-class data infrastructure. Tarsal bridges the gap between the modern data stack and security teams, pioneering the modern security data stack.
    b2b
    cybersecurity
    big-data
    data-engineering
  • Honeydew
    Honeydew (w2023)Active • 6 employees • Tel Aviv-Yafo, Israel
    The way people use data is constantly changing. Data teams must support every new context without breaking the shared truth. Honeydew’s semantic layer does it automatically. We validate each change and update every data flow. Using Honeydew, data teams can support 10x more data users - without more engineers or compromising integrity.
    saas
    b2b
    analytics
    data-engineering
  • Cargo
    Cargo (s2023)Active • 5 employees • San Francisco, CA, USA
    Cargo is the first revenue architecture built for modern teams. We help revenue teams to access their company data and automate their sales operations. We provide a headless interface to enable them to easily segment, score and route leads to turn pipeline into revenue.
    sales
    sales-enablement
    data-engineering
    infrastructure
    operations
  • Clear
    Clear (w2021)Active • 2 employees • London, UK
    Clear is the free mobile app that helps you track and share your skincare routine. We are fuelling innovation and empowering consumers in the skincare industry via data, technology and community. We were also the 2022 L'Oréal Beauty Tech for Good winners, and were featured under "Best New Apps and Updates" on the App Store in 2023. The skincare industry is worth $200B and social commerce is going to drive the future growth of every brand in the industry. We're going to be fuelling that growth.
    marketplace
    consumer
    digital-health
    data-engineering
  • Baselit
    Baselit (w2023)Active • 3 employees
    Baselit automatically optimizes your Snowflake warehouses and achieves 20-40% cost reduction with zero effort. Never worry about budget overruns again.
    saas
    b2b
    big-data
    data-engineering
    ai
  • Taylor AI
    Taylor AI (s2023)Active • 2 employees • San Francisco, CA, USA
    Taylor is the API for classifying unstructured text in real-time. Developers use Taylor's API to label and build products around their free text without the infrastructure and maintenance overhead. Taylor has helped engineering teams efficiently tag large-scale text data (like user-generated content or web-scraped data). Why Taylor? Our customers previously relied on LLMs for tagging text. They soon hit rate limits, high latency, and accuracy issues (LLMs are optimized for generative tasks, not structured classification). With Taylor, engineering teams get access to our proprietary models purpose-built for text classification. We have no rate limits, guaranteed latency, and straightforward pricing.
    developer-tools
    data-science
    data-labeling
    data-engineering
    ai
  • DAGWorks Inc.
    DAGWorks Inc. (w2023)Active • 2 employees • San Francisco, CA, USA
    At DAGWorks our mission is to standardize the way people write Python code. Hamilton (https://github.com/dagWorks-Inc/hamilton) standardizes the way individuals and teams express data, machine learning, and LLM pipelines. Burr (https://github.com/DAGWorks-Inc/burr/) standardizes how stateful applications are written, executed, and observed. We want our projects to be 'the way' that enables engineers & data practitioners to get their job done: developing quickly, streamlining productionization and maintenance, and staying agnostic to the infrastructure run underneath. By doing so, we can make these initiatives more human capital efficient, enabling businesses to get more ROI out of existing headcount and infrastructure, while also reducing the number of tools and platforms that businesses need to maintain. We offer hosted observability, catalog, lineage, provenance, and execution features for our projects. Learn more at www.dagworks.io.
    developer-tools
    machine-learning
    b2b
    open-source
    data-engineering
  • Waydev
    Waydev (w2021)Active • 15 employees • San Francisco, CA, USA
    Leverage insights from your engineering stack to accelerate velocity, align engineering work to business priorities, and increase visibility into your team’s DORA Metrics and SPACE Framework Metrics
    b2b
    analytics
    enterprise
    data-engineering
    ai-assistant
  • Pipekit
    Pipekit (s2021)Active • 7 employees • San Francisco, CA, USA
    Our app manages Argo Workflows for data teams, enabling complex data & CI pipelines in half the time while saving companies hundreds of thousands of dollars annually. Argo Workflows is an open-source pipeline framework for Kubernetes that’s used in production by Bloomberg, Intuit, Adobe, New Relic, NVIDIA, and many other open-source early adopters.
    developer-tools
    open-source
    data-engineering
    devops
  • Lariat Data
    Lariat Data (s2021)Active • 3 employees • New York, NY, USA
    Lariat is a Continuous Data Quality monitoring platform to discover data bugs before your consumers do. Ensure data products don’t break even as business logic, input data and infrastructure change. Use Lariat to define and then automatically extract, store and visualize data quality metrics on raw event-level data through to delivered data products.
    machine-learning
    big-data
    data-engineering
  • CustomerOS
    CustomerOS (s2022)Active • 10 employees • London, UK
    The Top 10% of SaaS companies generate 87% of all market returns. CustomerOS gives you the data and tooling to compete with the top 10%. Specifically, we solve three major problems in B2B SaaS today: 1. CustomerOS is a system of record for all your customer data. We support 100+ integrations with any app or database that touches customer data. And there's no engineering required. 2. CustomerOS provides tooling for your in-life customer motion. We predict renewals (and churn), provide risk-weighted ARR forecasts, and manage all your Customer Success workflows, from onboarding to expansion to advocacy. 3. CustomerOS lead scores your pipeline against your ICP. We build data-driven profiles of your best customers and provide a real-time ICP-fit indicator on your sales and marketing pipeline. This ensures you're spending your CAC acquiring customers who are primed to renew year after year and expand as they grow.
    b2b
    customer-success
    open-source
    enterprise
    data-engineering
  • Whaly
    Whaly (s2021)Active • 3 employees • Paris, France
    Whaly helps data teams save time on maintenance and analysis building while giving business users more autonomy over the analyses they need to improve their decision making. We do this by providing a self-service data platform where both data and business teams can work together. We understood that most data teams were ending up as a bottleneck for the rest of the company and needed to give business teams more autonomy to back their decisions with data. Emilien, Florian and Pierre were the minds behind the data advertising platforms of the major media and e-commerce companies in France in their earlier positions as Product Manager and Head of Customer Success, giving them an edge in executing data projects successfully.
    data-engineering
  • Quary
    Quary (w2024)Active • 2 employees • London, UK
    Quary is the first analytics engineering platform that brings the entire model-test-deploy workflow into the browser. At our first customer, a fast-growing fintech company, Quary empowers analysts in the growth team to self-serve, contribute, and reduce reliance on the data engineering team, letting teams ship metrics faster & executives get answers sooner.
    artificial-intelligence
    analytics
    data-science
    data-engineering
    ai
  • PeerDB
    PeerDB (s2023)Active • 2 employees
    At PeerDB, we are building a fast and simple way, and the most cost-effective one, to stream data from Postgres to Data Warehouses, Queues and Storage engines. If you run Postgres at the heart of your data stack and move data at scale from Postgres to any of the above targets, PeerDB can provide value. We support different modes of streaming: log-based (CDC), cursor-based (timestamp or integer), and XMIN-based. Performance-wise, we are 10x faster than existing tools. Feature-wise, we support native Postgres features such as a comprehensive set of data types (including jsonb, arrays, and PostGIS), efficient streaming of TOAST columns, schema changes, and so on.
    developer-tools
    open-source
    data-engineering
    enterprise-software
    databases
  • Trellis
    Trellis (w2024)Active • 2 employees
    Trellis converts your unstructured data into SQL-compliant tables with a schema you define in natural language. With Trellis, you can now run SQL queries on complex data sources like financial documents, contracts, and emails. Our AI engine guarantees accurate schema and results. Leading enterprises use Trellis to: 1. Unlock hidden revenue in their customer data (e.g., Underwriting teams use Trellis to extract key features from transaction data and build better risk models.) 2. Supercharge RAG applications by enabling end-users to ask analytical questions not possible before with traditional Vector DB (e.g., what are the top three features that users are requesting) 3. Enrich their data warehouse with business-critical information (e.g., Retrieving detailed pricing and quantity information of products sold on competitor websites)
    b2b
    data-engineering
    infrastructure
    ai
    databases
  • InQuery
    InQuery (w2024)Active • 2 employees
    InQuery is a Snowflake alternative built on open source technologies. We make it easy for companies to scale their data workloads, powering analytics, machine learning, and AI use-cases at a fraction of the cost.
    team-collaboration
    big-data
    data-engineering
  • OmniAI
    OmniAI (w2024)Active • 3 employees • New York, NY, USA
    Run ML models across your data. Categorize, extract, summarize, translate, and more. (1) Connect to a data warehouse: we support Snowflake, Postgres, MySQL, and MongoDB. (2) Choose models to run: define type-safe schemas to run against your unstructured data. Use any of the hosted models, or define your own. (3) Transform your data: we’ll run those models against your data, and keep your warehouse in sync as new rows/fields are added/deleted. (4) Query with SQL: all the transformed data stays in your warehouse. Surface this data in your product, or analyze with your existing BI tools.
    big-data
    data-engineering
    ai
  • Sarus
    Sarus (w2022)Active • 16 employees • Paris, France
    Sarus solves the problem of accessing or sharing personal data for analytics or machine learning. The solution deploys natively in data infrastructures and lets practitioners work on data they cannot see. Every interaction with the sensitive data is protected with the highest privacy standard: differential privacy. Sarus makes traditional anonymization methods irrelevant, saving months in compliance and data engineering while preserving all of the value of data.
    analytics
    compliance
    data-engineering
  • Dynamo AI
    Dynamo AI (w2022)Active • 40 employees • San Francisco, CA, USA
    End-to-end privacy, security, and compliance solutions to prepare your organization for emerging AI regulations.
    machine-learning
    privacy
    data-engineering
  • Centauri AI
    Centauri AI (w2024)Active • 3 employees • Alameda, CA, USA
    Centauri AI is a modern ETL and Data Science platform for banks and investment firms, starting with Structured Finance. Financial firms heavily rely on Excel, PDF, and PPT files to exchange complex asset details, leading analysts to spend hours crunching the files and extracting insights. Moreover, these data files and reports cannot be easily reused due to poor data infrastructure. Powered by AI, our product cuts hours of data wrangling work down to minutes and makes it possible to query past data easily. This helps firms evaluate assets faster and win more deals. Since launching last month, we've started a pilot with a brokerage team at a public investment bank that now uses our product daily. Aiming to serve over 100,000 investment teams needing complex data analysis for alternative investments, this opens up a potential $5 billion market.
    fintech
    b2b
    data-science
    data-engineering
    ai
  • Elementary
    Elementary (w2022)Active • 12 employees • Tel Aviv-Yafo, Israel
    Elementary enables data teams to detect problems in their data before their users do. An open-source solution that any data engineer can deploy in minutes without sharing sensitive data.
    developer-tools
    analytics
    open-source
    data-engineering
  • LanceDB
    LanceDB (w2022)Active • 10 employees • San Francisco, CA, USA
    LanceDB is a new open-source vector database that can support low-latency billion-scale vector search on a single node. Built around a new columnar data format, LanceDB makes it incredibly easy to build applications for generative AI, recsys, search engines, content moderation, and more.
    aiops
    machine-learning
    data-engineering
  • Hydra
    Hydra (w2022)Active • 6 employees • San Francisco, CA, USA
    Open source Snowflake alternative. Query billions of rows instantly on column-oriented Postgres. Hydra can be used as open source, as a managed cloud, or deployed in customer cloud infrastructure. Get parallelized analytics in minutes with no code changes.
    developer-tools
    analytics
    open-source
    data-engineering
  • Trackingplan
    Trackingplan (w2022)Active • 8 employees • Barcelona, Spain
    Trackingplan automatically discovers and monitors all the information your applications and websites are collecting, ensuring that you can trust your BI, analytics, marketing, and sales tools. You can think of us as Segment Protocols but totally transparent, where developers can keep using Google Analytics, Amplitude, Hubspot, Intercom, Braze, etc. as they are used to. Installed in minutes using your Tag Manager or by adding just one line of code to your website or apps, we model all the data being sent to third parties. Since Trackingplan understands what each piece of data means, it identifies patterns, detects anomalies, and automatically connects the dots to create value from data that was hidden in plain sight: - An always up-to-date single source of truth and data governance tool. To discover, understand and document your data and improve communication across teams. - Automated notifications when something breaks or changes. To make sure that integrations are always well implemented: Schema errors, traffic anomalies, rogue events... - Easy to understand, customizable, cross-service alerts. To detect trends, insights, and problems without using complex, engineer-oriented solutions.
    saas
    analytics
    data-engineering
  • Bracket
    Bracket (w2022)Active • 3 employees • New York, NY, USA
    Bracket is the two-way data pipeline between popular business tools and backend databases. When ops teams update data in Salesforce or Airtable, and engineers update data in the database, Bracket connects the two sources to reflect the same information.
    saas
    b2b
    data-engineering
  • IvyCheck
    IvyCheck (s2022)Active • 2 employees • Berlin, Germany
    IvyCheck helps you extract hidden insights from your data and ensures high data quality and consistency. Use Generative AI in your data warehouse to transform data at scale.
    generative-ai
    b2b
    data-engineering
    ai
    databases
  • Grai
    Grai (s2022)Active • 3 employees • San Francisco, CA, USA
    Grai is open source version control for metadata. We can determine how database changes will affect deployed machine learning models, APIs, and dashboards because we understand how data relates across systems that don’t otherwise talk to each other.
    developer-tools
    saas
    analytics
    open-source
    data-engineering
  • Lamin
    Lamin (s2022)Active • 4 employees • Munich, Germany
    Manage data & analyses with an open-source Python framework. Collaborate across dry lab and wet lab in a distributed data hub. Get started on your laptop and deploy anywhere.
    developer-tools
    machine-learning
    biotech
    open-source
    data-engineering
  • Findly
    Findly (s2022)Active • 6 employees • London, UK
    Findly.ai is the co-pilot for Business Intelligence that revolutionizes how businesses understand and interact with their data. By creating an engaging chat environment, it empowers decision-makers to gain insights, request reports, and generate visualizations based on their company's metrics. This seamless interaction is made possible by integrating a metric layer that comprehends all your company's metrics. The chat-based exploration simplifies complex data analysis, allowing users to generate comprehensive summaries with a single click, which can be exported to various formats. Furthermore, with the introduction of scheduled chats and action-triggered automations, Findly.ai enhances the autonomy and efficiency of decision-makers. It's more than a tool; it's a decision-making operational system aiming to facilitate decision-makers in achieving their KPIs while spending less time waiting for data.
    generative-ai
    b2b
    chatbot
    data-engineering
    ai
  • Sunpia
    Sunpia (s2022)Active • 3 employees • San Jose, CA, USA
    Sunpia lets developers easily experience the cost and speed benefits of serverless infrastructure, without having to rewrite their code. Developers annotate their code and Sunpia automatically designs a microservice version of it they can deploy on their own cloud.
    developer-tools
    kubernetes
    data-engineering
  • MovingLake
    MovingLake (s2022)Active • 3 employees • Mexico City, CDMX, Mexico
    MovingLake is Fivetran for event-driven architectures. Companies such as Casai use our product to obtain orders and price changes in real time.
    saas
    b2b
    analytics
    api
    data-engineering
  • Yhat
    Yhat (w2015)Acquired • 17 employees • Brooklyn, NY, USA
    Yhat (YC W15, pronounced y-hat) was an end-to-end data science platform. Acquired by Alteryx (NYSE:AYX).
    artificial-intelligence
    machine-learning
    enterprise
    data-engineering
  • Data Mechanics
    Data Mechanics (s2019)Acquired • 25 employees • Paris, France
    Data Mechanics was acquired by NetApp in 2021 and integrated into the Spot.io product portfolio. Our managed Spark-on-Kubernetes platform is live and running under the name Ocean for Apache Spark: https://spot.io/products/ocean-apache-spark/
    saas
    b2b
    open-source
    data-engineering
  • Stackshine
    Stackshine (w2022)Acquired • 7 employees • Portland, OR, USA
    Stackshine is creating mission control for enterprise IT teams. We discover all the software being used across their organization and then automate workflows related to onboarding/offboarding, cost savings, and security.
    robotic-process-automation
    productivity
    analytics
    enterprise
    data-engineering
  • Satsuma
    Satsuma (s2021)Acquired • 5 employees • San Francisco, CA, USA
    Satsuma is a developer tool for building applications on top of real-time blockchain data. Our product lets developers take decoded data from multiple chains, customize it for their use cases, and access it through API endpoints. Blockchains serve as distributed databases for these products, holding their most important data. However, it’s difficult to access and query that data. We believe this friction is an enormous blocker for web3 developers and that better tooling will enable mass adoption for web3. We’re a founding team of engineers, having built data infrastructure and product as early employees at Airtable, Heap, and Y Combinator.
    developer-tools
    saas
    crypto-web3
    data-engineering