Software that streams data from databases to warehouses in real-time

Artie is software that streams data from databases to data warehouses in real-time. Today, most companies run their ETL process every few hours or overnight, so their data warehouse is always out of date; with Artie, the warehouse always has live production data.

Jobs at Artie

San Francisco, CA, US
$125K - $175K
3+ years
Team Size: 4
Location: San Francisco
Group Partner: Jared Friedman

Active Founders

Jacqueline Cheong

Co-founder and CEO at Artie. Jacqueline was previously a software investor and was responsible for a ~$300M software book within a larger TMT portfolio. Now she's focused on making real-time data streaming easy and accessible.


Robin Tang

CTO @ Artie. Software engineer with a background in distributed systems and designing for high throughput and low latency. Currently working on providing real-time data replication by leveraging data streams and CDC. Prev Consumer and Growth @ Opendoor, built Sunshine CRM @ Zendesk and lead engineer @ Outbound (YC W15).


Company Launches


Artie is a real-time database replication solution. We leverage change data capture (CDC) and stream processing to perform data syncs in a more efficient way, which enables sub-minute latency and helps optimize compute costs. Today, we’re launching our Analytics Portal to provide visibility into our streaming pipelines and offer production-level monitoring for related system infrastructure out-of-the-box.

With the Analytics Portal, we hope to alleviate some of the challenges that data teams face when running their data stack. By offering real-time observability into database pipelines and peripheral infrastructure, we hope companies can better understand how systems impact one another, reduce downtime, debug issues faster, and generate proactive alerts to maintain robust infrastructure.

Visibility related to database replication is lacking

The core of database replication is transferring data in a timely, accurate, and reliable manner. This is table stakes. In addition, there is a lot more happening on the periphery, such as database monitoring, data pipeline visibility, and data latency monitoring. Data engineers need visibility to answer questions like:

  • How many rows have been synced in the past hour? How does that compare to the average/median/last month?
  • What is the data latency for our organization’s three most important tables? What are the factors that impact data latency and how are those metrics trending?
  • How are my systems performing?
    • Are there any database permission errors that may cause my data pipeline to go down?
    • Are my databases sized correctly? How is CPU, memory, and storage utilization?
    • Is my replication slot growing? By how much?
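The replication slot question can be made concrete. The following is a minimal sketch, not Artie's implementation. It assumes you have already read the current WAL position and the slot's `restart_lsn` from Postgres as text (for example via `pg_current_wal_lsn()` and the `pg_replication_slots` view). The function names and the "strictly increasing samples" heuristic are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def lsn_to_bytes(lsn: str) -> int:
    """Convert a Postgres LSN such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def slot_lag_bytes(current_wal_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL that the replication slot is still holding back."""
    return lsn_to_bytes(current_wal_lsn) - lsn_to_bytes(restart_lsn)


def is_growing(samples: Sequence[int]) -> bool:
    """Hypothetical heuristic: True if every lag sample exceeds the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))
```

Sampling `slot_lag_bytes` on a schedule and passing the recent values to `is_growing` answers both "is it growing?" and "by how much?". Postgres can also compute the same difference server-side with `pg_wal_lsn_diff`.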

Setting up these metrics and monitors is important to help with debugging and maintaining a robust database replication solution. However, this requires expertise and domain knowledge that may not be accessible at every company. There is also no standardization of which metrics to track and what benchmarks to follow. To make matters worse, when it comes to adopting cloud solutions, database/pipeline visibility is severely limited. When pipelines break down, customers are often left in the dark, not knowing what broke, why it broke, and how to fix it.

Artie’s Analytics Portal unlocks visibility & reduces MTTD

We are extremely excited to announce our Analytics Portal, which increases the visibility and observability of our streaming pipelines. It provides insights into system-level infrastructure and helps with monitoring database and pipeline health. When identifying and resolving issues, one of the most important metrics to reduce is MTTD (mean time to detection). With Artie’s streaming pipelines and periodic jobs like Postgres Watcher, metrics are sent to our Analytics Portal in flight, while the underlying data is still being processed.

With the first iteration of our Analytics Portal, we are providing industry-standard telemetry for streaming pipelines and related infrastructure. Data teams will be able to observe the following:

  • Data ingestion latency by table and database/deployment
  • Operation distribution by table and database/deployment
  • Rows synced by table and database/deployment
  • Database monitors that are related to and can impact pipeline performance:
    • Permission errors that may interfere with data replication
    • Replication slot size
    • Free disk space*
    • CPU utilization*
    • Memory and Disk I/O (input/output)*
    • Average transaction time*
    • Existence of long-running queries*

*coming soon

Production-level monitoring out-of-the-box

The Analytics Portal initially comes with a set of pre-built charts and monitors. Customers are able to drill down to get deployment, database, and table-level statistics.

The pre-built monitors that we are launching with include alerts for database permission errors and replication slot growth (for Postgres users). Over time, we will add alerting for the other monitors we mentioned above and more. This enables customers to have production-level monitoring set up out-of-the-box for their business-critical data.

For example, an e-commerce company might watch its online transactions table closely during the holidays. Suppose they see data ingestion latency rising for that table. They zoom out and realize the problem is not limited to online transactions: every table under their Postgres connector is affected, and very few rows have been synced in the past 5 minutes. To troubleshoot, they turn to the database monitors and see that the database’s replication slot has been growing. The culprit is a long-running query that has locked the table and is preventing Postgres from advancing the replication slot. After a quick Slack message to their internal DevOps team, the query is killed and the issue is resolved.

Making the Analytics Portal more actionable & customizable over time

Over time, we will make the data more actionable and customizable. In the near future, we plan to enable row-based monitoring such that customers can monitor custom business logic. In addition to the pre-built charts and monitors that we provide out-of-the-box, we want to allow customers to create custom charts and configure views based on metrics that matter to their business.

For example, say you are working at a fintech company that wants to monitor live transactions to detect fraud and abuse on your platform. You have a transactions table that is being synced to your Snowflake instance. You should be able to generate a chart that plots the average, median, p95, and max transaction sizes across various lookback periods (30 minutes, 1 hour, 24 hours, 7 days, etc.). Then you can set up business logic monitors such as:

  • Flag transactions that are 1.5 standard deviations above average coming from a merchant who signed up on the platform less than one week ago
  • Flag key accounts where transaction volume has increased above or fallen below a certain threshold in the past hour
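To make the first rule above concrete, here is a minimal sketch in plain Python. It is not an Artie API, since row-based monitoring is described here as a planned feature. The `Transaction` fields, the function name, and the choice of population standard deviation over the rows in the lookback window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import List


@dataclass
class Transaction:
    # Hypothetical row shape; a real transactions table will differ.
    merchant_id: str
    amount: float
    merchant_signed_up_at: datetime


def flag_new_merchant_outliers(
    transactions: List[Transaction],
    now: datetime,
    z_threshold: float = 1.5,
    max_merchant_age: timedelta = timedelta(days=7),
) -> List[Transaction]:
    """Return transactions above mean + z_threshold * stddev from recently signed-up merchants."""
    amounts = [t.amount for t in transactions]
    if len(amounts) < 2:
        return []  # not enough data in the window to estimate spread
    cutoff = mean(amounts) + z_threshold * pstdev(amounts)
    return [
        t
        for t in transactions
        if t.amount > cutoff and now - t.merchant_signed_up_at < max_merchant_age
    ]
```

The caller would pass in the rows from one lookback window (for example, the last hour) and forward anything returned to the chosen escalation channel.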

Depending on how you’d like to be notified, Artie plans to support the following escalation channels:

  • Email
  • Slack
  • Webhooks

In this example, the escalation channel is a webhook to your API server so you can then run a more rigorous machine-learning fraud model against a particular merchant account or transaction.

Contact us at hi@artie.so to learn more!

Other Company Launches

Artie - real-time data streaming for databases ⚡

We transfer data from databases to data warehouses in real-time with CDC streaming

Selected answers from Artie's original YC application for the S23 Batch

What is your company going to make? Please describe your product and what it does or will do.

Artie Transfer is a service that provides real-time data replication from transactional databases to data warehouses. Artie Transfer’s architecture leverages change data capture to stream data changes continuously into your data warehouse. When dealing with data, speed is nothing without accuracy. As such, Artie Transfer is horizontally scalable and comes with all the features you’d expect, such as automatic retries, idempotency, automatic schema evolution support, telemetry, and error reporting.
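To illustrate why idempotency matters alongside automatic retries: if a batch of change events is delivered twice, applying it a second time should leave the destination unchanged. The following is a minimal in-memory sketch of that idea, not Artie Transfer's actual implementation. The event shape (`op`, `key`, `row`, `position`) and the function name are assumptions made for illustration.

```python
def apply_changes(table: dict, events: list) -> dict:
    """Apply CDC-style change events to `table` so that replays are harmless.

    `table` maps a primary key to {"position": int, "row": dict or None}.
    `position` stands in for a monotonically increasing log offset
    (hypothetical; a real pipeline might use an LSN or a Kafka offset).
    """
    for event in events:
        current = table.get(event["key"])
        if current is not None and current["position"] >= event["position"]:
            # Already applied, or a newer change is present: skip on retry.
            continue
        if event["op"] == "delete":
            # Keep a tombstone so a replayed older upsert cannot resurrect the row.
            table[event["key"]] = {"position": event["position"], "row": None}
        else:
            table[event["key"]] = {"position": event["position"], "row": event["row"]}
    return table
```

Because each key remembers the position of the last change applied to it, retrying a whole batch after a failure is safe: every event that already landed is skipped.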