Data Warehouse for Computer Vision

Open Source Software Engineering Intern

$80K - $100K
San Francisco, CA, US
Sammy Sidhu

About the role

About Eventual:

At Eventual (YC W22), we are building Daft, the open-source framework for processing complex data.

Data processing systems today are highly optimized for simple tabular data. However, much of the useful data in the world is in a more complex form, such as media (e.g. images, video, audio), scientific formats (e.g. genomes), and ML artifacts (e.g. embeddings). Many challenges today make processing complex data much more difficult than working with simple tabular data. Our goal is to make working with complex data as easy as working with simple data, and to become the de facto solution for building applications on top of complex data.

You would be joining a small, fast-moving team of engineers with deep expertise across related domains: big data, distributed systems, machine learning, self-driving, genomics, and high performance computing.

About the role:

As a software engineering intern, your primary responsibility would be contributing to the development of the open-source Daft distributed dataframe. Development moves quickly, but these are some projects you might expect to work on:

  1. Implementing algorithms on complex data types, such as embedding similarity or image kernels
  2. Improving performance of distributed join operations
  3. Building integrations for high-performance data loading

We are a young startup, so do be prepared to wear many hats - tinkering with infrastructure, talking to users, and participating in design processes with the team!

About you:

You will want to be comfortable working on a novel distributed data processing system. Some things that will be important are:

  1. Industry or research experience working with distributed systems, especially data-intensive systems such as Spark, Dask, or Ray.
  2. Experience with Arrow-based frameworks.
  3. Familiarity with Python or Rust.
  4. A strong sense of ownership and autonomy; a desire to build good systems for users.

Big nice-to-haves are:

  1. Experience working on production machine learning systems.
  2. Experience with compilers or query optimization.

Office and benefits:

Our office setup is hybrid remote, with three in-person days a week. We are in our San Francisco office Wednesday to Friday.

About the interview

15-minute phone screen

A short screen over video call with one of our cofounders (either Sammy or Jay) for us to get acquainted, understand your aspirations, and evaluate whether the type of role you are looking for is a good fit.

Technical Interviews

Our technical interviews for this role focus on understanding your knowledge of distributed data processing.

60-minute data engineering design interview

A technical interview to understand your familiarity with the internals of a distributed data engineering system.

60-minute systems programming interview

A technical interview to understand your familiarity with systems programming and Linux.

Get to know us

As many chats as necessary to get to know us - come have a coffee with our cofounders and existing employees to understand who we are and our goals, motivations and ambitions.

We look forward to meeting you!

About Eventual

Eventual: The Data Warehouse for Computer Vision

Eventual is building an integrated development experience for data scientists and engineers to query, process and build applications on Complex Data (non-tabular data such as images, video, audio and 3D scans).


Daft (https://www.getdaft.io) is our open-sourced Python dataframe API for working with Complex Data. With Daft, users can query and transform their data interactively in a notebook environment, running workloads such as analytics, data preprocessing and machine learning model training/inference. The same transformations that are performed on the dataframe can then be deployed as an HTTP service to respond to incoming requests, helping our users go from experimentation to production faster.

Eventual Cloud Platform

The Eventual Cloud Platform provides an integrated development environment for our users to go from local development to production. We provide:

  1. Notebooks for interactive data science with Daft
  2. Fully-managed cluster computing infrastructure to run large distributed Daft workloads
  3. Application deployment as services or automated jobs

About Us

Eventual (YC W22) is funded by investors such as Caffeinated Capital, Array.vc and top angels in the valley from Databricks, Meta and Lyft. Our team has deep expertise in high performance computing, big data technologies, cloud infrastructure and machine learning.

Team Size: 5
Location: San Francisco
Jay Chia
Sammy Sidhu