Data Warehouse for Computer Vision
At Eventual (YC W22), we are building Daft, the open-source framework for processing complex data.
Data processing systems today are highly optimized for simple tabular data. However, much of the world's useful data comes in more complex forms, such as media (e.g. images, video, audio), scientific formats (e.g. genomes), and ML artifacts (e.g. embeddings). Processing this kind of data today is much harder than working with simple tabular data. Our goal is to make working with complex data as easy as working with simple data, and to become the de facto solution for building applications on top of complex data.
You would be joining a small, fast-moving team of engineers with deep expertise across related domains: big data, distributed systems, machine learning, self-driving, genomics, and high performance computing.
As a software engineering intern, your primary responsibility would be contributing to the development of the open-source Daft distributed dataframe. Development moves quickly, but these are some projects you might expect to work on:
We are a young startup, so do be prepared to wear many hats - tinkering with infrastructure, talking to users, and participating in design processes with the team!
You will want to be comfortable working on a novel distributed data processing system. Some things that will be important are:
Big nice to haves are:
Our office setup is hybrid remote, with three in-person days a week. We are in our San Francisco office Wednesday to Friday.
A short introductory video call with one of our cofounders (either Sammy or Jay) so we can get acquainted, understand your aspirations, and evaluate whether this role fits what you are looking for.
Our technical interviews for this role focus on understanding your technical knowledge of distributed data processing.
A technical interview to understand your familiarity with the internals of a distributed data engineering system.
A technical interview to understand your familiarity with systems programming and Linux.
As many chats as necessary to get to know us - come have a coffee with our cofounders and existing employees to understand who we are and our goals, motivations and ambitions.
We look forward to meeting you!
Eventual is building an integrated development experience for data scientists and engineers to query, process and build applications on Complex Data (non-tabular data such as images, video, audio and 3D scans).
Daft (https://www.getdaft.io) is our open-source Python dataframe API for working with Complex Data. With Daft, users can query and transform their data interactively in a notebook environment, running workloads such as analytics, data preprocessing, and machine learning model training and inference. The same transformations that are performed on the dataframe can then be deployed as an HTTP service to respond to incoming requests, helping our users go from experimentation to production faster.
The Eventual Cloud Platform provides an integrated development environment for our users to go from local development to production. We provide:
Notebooks for interactive data science with Daft
Fully-managed cluster computing infrastructure to run large distributed Daft workloads
Application deployment as services or automated jobs
Eventual (YC W22) is funded by investors such as Caffeinated Capital, Array.vc and top angels in the valley from Databricks, Meta and Lyft. Our team has deep expertise in high performance computing, big data technologies, cloud infrastructure and machine learning.