Pachyderm: Data Versioning, Data Pipelines, and Data Lineage

Developer Productivity Engineer at Pachyderm

San Francisco Bay Area or Remote
Experience: 3+ years
Joe Doliner

About the role

At Pachyderm, we're building an open-source, enterprise-grade data science platform that lets you deploy and manage multi-stage, language-agnostic data pipelines while maintaining complete reproducibility and provenance. Rooted in open source, our system shifts the paradigm of data science workflows by providing reproducibility, data provenance, and the opportunity for true collaboration. Pachyderm uses modern technologies like Docker and Kubernetes to build an entirely new method of analyzing data. Offered both as an in-house solution and as a hosted service, Pachyderm brings together version control for data and the tools to build scalable, end-to-end ML/AI pipelines, while empowering users to use any language, framework, or tool they want. If you want to learn more about our grand vision, read what has become our "manifesto."

Pachyderm is a rapidly growing, early-stage company funded by top VCs: Benchmark, Decibel, M12, and Y Combinator. Like many modern companies, Pachyderm embraces a "remote-first" approach to growing our team. It gives us a huge advantage in hiring top, diverse talent across the country while giving our team members the flexibility to work from anywhere.

Our product is open source, so you can check it out on GitHub, and you can try our cloud service for free.

The Role

Love Docker, Golang, and distributed systems?

Pachyderm is hiring an Automation Engineer / Sr. Automation Engineer / Developer Productivity Engineer to help us architect and build the framework for testing our core product: a distributed, version-controlled filesystem and data processing engine. You'll work on challenging distributed systems problems every day and help us build a first-of-its-kind, containerized data infrastructure platform.

While your primary focus will be building the automation infrastructure, tools, and framework for testing the core product, you'll also own the charter for performance benchmarks, long-running deployments, CI, and releases. In addition, you will help lay out sustainable practices within engineering to continually raise the quality bar. At Pachyderm, feedback from OSS users and customers is a major driver of our product roadmap, and we believe that everyone within the company should experience it first-hand. In this role, you will be the customer's voice within engineering. The role is expected to focus primarily on development and automation, with some amount of manual testing. You will have an outsized impact in making your stakeholders (developers, the customer team, and ultimately customers) successful and happy.

In this role, you will use Docker, Kubernetes, Go, Python, CI systems, various cloud providers, and more.

Pachyderm is a small team right now, so you'd be getting in on the ground floor and would have an enormous impact on the success and direction of the company and product.

We offer significant equity, full benefits, and all the usual startup perks.

You will:
- Design, develop, execute, and maintain an automated testing framework, tools, and infrastructure
- Test the product for performance, resiliency, security, scalability, and reliability
- Understand the end-to-end configuration, technical dependencies, code paths, and overall behavioral characteristics of the platform
- Own the performance and longevity benchmarks
- Analyze and understand existing test coverage and test cases, identifying opportunities for redesign, replacement, reusability, and improvement in efficiency and performance
- Define and inspire changes to our product with our development engineering team based on feedback from tests and customer issues
- Develop and contribute to internal and external knowledge bases
- Care about developer happiness and be a champion for our customers
- Go above and beyond to ensure customers are getting the most out of their investment in the Pachyderm platform

Qualifications:
- Experience working in a continuous integration / continuous delivery development environment
- Experience with Kubernetes and Docker automation
- Experience with automation in distributed systems
- Strong programming skills and experience (Go, Java, Python, C++)
- Strong communication skills, especially when discussing technical concepts
- Professional experience with databases and/or distributed systems
- BS in CS (or equivalent technical degree) and 5+ years of relevant work experience (QA/Automation/Development)

Why you should join Pachyderm