Senior Data Engineer
About the role
Our mission is to make biology easier to engineer. Ginkgo is constructing, editing, and redesigning the living world to address the world's growing challenges in health, energy, food, materials, and more. Our bioengineers use an in-house automated foundry to design and build new organisms. Today, our foundry is developing more than 40 organisms to make products across multiple industries.
We're creating the codebase, compiler, and debugger for biology. We have built a strong set of internal software tools, automation, and processes that enable high-throughput genetic engineering across multiple species. We want to make them better, more powerful, more scalable, and more effective, while making them easier to use, manage, and deploy.
As a Senior Data Engineer, you'll help architect our platform to support the analytics and machine learning that will ultimately define how our bioengineering is performed at scale. Ginkgo's programming languages of choice are Python, SQL, and DNA, but you are someone who loves writing elegant code in any language. You're also an experienced data wrangler who enjoys building systems from the ground up. Most importantly, you are passionate about making biology the next engineering discipline.
Note: Our current toolset includes RDS Postgres, Snowflake, Airflow, AWS DMS, Spark on EMR, and Python. Extensive experience with these exact tools is not required; a working knowledge of the software and tools listed below is preferred.
Desired Software and Tools (Working Knowledge)
- Data pipeline and workflow management tools: Airflow, Luigi, etc.
- Big Data tools: Hadoop, Hive, Spark
- AWS cloud services: EC2, EMR, RDS, Redshift, S3
- Languages: Python, Java, Scala, etc.
Responsibilities
- Expanding and optimizing our data pipeline architecture, as well as the data flow and collection that support cross-functional teams, including automating manual processes, building ETL, redesigning infrastructure for greater scalability, and improving reliability and accuracy
- Supporting our software engineering initiatives to ensure that delivery architecture remains consistent across ongoing projects
- Using appropriate tools to analyze the data pipeline and provide actionable insights into operational efficiency, data accuracy, and other KPIs
- Working with stakeholders across teams to resolve data-related technical issues and support their infrastructure needs
- Keeping our data secure
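To make the ETL responsibilities above concrete, here is a minimal, highly simplified sketch of the kind of idempotent transform-and-load step this role involves. It uses Python and SQL (the languages named above); the table and column names (`raw_fermentation_runs`, `strain_yield_summary`, `yield_g_per_l`) are hypothetical, invented for illustration only, and SQLite stands in for a production warehouse.

```python
import sqlite3


def run_etl(conn: sqlite3.Connection) -> int:
    """Summarize raw per-run measurements into a per-strain table.

    Hypothetical schema for illustration; returns the number of
    strains in the resulting summary table.
    """
    cur = conn.cursor()
    # Rebuild the target table from scratch so reruns are idempotent.
    cur.execute("DROP TABLE IF EXISTS strain_yield_summary")
    cur.execute(
        "CREATE TABLE strain_yield_summary AS "
        "SELECT strain_id, "
        "       AVG(yield_g_per_l) AS avg_yield, "
        "       COUNT(*) AS n_runs "
        "FROM raw_fermentation_runs "
        # Transform step: drop rows with missing measurements.
        "WHERE yield_g_per_l IS NOT NULL "
        "GROUP BY strain_id"
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM strain_yield_summary").fetchone()[0]
```

In a production pipeline, a step like this would typically be one task in an orchestrator such as Airflow, with the drop-and-recreate pattern replaced by an incremental or merge-based load as data volume grows.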
Desired Experience and Capabilities
- BS, MS, or PhD in computer science or related quantitative field
- 5+ years of data engineering experience, with advanced knowledge of database design best practices
- Experience working with relational databases, data warehouses, and big data platforms
- Demonstrated ability to perform root-cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
- Strong analytical skills for working with large datasets
- Experience building processes that support data transformation, data structures, metadata, dependency, and workload management
- Working knowledge of message queuing, stream processing, and highly scalable big data stores
- Analytical, highly motivated self-starter, with strong project management and organizational skills
Why you should join Ginkgo Bioworks
Ginkgo Bioworks is the organism company. We design custom organisms for customers across multiple markets. We build our foundries to scale the process of organism engineering using software and hardware automation. Organism engineers at Ginkgo learn from nature to develop new organisms that replace technology with biology.
Engineering biology isn't easy. It is frustratingly, painfully difficult. It's programming without a debugger, manufacturing without CAD, and construction without cranes. At Ginkgo we are building a team that can build debuggers, write CAD, and operate cranes. We are looking for the best engineers, scientists, and hackers.