++++
Please apply for this role on our website:
https://www.atla-ai.com/jobs/machine-learning-engineer
++++
As Atla’s Machine Learning Engineer, you’ll spearhead our post-training and inference frameworks for large language models.
- Design and optimise a scalable post-training framework to efficiently manage training runs while controlling costs.
- Implement scalable data pipelines, optimise models for performance and accuracy, and ensure they are production-ready.
- Engineer robust, high-performance inference platforms to ensure our products' reliability, throughput and speed at scale.
- Collaborate with researchers and engineers to accelerate the iteration of new research ideas and improve training workflows.
- Build and grow our engineering organisation, setting a high bar of excellence that propels Atla forward.
Please note that this role is in-person. As a UK AI Futures partner, we can sponsor visas and offer international relocation support.
Qualifications
Evidence of exceptional ML engineering ability:
- Proven expertise in software or ML engineering, focusing on building robust, scalable systems.
- Experience with orchestration systems like SLURM or Ray, along with MLOps tools such as Kubernetes, Vertex, or SageMaker.
- Skilled in creating and managing multi-instance clusters for data and model parallel training on GPUs/TPUs, preferably using DeepSpeed or PyTorch FSDP.
- Proficient in serving large machine learning models at scale, including quantization, distributed computing, and using frameworks like vLLM or Ray Serve.
- Strong understanding of techniques like paged attention and gradient checkpointing, and of libraries like DeepSpeed, with the ability to implement and optimise them at scale.
Nice to have
- Experience at a leading AI company (Mistral, Anthropic, OpenAI, X.ai, HuggingFace, Cohere, Stability, etc.)
- Interested in and thoughtful about the impacts of AI technology.
About you
You are going to thrive at Atla with the following mindset:
- Collaborative and team-oriented, with strong communication skills.
- Comfortable with the uncertainty and fast pace of a hyper-growth startup.
- Willing to learn continuously and adapt in a dynamic environment.
- Unpretentious and hard-working, seeking out the best ideas wherever they come from.
Compensation
- Exceptionally competitive salary
- Significant equity stake as one of the first joiners
- Pension plan
- Medical, dental, and vision benefits
Join our driven team to make a dent in the universe by engineering safe, beneficial AI systems!
Atla is an AI research and deployment company dedicated to enabling the safe development of artificial general intelligence. Generative AI can only reach its full potential when it consistently produces safe and useful results. We train models to catch errors, monitor AI performance, and understand critical failure modes.
We’re a team of researchers, engineers, entrepreneurs and operational leaders, with experience spanning a variety of disciplines, all working together to enable the development of safe artificial general intelligence.
We are backed by Y Combinator, Creandum, and the founders of Reddit, Cruise, Rappi, Instacart and more.