Member of Technical Staff, AI Training Infrastructure
via Ashby
About this role
THE ROLE:
As a Training Infrastructure Engineer, you'll design, build, and optimize the high-performance infrastructure that powers our large-scale model training operations. You'll collaborate with AI researchers and engineers to create robust training pipelines, optimize distributed training workloads, and ensure reliable model development.
KEY RESPONSIBILITIES:
- Design and implement scalable infrastructure for large-scale model training workloads
- Develop and maintain distributed training pipelines for LLMs and multimodal models
- Optimize training performance across multiple GPUs, nodes, and data centers
- Implement monitoring, logging, and debugging tools for training operations…
What we'd score you on
reqspace match rubric: five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.
1. Skills match
For this role: aws, azure, gcp, kubernetes, docker…
2. Level fit
We check your title trajectory against the seniority signal of the role.
3. Domain experience
Your work in the role's domain matters more than your total years of experience. We weight recent and direct experience.
4. Recency
A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.
5. Location fit
This role is based in a specific location. We weight your proximity and willingness to relocate.
Skills in this role
Pulled from the job description. These are the keywords we'll weight when scoring your fit.
aws, azure, gcp, kubernetes, docker, pytorch
