Senior Software Engineer, DGX Cloud Production Engineering

US · onsite · senior

Posted today · via Workday

About this role

NVIDIA DGX Cloud is building and operating large-scale GPU infrastructure for AI research and production workloads. We are looking for Senior Software Engineers to help build the automation, tooling, and operational systems that make GPU clusters reliable, scalable, and safe to run. This role is part of a production engineering team focused on Kubernetes-based infrastructure, GPU cluster operations, reliability, automation, GitOps, and Day 2 operability across DGX Cloud environments.

What you’ll be doing:

Build and operate automation for large-scale GPU clusters across NVIDIA Cloud Partners (NCP) and on-prem environments.

Develop tools and services for provisioning, validation, upgrades, monitoring, repair, and cluster lifecycle operations.…

Read the full description on Nvidia's site →

What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

1. Skills match

For this role: python, go, kubernetes, terraform, teams

2. Level fit

This role is senior-level. We check your trajectory against it.

3. Domain experience

Your work in the role's domain matters more than your years total. We weight recent and direct experience.

4. Recency

A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

5. Location fit

This role is based in the US. We weight your proximity and willingness to relocate.

Skills in this role

Pulled from the job description. These are the keywords we'll weight when scoring your fit.

python, go, kubernetes, terraform, teams
