AI Inference Performance Engineer - New College Grad 2026

US · Onsite · Junior

Posted today · via Workday


About this role

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

What You Will Be Doing:

Drive industry benchmark results: own the end-to-end optimization pipeline, and implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.…


What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

Skills match

For this role: python, c++, k8s, pytorch, openai…

Level fit

This role is junior-level. We check your trajectory against it.

Domain experience

Your work in the role's domain matters more than your total years of experience. We weight recent and direct experience.

Recency

A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

Location fit

This role is based in the US. We weight your proximity and willingness to relocate.


Skills in this role

Pulled from the job description. These are the keywords we'll weight when scoring your fit.

python, c++, k8s, pytorch, openai, teams
