AIML Performance Engineer

VDart Inc

Job Location:

Bellevue, WA - USA

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1

Job Summary

Role: AIML Performance Engineer

Location: Bellevue, WA (Hybrid)

Hire Type: Contract

Job description

  • Design and implement high-intensity stress workloads using PyTorch and Triton to identify performance bottlenecks and improve platform stability and maturity.
  • Exercise core MAIA execution paths, including compute, memory, DMA, and collectives.
  • Enable early detection of performance cliffs, stability issues, and system bottlenecks across simulator and real hardware.
  • Improve platform maturity, reduce late-stage escapes, and increase confidence for broader internal and external adoption.
  • Develop PyTorch workloads stressing model-level execution, such as large GEMMs, attention patterns, MoE-like behavior, mixed precision, and long-running loops.
  • Author custom Triton kernels to stress hardware execution units, memory hierarchies, and synchronization paths.
  • Build parameterized stress harnesses scalable by problem size, number of devices, and runtime duration.
  • Integrate workloads with existing profiling, monitoring, and failure triage tooling.
  • Collaborate with platform, firmware, and SDK teams to target known risk areas and emerging issues.
  • Document usage patterns and provide reproducible scripts for lab and continuous integration (CI) usage.

Roles and Responsibilities

  • Develop and maintain a library of reusable PyTorch stress workloads.
  • Create Triton-based micro- and macro-kernels designed specifically for stress and saturation testing.
  • Build and support test harnesses and scripts for single-device and multi-device execution.
  • Ensure workload designs align with platform risk areas and emerging hardware/software issues.
  • Collaborate cross-functionally with platform, firmware, and SDK teams to refine stress tests.
  • Provide comprehensive documentation describing workload intent, configuration options, and expected stress characteristics.
  • Support profiling, monitoring, and failure triage by integrating stress workloads with existing tools.
  • Deliver reproducible and scalable testing solutions for lab and CI environments.

Skills
Mandatory Skills: Performance Testing - Analysis (analyzing test results, server stats, bottlenecks, tuning, and recommendations); Python; Scripting - Shell/PowerShell/Python
