Sr Staff Engineer – Quality Engineering Infinia

Job Location: Pune - India

Monthly Salary: Not Disclosed
Posted on: 5 hours ago
Vacancies: 1 Vacancy

Job Summary

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments." - Marc Hamilton, VP, Solutions Architecture & Engineering, NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

Job Description

Role Overview

We are seeking a highly skilled and technically strong Senior Staff Quality Engineer to drive the end-to-end quality engineering efforts for Infinia, DDN's large-scale distributed data platform.

In this role, you will be a senior technical authority responsible for designing, implementing, and validating complex test infrastructures that ensure the correctness, performance, and resilience of Infinia's distributed architecture. You will work across core subsystems, including the I/O path, memory management, networking stack, scheduling layers, multi-tenant services, and NVMe-backed storage patterns, to ensure platform quality at scale.

This is a hands-on, high-impact IC role for someone who can solve hard problems, automate at scale, and elevate quality engineering across the organization.

Key Responsibilities

Quality Engineering & System Validation

  • Design detailed test strategies and validation plans for distributed system components such as task scheduling, tracing, memory, SPDK data path, and platform services.
  • Create scalable automated test suites that validate multi-tenant behavior, concurrency, data consistency, and system-level performance.

Automation Frameworks & Tooling

  • Build and maintain robust automation using tools such as Pytest and container-based environments, leveraging Docker, Jenkins, and Kubernetes.
  • Develop reusable automation templates, harnesses, and utilities to accelerate test creation and reduce engineering overhead.

Performance, Reliability & Scale Testing

  • Construct and execute performance tests covering I/O throughput, system latency, NVMe access patterns, concurrency limits, and long-running workload stability.
  • Use advanced tools (profilers, fuzzers, failure-injection frameworks, trace analyzers) to uncover issues in distributed workflows.
  • Analyze CPU, memory, disk, and network utilization to diagnose performance bottlenecks and identify regression risks.

Cross-Functional Quality Leadership

  • Work closely with architects, developers, release engineering, DevOps, and customer engineering to drive quality-first design decisions.
  • Participate in feature design reviews, ensuring testability, observability, and resilience are built into system components.
  • Lead root cause analysis (RCA) for complex issues and propose long-term improvements to engineering practices and platform stability.

Documentation & Quality Standards

  • Produce clear, detailed test plans, automation guides, design-review feedback, and quality metrics reports.
  • Contribute to the development and maintenance of internal QA standards, best practices, and onboarding materials.

Required Qualifications

  • 10 years of experience in software quality engineering with a strong focus on distributed systems, system-level testing, or infrastructure platforms.
  • Hands-on expertise in test automation using Python, Bash, and modern CI/CD tooling (Git, Jenkins, etc.).
  • Strong understanding of:
    • Distributed concurrency
    • File systems and I/O stack behavior
    • Storage performance analysis (NVMe, SPDK)
    • Networking, tracing, and system observability
  • Experience with large-scale performance testing, stress testing, and reliability validation.
  • Demonstrated skill in diagnosing complex system issues across logs, traces, network captures, and profiling tools.
  • ISTQB or equivalent certification preferred.

Preferred Qualifications

  • Experience validating large-scale data platforms, storage engines, or distributed scheduling systems.
  • Familiarity with observability technologies such as OpenTelemetry, Grafana, and Prometheus.
  • Background in compliance or security testing (e.g., access control, backup/restore workflows, Section 508/HIPAA/PCI).
  • Contributions to open-source test frameworks or distributed systems validation tools.

Success Metrics: First 30 Days

Technical Ramp-Up

  • Develop a deep understanding of Infinia's architecture, core subsystems, and existing quality gaps.
  • Deliver an assessment of current test coverage, automation maturity, and high-risk areas.

Early Impact

  • Implement or enhance a test automation component for a critical subsystem.
  • Identify 2-3 performance, reliability, or test infrastructure improvements and propose actionable plans.

Team Integration

  • Begin partnering with Dev, QE, Release, and SRE teams to integrate quality checks into design and implementation workflows.

Success Metrics: Beyond 30 Days

  • Increased automated coverage across core platform areas, including reliability, performance, and concurrency validations.
  • Measurable reduction in escaped defects, regressions, and late-cycle quality issues.
  • Introduction of new frameworks, tools, or validation approaches adopted by multiple teams.
  • Recognition across engineering as a go-to technical expert for distributed system quality, automation, and performance validation.

Join us to deliver the quality backbone of a world-class distributed platform, where scale, correctness, and reliability define success.

DDN

DataDirect Networks, Inc. is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.


Required Experience:

Staff IC

Key Skills

  • Computer Science
  • Docker
  • Kubernetes
  • Python
  • VMware
  • C/C++
  • Go
  • System Architecture
  • gRPC
  • OS Kernels
  • Perl
  • Distributed Systems

About Company

Revolutionize your AI & HPC ops with DDN® data storage & management solutions. Achieve peak performance, seamless cloud integration & scalable efficiency.