Toshiba Global Commerce Solutions is seeking a Director of Production Engineering to own the engineering systems that make reliability, performance, correctness, and release safety predictable across our global POS, edge, cloud, and middleware platform.
This is a production engineering leadership role focused on distributed system correctness, resilience, and performance engineering. You will partner closely with existing SRE and Operations leaders, but your charter is to engineer prevention by building production readiness standards, automated release gates, and performance/resilience validation mechanisms that stop unsafe changes before they ship.
As AI accelerates development velocity, the bottleneck shifts from writing code to verifying correctness, performance, and safe behavior under failure. This role builds the production engineering foundation that allows teams to move fast without degrading latency, throughput, or availability.
What Success Looks Like
- Releases ship faster without increasing Sev-1 / Sev-2 incidents
- Incident recurrence drops measurably due to enforced learning and prevention
- Edge, store, and cloud workflows behave safely under real failure conditions
- Reliability is engineered, automated, and enforced, not reactive
- Teams clearly understand what "safe to release" means, and pipelines enforce it
Responsibilities
Production Engineering & Release Safety
- Own non-functional release criteria and automated release gates for reliability, resilience, performance, and correctness across complex release trains.
- Define and enforce Production Readiness Reviews (PRRs) and platform-wide engineering standards.
- Establish objective, measurable safe-to-release signals consumed by CI/CD and release tooling.
Distributed Systems Correctness (Edge, Cloud, Commerce)
- Partner with Architects and Principal Engineers to define failure modes, degradation behavior, and system guardrails for distributed and eventually consistent workflows.
- Ensure systems behave correctly during retries, partial outages, intermittent connectivity, degraded modes, and recovery.
- Lead engineering initiatives that reduce the risk of data loss, duplication, corruption, or inconsistent state across POS, middleware, and cloud services.
Incident Learning That Prevents Recurrence
- Lead blameless incident reviews using formal analysis methods.
- Ensure corrective actions are engineered into systems, validated, tracked, and audited.
- Institutionalize learning so failures do not reappear under new conditions or scale.
Resilience & Performance Engineering
- Own platform-level strategies for resilience, performance, and scalability validation.
- Drive chaos, failover, load, stress, and soak testing focused on real failure modes, not synthetic demos.
- Validate store-mode behavior, payment workflows, edge-device dependencies, and multi-service interactions.
Observability & Reliability Signals
- Ensure high-fidelity telemetry (logs, metrics, traces, and business signals) that supports release gating, correctness verification, and diagnosis.
- Drive instrumentation standards that allow teams to prove reliability outcomes with data.
Cross-Org Technical Leadership
- Partner with Software Engineering, Architecture, Quality Engineering, Cloud Operations, and TPM/TPO teams.
- Build and lead senior technical managers and staff-level engineers.
- Set expectations for technical depth, ownership, and execution quality.
Required Experience
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
- 10-15 years building production engineering capabilities for distributed software platforms, with direct accountability for production outcomes.
- Demonstrated experience defining and enforcing production readiness standards and non-functional release gates that prevent unsafe changes from shipping.
- Proven ability to lead formal root-cause / reliability analysis and ensure systemic fixes reduce recurrence.
- Strong distributed systems fundamentals including the ability to reason about:
  - Failure modes and degradation behavior
  - Dependency risk, retries, and backpressure
  - Consistency tradeoffs and correctness under failure
- Experience partnering deeply with Architecture and Software Engineering to embed reliability guardrails into design reviews, CI/CD pipelines, and system standards.
- Senior leadership experience building teams and influencing across large engineering organizations.
Preferred Experience
- Retail POS, payments, edge devices, or store environments.
- Designing reliability automation (release scoring, regression detection, incident pattern analysis).
- Hybrid cloud and edge architectures; Kubernetes/AKS; modern observability platforms.
- Leading reliability transformations in large, complex engineering organizations.
Why This Role Matters
As AI accelerates development velocity, reliability and verification become the limiting factor. This role ensures:
- Uptime is engineered, not reactive
- Development and QA operate at AI-enabled speed
- The platform scales safely without sacrificing correctness
- TGCS matches or exceeds best-in-class engineering organizations
You will build the production engineering foundation that powers the next decade of global commerce innovation.
Toshiba Global Commerce Solutions is a dynamic, billion-dollar global company based in Research Triangle Park, NC, providing retail store solutions to your favorite brands. Have you ever been in a hurry and made use of the self-checkout at Lowes Foods, earned fuel rewards at Kroger, or just paid for purchases at retailers such as Walmart, Michaels, Carrefour, The Gap, Calvin Klein, Boots, Cencosud, BJ's, or Costco? These are just a few examples of our in-store solutions and impressive customer base that made us the world's installed market share leader.
The nature of retail is changing quickly, so if you share our Together Commerce vision of a seamless, two-way, participatory shopping experience, let's get together to drive the new economy.
Toshiba Global Commerce Solutions, Inc. offers a competitive salary and generous benefits package, including the following:
- Group health coverage (medical, dental & vision)
- Employee Assistance Programs
- Pre-tax spending accounts
- 401(k) plan (with company match)
- Company-provided life insurance
- Generous paid holiday schedule, paid vacation & sick/personal days
EEO:
Toshiba Global Commerce Solutions is an equal opportunity/affirmative action employer that evaluates qualified applicants without regard to age, ancestry, color, religious creed, disability, marital status, medical condition, genetic information, military or veteran status, national origin, race, sex, gender, gender identity, gender expression, and sexual orientation, or any other protected factor. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.
Individuals who need a reasonable accommodation because of a disability for any part of the employment process should email to request an accommodation.
DIVERSITY, EQUITY & INCLUSION:
We at Toshiba Global Commerce Solutions firmly believe that our people are integral to the success of our customers. Furthermore, we're committed to Diversity, Equity, and Inclusion for all our people, as highlighted by our 5 Core Principles (Create Outreach, Foster Belonging, Unleash Opportunity, Diverse Cultural Engagement, and Culture of Transparency). We're passionate about our customers, the retail industry, and becoming a more responsible company as we help create a brighter future.