This engineer is expected to lead by example through hands-on contributions, deep technical expertise, and cross-team influence, particularly in the area of infrastructure bootstrap orchestration and automation at scale.
Key Responsibilities:
Platform Ownership & Reliability:
Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components, including:
- Cloud infrastructure primitives
- Kubernetes clusters and cluster services
- Networking, ingress, and service discovery
- Service Mesh and supporting data-plane components
Ensure platform components are resilient by design, applying SRE principles such as:
- Fault isolation and graceful degradation
- Capacity planning and saturation control
- Reduced operational toil and clear failure modes
Continuously assess and mitigate reliability risks, proactively improving platform stability and operational readiness.
Infrastructure Bootstrap & Automation Leadership:
Lead the design and implementation of infrastructure bootstrap orchestration, including:
- Automated cluster and environment provisioning
- Deterministic, repeatable platform bring-up and teardown
- Dependency-aware orchestration across cloud, network, and Kubernetes layers
Drive a strong Infrastructure-as-Code and GitOps-first approach, ensuring:
- Platform components are reproducible and auditable
- Changes are automated, testable, and reversible
- Manual intervention is minimized or eliminated
Identify automation gaps and lead initiatives that significantly reduce human effort, onboarding time, and operational risk.
SRE Practices & Operational Excellence:
Apply and promote SRE practices across the platform, including:
- Clear ownership and runbooks for platform components
- Participation in on-call rotation as a platform reliability escalation point
- Incident response, post-incident reviews, and problem management
Improve platform operability by:
- Simplifying day-2 operations
- Standardizing upgrade and rollback strategies
- Reducing Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR)
Ensure platform operations align with security, compliance, and internal control requirements.
This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.
Qualifications:
Strong hands-on experience with:
- Public Cloud platforms (AWS preferred, Azure)
- Kubernetes at scale, with previous experience administering production Kubernetes environments
- Service Mesh technologies (e.g. Istio preferred, App Mesh, Linkerd)
Strong understanding of:
- Observability tooling and Golden Signals concepts
- Incident management concepts and on-call operations
- Infrastructure as Code (e.g. Terraform)
- Cloud-native, containerized microservices architecture
Strong collaboration and communication skills.
Additional Information:
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.
Remote Work:
No
Employment Type:
Full-time