Cloud SRE Architect
Job Summary
Site Reliability Engineer (SRE) Architect
SRE Team
We are responsible for ensuring our platform remains stable, scalable, and resilient. The SRE team bridges the gap between development and operations, breaking down silos, empowering developers, and fostering a culture of ownership and continuous improvement.
At the architect level, this role shapes the reliability strategy, platform standards, and engineering practices that enable our systems to scale safely and efficiently across the enterprise.
We build creative, automated, and robust solutions to operational challenges, partnering with product, platform, and engineering teams from early design through to production optimization.
We see the big picture: defining standards, enabling consistency, and cultivating an agile, learning-oriented culture. We follow SRE principles such as blameless postmortems, error budgets, and continuous feedback loops to ensure both system reliability and team sustainability.
Above all, we are passionate about automation, observability, and continuous improvement, operating at scale where reliability is a product feature.
As an SRE Architect you will:
- Define and drive the enterprise reliability strategy, standards, and reference architectures.
- Architect and evolve highly available, scalable platforms across AWS and container ecosystems.
- Lead the design and governance of SLI/SLO frameworks, error budgets, and reliability KPIs across services.
- Provide technical leadership for container platforms (ECS, Fargate, Kubernetes) and cloud-native workloads.
- Establish and mature incident management practices, including major incident response, post-incident reviews, and operational readiness.
- Design and standardize observability architecture (metrics, logs, traces, RUM, synthetic monitoring).
- Partner with security teams to implement least-privilege IAM models, secure data patterns, and cloud guardrails.
- Drive scalability and performance engineering initiatives across critical services.
- Guide teams on resilience patterns (multi-AZ, multi-region, graceful degradation, circuit breakers).
- Influence platform roadmaps and mentor engineers across SRE, platform, and product teams.
- Champion automation-first thinking across infrastructure provisioning, deployments, and operations.
- Act as a technical escalation point for complex production incidents and systemic reliability risks.
In short: design the systems that keep everything running at scale.
Here's What You Need:
- 15 years of relevant experience in Site Reliability Engineering, Platform Engineering, or Cloud Infrastructure roles.
- Deep hands-on expertise with AWS, including core services such as:
  - ECS / Fargate
  - EKS / Kubernetes
  - IAM (advanced policy design and guardrails)
  - S3 (security, lifecycle, and large-scale data patterns)
  - EC2, Auto Scaling, ALB/NLB, VPC
- Proven experience designing and operating large-scale container platforms (ECS and/or Kubernetes).
- Strong experience implementing SLI/SLO frameworks, error budgets, and reliability governance.
- Demonstrated leadership in incident management and major incident response.
- Deep understanding of observability ecosystems (Datadog, Dynatrace, Prometheus/Grafana, Splunk, or similar).
- Strong Linux and cloud networking fundamentals.
- Experience with infrastructure as code (Terraform, CloudFormation, or equivalent).
- Proficiency in automation and scripting (Python, Bash, or similar).
- Experience driving scalability, performance tuning, and capacity planning initiatives.
- Strong stakeholder management and cross-functional leadership skills.
- Familiarity with Agile/DevOps delivery models.
Nice to Have:
- Experience building or governing platform engineering / IDP (Internal Developer Platform) capabilities.
- Multi-region or multi-cloud architecture experience.
- Experience with cost optimization (FinOps) at scale.
- Exposure to service mesh or advanced traffic management.
- Familiarity with ITSM platforms such as ServiceNow.
SYNECHRON'S DIVERSITY & INCLUSION STATEMENT
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, "Same Difference", is committed to fostering an inclusive culture, promoting equality, diversity, and an environment that is respectful to all. We strongly believe that, as a global company, a diverse workforce helps build stronger, successful businesses. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.
Required Experience:
Staff IC
About Company
At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Progressive technologies and strategies ...