Cloud SRE Architect | AWS, Kubernetes, Infrastructure as Code, Observability, Reliability Frameworks

Synechron

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Synechron is seeking an experienced Cloud SRE Architect to lead the strategy, design, and implementation of scalable, resilient, and secure cloud platform solutions. This role involves establishing enterprise-wide reliability standards, managing large-scale cloud infrastructure, and fostering a culture of automation, observability, and continuous improvement. The ideal candidate will drive operational excellence, oversee incident response processes, and guide cross-functional teams to ensure high system availability and performance at scale.

Software Requirements

Required:
- In-depth knowledge of cloud platforms such as AWS, Azure, or GCP, with extensive experience in core services including ECS/Fargate, EKS/Kubernetes, EC2, S3, Auto Scaling, and VPCs
- Proven experience in designing and operating container platforms (ECS, Kubernetes)
- Strong understanding of Infrastructure as Code (Terraform, CloudFormation)
- Expertise in monitoring, logging, and observability tools such as Prometheus, Grafana, Datadog, Splunk, or Dynatrace
- Solid experience in implementing security best practices like IAM least-privilege policies and cloud guardrails
- Automation and scripting proficiency using Python, Bash, or similar

Preferred:
- Experience with multi-region and multi-cloud architectures
- Knowledge of service mesh architectures and advanced traffic management techniques
- Familiarity with cost optimization (FinOps) practices in cloud environments
- Experience with ITSM platforms such as ServiceNow

Overall Responsibilities

Define and drive enterprise-wide cloud reliability strategies, standards, and reference architectures
Architect and evolve highly available, scalable cloud infrastructure and platforms, ensuring they meet security, compliance, and performance benchmarks
Lead design and governance of SLI/SLO frameworks, error budgets, and KPIs across cloud services and microservices ecosystems
Establish and mature incident management processes, including incident response, post-incident reviews, and operational readiness
Develop and implement observability architectures, including metrics, logs, traces, and synthetic monitoring tools
Partner with security teams to define and enforce cloud security models, access controls, and audit policies
Promote automation in provisioning, deployment, and operational tasks, reducing manual efforts and operational risks
Mentor engineering teams on resilience patterns such as multi-AZ and multi-region deployments and graceful degradation
Influence platform evolution and ensure infrastructure aligns with organizational cloud roadmap and scalability targets
Act as escalation point for complex production issues and systemic reliability risks

Technical Skills (By Category)

Cloud Technologies:
AWS (ECS/Fargate, EKS, S3, EC2, VPC, IAM), Azure, and GCP core services for deployment, scaling, and security

Infrastructure as Code:
Terraform, CloudFormation, or similar tools for automation and resource management

Containerization & Orchestration:
Docker and Kubernetes (EKS or alternative) for containerized workloads

Monitoring & Observability:
Prometheus, Grafana, Datadog, Splunk, and Dynatrace for system health, performance, and troubleshooting

Security & Compliance:
Implementation of least-privilege policies, encryption, security guardrails, and compliance with industry standards

Automation & Scripting:
Proficient in Python, Bash, or PowerShell for automation tasks and system scripting

Experience Requirements

Minimum of 15 years of experience in Site Reliability Engineering, Platform Engineering, or cloud infrastructure roles
Proven expertise in designing, deploying, and managing large-scale cloud platforms with high availability and security standards
Extensive hands-on experience with AWS services like ECS, EKS, S3, IAM, CloudFormation, and auto-scaling solutions
Demonstrated leadership in incident management, operational readiness, and reliability governance
Experience with multi-cloud and multi-region architectures is desirable
Proven ability to lead cross-functional teams across DevOps, security, and product areas

Day-to-Day Activities

Develop and enforce cloud reliability frameworks, standards, and best practices across enterprise platforms
Architect and optimize cloud infrastructure, ensuring high availability, security, and scalability
Lead incident response efforts, root cause analysis, and post-incident reviews for systemic issues
Monitor system health through observability tools, automate recovery and scaling processes, and improve system resilience
Collaborate with product, security, and engineering teams to implement automation, security guardrails, and cost management strategies
Influence platform roadmaps and technical strategies aligned with enterprise objectives
Provide escalation support for complex outages and systemic reliability concerns

Qualifications

Bachelor's or Master's degree in Computer Science, Information Technology, or a related field
Certifications such as AWS Certified Solutions Architect, Azure Solutions Architect, or GCP Professional Cloud Architect are preferred
Extensive experience in cloud platform architecture, automation, and high-availability systems in large enterprise environments
Proven leadership in reliability engineering, incident management, and operational excellence in cloud environments

Professional Competencies

Strong analytical and troubleshooting skills for complex systemic issues
Excellent communication skills for engaging stakeholders and cross-team collaboration
Leadership capabilities to guide and mentor engineering teams on best practices
Strategic thinking aligned with enterprise cloud roadmap and operational goals
Ability to adapt quickly to technological advancements and evolving operational needs
Effective time and project management skills to handle multiple priorities with precision

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, "Same Difference", is committed to fostering an inclusive culture, promoting equality, diversity, and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Candidate Application Notice

Required Experience:

Staff IC

About Company

Synechron

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Progressive technologies and strategies ...