Cloud SRE Architect | AWS, Kubernetes, Infrastructure as Code, Observability, Reliability Frameworks

Synechron


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Synechron is seeking an experienced Cloud SRE Architect to lead the strategy, design, and implementation of scalable, resilient, and secure cloud platform solutions. This role involves establishing enterprise-wide reliability standards, managing large-scale cloud infrastructure, and fostering a culture of automation, observability, and continuous improvement. The ideal candidate will drive operational excellence, oversee incident response processes, and guide cross-functional teams to ensure high system availability and performance at scale.

Software Requirements

  • Required:

    • In-depth knowledge of cloud platforms such as AWS, Azure, or GCP, with extensive experience in core services including ECS/Fargate, EKS/Kubernetes, EC2, S3, Auto Scaling, and VPCs

    • Proven experience in designing and operating container platforms (ECS, Kubernetes)

    • Strong understanding of Infrastructure as Code (Terraform, CloudFormation)

    • Expertise in monitoring, logging, and observability tools such as Prometheus, Grafana, Datadog, Splunk, or Dynatrace

    • Solid experience in implementing security best practices like IAM least-privilege policies and cloud guardrails

    • Automation and scripting proficiency using Python, Bash, or similar

  • Preferred:

    • Experience with multi-region, multi-cloud architectures

    • Knowledge of service mesh architectures and advanced traffic management techniques

    • Familiarity with cost optimization (FinOps) practices in cloud environments

    • Experience with ITSM platforms such as ServiceNow

Overall Responsibilities

  • Define and drive enterprise-wide cloud reliability strategies, standards, and reference architectures

  • Architect and evolve highly available, scalable cloud infrastructure and platforms, ensuring they meet security, compliance, and performance benchmarks

  • Lead design and governance of SLI/SLO frameworks, error budgets, and KPIs across cloud services and microservices ecosystems

  • Establish and mature incident management processes, including incident response, post-incident reviews, and operational readiness

  • Develop and implement observability architectures, including metrics, logs, traces, and synthetic monitoring tools

  • Partner with security teams to define and enforce cloud security models, access controls, and audit policies

  • Promote automation in provisioning, deployment, and operational tasks, reducing manual effort and operational risk

  • Mentor engineering teams on resilience patterns such as multi-AZ, multi-region deployments and graceful degradation

  • Influence platform evolution and ensure infrastructure aligns with the organizational cloud roadmap and scalability targets

  • Act as the escalation point for complex production issues and systemic reliability risks

Technical Skills (By Category)

  • Cloud Technologies:
    AWS (ECS/Fargate, EKS, S3, EC2, VPC, IAM), Azure, GCP; core services for deployment, scaling, and security

  • Infrastructure as Code:
    Terraform, CloudFormation, or similar tools for automation and resource management

  • Containerization & Orchestration:
    Docker, Kubernetes (EKS or alternative) for containerized workloads

  • Monitoring & Observability:
    Prometheus, Grafana, Datadog, Splunk, Dynatrace for system health, performance, and troubleshooting

  • Security & Compliance:
    Implementation of least-privilege policies, encryption, security guardrails, and compliance with industry standards

  • Automation & Scripting:
    Proficient in Python, Bash, or PowerShell for automation tasks and system scripting

Experience Requirements

  • Minimum of 15 years of experience in Site Reliability Engineering, Platform Engineering, or cloud infrastructure roles

  • Proven expertise in designing, deploying, and managing large-scale cloud platforms with high availability and security standards

  • Extensive hands-on experience with AWS services like ECS, EKS, S3, IAM, CloudFormation, and auto-scaling solutions

  • Demonstrated leadership in incident management, operational readiness, and reliability governance

  • Experience with multi-cloud and multi-region architectures is desirable

  • Proven ability to lead cross-functional teams across DevOps, security, and product areas

Day-to-Day Activities

  • Develop and enforce cloud reliability frameworks, standards, and best practices across enterprise platforms

  • Architect and optimize cloud infrastructure, ensuring high availability, security, and scalability

  • Lead incident response efforts, root cause analysis, and post-incident reviews for systemic issues

  • Monitor system health through observability tools, automate recovery and scaling processes, and improve system resilience

  • Collaborate with product, security, and engineering teams to implement automation, security guardrails, and cost management strategies

  • Influence platform roadmaps and technical strategies aligned with enterprise objectives

  • Provide escalation support for complex outages and systemic reliability concerns

Qualifications

  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field

  • Certifications such as AWS Certified Solutions Architect, Azure Solutions Architect, or GCP Professional Cloud Architect are preferred

  • Extensive experience in cloud platform architecture, automation, and high-availability systems in large enterprise environments

  • Proven leadership in reliability engineering, incident management, and operational excellence in cloud environments

Professional Competencies

  • Strong analytical and troubleshooting skills for complex systemic issues

  • Excellent communication skills for engaging stakeholders and cross-team collaboration

  • Leadership capabilities to guide and mentor engineering teams on best practices

  • Strategic thinking aligned with enterprise cloud roadmap and operational goals

  • Ability to adapt quickly to technological advancements and evolving operational needs

  • Effective time and project management skills to handle multiple priorities with precision

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative "Same Difference" is committed to fostering an inclusive culture, promoting equality, diversity, and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.


All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Required Experience:

Staff IC

About Company

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Progressive technologies and strategies ...
