IT Resiliency Engineer

Appsierra


Job Location: Pune, India

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Tasks

End-to-End Engineering Leadership: Oversee the design and implementation of resilient systems across all technology domains.

Cloud and On-Premises Infrastructure Expertise: Design and review resilient solutions in both cloud-based and on-premises environments.

Chaos Engineering Infrastructure Initiatives: Lead chaos engineering efforts to proactively identify and mitigate potential system weaknesses.

Standards for Monitoring and Alerting: Collaborate with teams to evolve existing system monitoring and alerting standards to ensure rapid detection and response.

Resiliency Architecture Reviews: Represent the IT Resiliency Office at the Architectural Review Board.

Enterprise-wide Collaboration and Stakeholder Management: Collaborate with teams across the organization to align and prioritize resiliency and recovery efforts.

Automation: Apply expertise in infrastructure as code (IaC) and automation tools such as Ansible.

Incident Response and Recovery: Participate in the post-mortem process following major incidents to identify opportunities to enhance resiliency.

Development: Evangelize standards and practices across the Technology organization to strengthen our resiliency posture.

Reporting and Documentation: Develop standardized, regular reporting for the Leadership team on resilience activities, risks, and improvements.

Requirements

Qualifications:

  • Bachelor's degree or equivalent experience.
  • 5-10 years of experience in platform engineering, with a focus on IaC, DevOps practices, and orchestration tools.
  • Preferred but not required: experience as a team lead or hands-on technical manager who can engage with and deliver projects to completion.
  • A track record of successfully architecting and deploying enterprise-level solutions that prioritize system uptime and data integrity across various operational scenarios.
  • Demonstrated ability to design and implement systems that ensure high availability, support massive transaction volumes, and facilitate seamless disaster recovery processes.
  • Infrastructure and service architecture & engineering experience including functional and technical requirements gathering and solution development.
  • Strong dedication to customer needs, with excellent communication skills, the ability to build lasting relationships, and the capability to articulate complex resilience strategies clearly and persuasively.
  • Deep insight into the complexities of multi-AZ and multi-Region cloud platforms, with a keen understanding of how they affect system resilience and disaster recovery planning.
  • Proven experience in the ongoing management of mission-critical systems that require constant uptime, including out-of-hours support and rapid incident response.
  • Knowledgeable in evaluating trade-offs between consistency, availability, and partition tolerance, especially in the context of system failures and recovery strategies.
  • Well-versed in cloud service models such as SaaS, PaaS, and IaaS, with hands-on experience designing resilient services on leading public cloud platforms.
  • Proficient in Chaos Engineering principles and practices, with experience designing and conducting experiments to validate a system's capability to withstand turbulent conditions.
  • Skilled in implementing observability solutions that provide real-time insight into the performance and health of systems, aiding proactive issue detection and resolution.
  • Practical experience operating in an Agile development environment.

Key Skills

  • DHCP
  • Active Directory
  • VMware
  • Computer Networking
  • PowerShell
  • Microsoft Windows Server
  • Windows
  • Microsoft Exchange
  • SAN
  • Azure
  • Operating Systems
  • DNS