SRE Director Job Description

We are looking for a leader for our Site Reliability Engineering (SRE) Observability team. As the leader of SRE/Observability, you will create compelling offerings in SRE, Observability, and Resiliency for customers and contribute to business growth. You will deliver solutions to our customers, maintain the highest standards, and develop and implement the Observability and SRE team and offerings for Virtusa.

Be a strong thought leader in Site Reliability Engineering, Observability, operational excellence, and DevOps principles.
Strong technical acumen in cloud architecture, observability, performance benchmarking, capacity planning, and reliability tools.
Experience with observability platforms, application monitoring tools, and performance analysis techniques.
Experience managing and growing technical leaders and teams.
Be responsible for building and mentoring a new team of SRE Observability specialists.

KEY QUALIFICATIONS & EXPERIENCE:

15 years of IT experience, with a minimum of 5 years of experience in SRE/Observability/Monitoring tools.
Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.
Expert-level experience in monitoring and logging technologies, both open source and closed source (e.g. AppDynamics, New Relic, Datadog, Prometheus, Grafana, LogicMonitor, Sumo Logic, ELK).
Experience in implementing metrics, logs, and tracing for end-to-end (E2E) observability.
Working knowledge of the following is needed: Terraform, Ansible, Chef, Puppet, and Jenkins; designing and implementing CI/CD pipelines; infrastructure provisioning and management.
Ability to communicate and coordinate with cross-functional engineering teams across multiple geographic regions.
Experience with AIOps and machine learning is highly desirable.
Experience with other monitoring tools such as Prometheus and Grafana.
Experience with Observability solutions such as Dynatrace, Datadog, and Instana is highly desirable.
Excellent problem-solving and analytical skills.
Strong communication and collaboration skills.
Ability to work independently and manage multiple projects simultaneously.
Knowledge of IT operations concepts and processes such as monitoring, incident management, root cause analysis, and remediation.