Site Reliability Manager

Employer Active

1 Vacancy
The job posting is outdated and position may be filled
Job Location

Vi - Sweden

Monthly Salary

Not Disclosed

Job Description

Site Reliability Engineer (SRE)

Only GC / USC / GC EAD

Key Responsibilities:

At least 12 years of experience defining and implementing monitoring solutions, alerts, telemetry, and instrumentation for on-premises and cloud platforms at large enterprises.

The Site Reliability Engineer will play a key role in building observability and resilience capabilities on the cloud platform (Azure). Responsibilities of the SRE:

Build and configure alerts, tracing, telemetry, and instrumentation required for infrastructure monitoring and application performance management.

Implement dashboards to monitor and share observability data at various levels (engineering teams, portfolio, senior management).

Support resilience engineering (application and infrastructure resilience) to meet availability requirements.

Work with development engineers, cloud engineers, product teams, and support engineers to gather requirements and to implement and evolve observability and resilience solutions.

Key Skillsets:

Good knowledge of observability and application performance monitoring best practices and KPIs/metrics on cloud platforms.

Experience with monitoring tools such as Splunk, Dynatrace, Prometheus, CloudWatch, Azure Monitor, New Relic, and other open-source tools.

Experience building monitoring solutions for a variety of workloads such as microservices (Java / Spring Boot desirable), databases, Kafka, and Kubernetes.

Experience in resilience engineering and implementing high-availability solutions.

Experience creating monitoring dashboards using tools such as Grafana (preferred), Splunk, Kibana, and Power BI.

Ability to work in a fast-paced, agile environment.

SRE Maturity Level 3 (Expectation)

DevOps Observability

o DORA Metrics are visible

Deployment frequency, Mean Time To Restore (MTTR), cycle time, change failure rate

IaC (Infrastructure as Code)

o Platforms leverage IaC

Test / Release automation

o Unit tests

Test in a vacuum

o Integration tests

o Load test results validated against SLOs

o Test run as part of CI/CD pipeline

o Automated rollback

o Business Continuity Plan for Recovering Service(s)

Capacity planning review

o Show saturation of service as compared to load test and production peak load

Product Management (Security)

o Security scanning

o Documented procedures for Vulnerability Management

o Integrated into CI/CD pipeline (partner with security)

SRE Maturity Level 4 (Advanced)

Modernized application

o Deployment to Kubernetes, Azure, or SaaS via CI/CD pipeline

Synthetic Monitoring

Canary / Blue Green Deployment

Self-Healing

Auto scaling

Identify KPIs for business performance

Chaos Engineering

Enterprise Process Tie-Ins

Problem management, as part of RCA, will review the maturity level of the incident owner.

Employment Type

Full Time
