Job Location

Dallas - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Role: SRE

Location: Remote

Visa: GC/USC only (Green Card holders or US citizens)

No DevOps candidates

The project team is actively screening new candidates for the Site Reliability Engineer role.

At least 12 years of experience defining and implementing monitoring solutions, alerts, telemetry, and instrumentation for on-premises and cloud platforms at large enterprises.

The Site Reliability Engineer will play a key role in building observability and resilience capabilities on the cloud platform (Azure). The SRE's responsibilities will be:

Build and configure the alerts, tracing, telemetry, and instrumentation required for infrastructure monitoring and application performance management.

Implement dashboards to monitor and share observability at various levels (engineering teams, portfolio, senior management).

Support resilience engineering (application and infrastructure resilience) to meet availability requirements.

Work with development engineers, cloud engineers, product teams, and support engineers to gather requirements and to implement and evolve observability and resilience solutions.

Key Skill Sets:

Extensive knowledge of observability and application performance monitoring best practices and KPIs/metrics on cloud platforms

Experience with the monitoring tools Dynatrace and Splunk

Experience with incident resolution (on-call support) and with troubleshooting application errors and performance issues using Dynatrace and Splunk to assist application teams with root cause analysis

Experience working with SLOs and error budgets; understanding of SLAs, SLIs, and SLOs

Expertise with Splunk's Search Processing Language (SPL)

Experience building monitoring solutions for container-based workloads (Java/Spring Boot desirable), databases, Kafka, and Kubernetes

Experience in resilience engineering and in implementing high-availability solutions

Experience creating monitoring dashboards using Dynatrace and Splunk

Ability to work in a fast-paced, agile environment

SRE Maturity Level 3 (Expectation):

DevOps Observability

DORA metrics are visible:

Deployment frequency, mean time to restore (MTTR), cycle time, and change failure rate

IaC (Infrastructure as Code)

Platforms leverage IaC.

Test/release automation

Unit tests

Tested in isolation

Integration tests

Load test results validated against SLOs.

Tests run as part of the CI/CD pipeline.

Automated rollback

Business Continuity Plan for Recovering Service(s)

Capacity planning review

Show service saturation compared with load test and production peak load.

Product Management (Security)

Security scanning

Documented procedures for Vulnerability Management

Integrated into the CI/CD pipeline (in partnership with the security team)

SRE Maturity Level 4 (Advanced):

Modernized application.

Deployment to Kubernetes, Azure, or SaaS via a CI/CD pipeline

Synthetic Monitoring

Canary/blue-green deployment

Self-healing

Autoscaling

Identify KPIs for business performance.

Chaos Engineering

SaaS, Java, Azure, CI/CD

Employment Type

Full Time

Company Industry

Accounting & Auditing

Disclaimer: Drjobpro.com is only a platform that connects job seekers and employers. Applicants are advised to conduct their own independent research into the credentials of the prospective employer. We always make certain that our clients do not request money payments, so we advise against sharing any personal or bank-related information with any third party. If you suspect fraud or malpractice, please contact us via the Contact Us page.