Lead SRE

Employer Active

1 Vacancy

Job Location

Reston, VA - USA

Monthly Salary

Not Disclosed

Job Description

This is a Lead SRE role.

Location: Reston, VA

We need strong, senior-level candidates.

Description:

We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in cloud platforms, DevOps practices, and modern software development frameworks. The SRE will play a critical role in designing, building, and maintaining highly scalable, fault-tolerant, and secure cloud infrastructure while ensuring operational excellence, high availability, and reliability.

Key Responsibilities:

1. Cloud Infrastructure & Automation:

  • Design, implement, and manage cloud-based infrastructure using platforms like AWS, Azure, or GCP.
  • Utilize Infrastructure-as-Code (IaC) tools such as Terraform, CloudFormation, and Ansible to automate deployments and configurations.
  • Create robust automation targeted at anomaly detection, toil reduction, recovery processes, and self-healing mechanisms, and optimize cloud costs.

2. DevSecOps & CI/CD:

  • Apply a deep understanding of DevSecOps principles to build and maintain CI/CD pipelines using tools like GitLab, Jenkins, SonarQube, Nexus/Artifactory, and Docker.
  • Implement security best practices, including IAM roles, RBAC, vulnerability remediation, and SAST/DAST/SCA tools.

3. Observability & Incident Management:

  • Design and implement monitoring, logging, and distributed tracing solutions using tools like AWS CloudWatch, Splunk/SignalFx, Dynatrace, and OpenTelemetry.
  • Lead root cause analysis, blameless postmortems, and proactive incident management to minimize MTTR and MTTD.
  • Define and monitor SLOs, SLIs, and error budgets to ensure system reliability.

4. Microservices & API Management:

  • Architect and manage microservices, serverless computing, and RESTful APIs.
  • Ensure fault tolerance and resilience using design patterns like Circuit Breaker, Retry, Timeout, and Bulkhead.

5. Chaos Engineering & Resiliency:

  • Conduct chaos engineering experiments using tools like AWS FIS and Chaos Toolkit.
  • Perform resiliency assessments using AWS Resilience Hub and implement self-healing solutions.

6. Database & Application Support:

  • Manage and optimize database technologies such as PostgreSQL, MongoDB, DynamoDB, Oracle, and Redshift.
  • Provide production support, including incident response, problem management, and runbook creation. Participate in on-call rotations.

7. Collaboration & Communication:

  • Collaborate with cross-functional teams to implement shift-left testing practices (BDD, TDD, unit, regression).
  • Create and maintain architecture diagrams, knowledge articles, and disaster recovery plans.
  • Communicate effectively with stakeholders and demonstrate strong relationship management skills.

Required Skills & Qualifications:

  • Expertise in cloud platforms (AWS, Azure, or GCP) and container orchestration.
  • Proficiency in programming/scripting languages such as Python, Java, Bash, and PowerShell.
  • Strong knowledge of database technologies (e.g., PostgreSQL, MongoDB, DynamoDB, Oracle, Redshift).
  • Experience with DevOps tools (Jenkins, Docker, Nexus/Artifactory) and build tools (Maven, Gradle).
  • Familiarity with AI/ML integrations, event-driven architectures, and distributed systems.
  • Expertise in observability, logging, and monitoring tools (AWS CloudWatch, Splunk, Dynatrace, OpenTelemetry).
  • Strong understanding of security practices, including IAM, RBAC, and vulnerability management.
  • Experience with chaos engineering, resiliency assessments, and disaster recovery planning.
  • Proficiency in performance testing tools (JMeter, LoadRunner) and capacity planning.
  • Excellent verbal and written communication skills, with the ability to collaborate across teams.
  • 8 years of related experience, including experience leading teams on projects of similar scope and complexity.
  • Bachelor's or master's degree in computer science or equivalent.
  • Certifications: AWS Solutions Architect, Agile Certified Practitioner (ACP), or relevant cloud certifications.

Preferred Qualifications:

  • Experience with AI/ML libraries (e.g., NLTK, Transformers, spaCy, SciPy), Amazon SageMaker, and GenAI tools.
  • Familiarity with project management tools like Jira, Confluence, and ServiceNow.
  • Knowledge of utilities like AWS CLI, Postman, and curl.

Employment Type

Full-time
