Job Description
SUMMARY OF JOB PURPOSE
The DevOps Site Reliability Engineer (SRE) possesses a strong background in deploying and managing infrastructure using modern DevOps practices, with expertise in Kubernetes, Terraform, and observability and monitoring platforms such as Datadog. The DevOps SRE works closely with the development and operations teams to ensure the reliability, scalability, and performance of our systems.
PRIMARY JOB RESPONSIBILITIES
- Designs, deploys, and maintains cloud infrastructure using Kubernetes and Terraform, ensuring scalability, reliability, and performance.
- Collaborates with development teams to implement CI/CD pipelines and automate deployment processes.
- Monitors system performance, troubleshoots issues, and implements solutions to optimize performance and ensure uptime.
- Develops and maintains monitoring and alerting systems using observability tools such as Datadog.
- Implements and manages microservices architectures, ensuring seamless communication and scalability.
- Troubleshoots and resolves issues related to infrastructure deployments and performance, ensuring high availability and reliability of our systems.
- Stays up to date on emerging technologies and industry trends and incorporates them into our infrastructure and practices where applicable.
- Participates in an on-call rotation to address issues and incidents during weekdays, ensuring system reliability and availability.
- Collaborates closely with all other members of the team and takes shared responsibility for the overall work the team has committed to for each sprint.
- Establishes and maintains positive working relationships with other members of the organization across departments, divisions, and locations.
- Maintains the confidentiality of proprietary and sensitive information, exercising sound judgment and discretion in any disclosure of information related to EM and its endeavors.
- Upholds the values of Engle Martin and Our Foundation.
REQUIRED EDUCATION & EXPERIENCE
- Bachelor's degree in computer science, engineering, or a related field, or equivalent work experience
- 3-5 years of experience in a DevOps role required; experience as a Site Reliability Engineer preferred
- Prior experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP)
- Prior experience with observability and monitoring platforms such as Datadog, Dynatrace, or Splunk
- Certification in relevant cloud technologies preferred (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator)
- Prior experience with Azure Kubernetes Service (AKS) preferred
- Experience with other DevOps tools and technologies, such as Azure DevOps, Jenkins, or GitLab CI/CD, preferred
DESIRED KNOWLEDGE SKILLS & ABILITIES
- Strong proficiency in Kubernetes and Terraform for managing and deploying infrastructure
- Solid understanding of microservices architecture and experience in deploying and managing microservices-based systems
- Proficiency in scripting languages such as Python, Shell, or Bash for automation tasks
- Familiarity with Agile methodologies and practices
- Knowledge of security best practices for cloud environments
- Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems
- Strong communication and collaboration skills, with the ability to work effectively across teams in a fast-paced Agile environment
- Willingness to participate in an on-call rotation to address issues during weekdays
- Commitment to professional and personal growth and development
WORKING CONDITIONS
Work is conducted primarily in an indoor office environment, with protection from weather conditions and exposure to noise typical of an office or administrative setting.
PHYSICAL ACTIVITIES AND REQUIREMENTS
Lifting and carrying up to 20 lbs.; frequent sitting, standing, walking, and bending; occasional kneeling, reaching, and stooping; handling office equipment; periodic driving may be required; visual acuity to prepare, read, and organize detailed hard-copy and electronic documents; ability to speak and to hear the spoken word in normal face-to-face, web-based, and telephonic business communications. Willingness to travel in a work capacity, including occasional evening, overnight, and weekend hours. Willingness to accommodate occasional meetings and work activities that may be scheduled after normal daytime business hours.