Job Title: Site Reliability Engineer
Location: Morris Plains, NJ (Onsite)
Employment Type: Long-Term Contract
Role Overview
We're seeking a skilled Site Reliability Engineer (SRE) to join our team in Morris Plains. In this role, you'll be instrumental in driving system resilience, reliability, and performance across complex cloud environments. You'll work cross-functionally with engineering, operations, and DevOps teams to build observability, automate infrastructure, and uphold availability and latency standards.
Must-Have Skills
- Strong experience with monitoring tools: Dynatrace, Splunk, CloudWatch
- Deep understanding of AWS technologies: EKS, ECS, EC2, DynamoDB, RDS, etc.
- Hands-on expertise in building and evaluating SLIs/SLOs/SLAs
- Ability to drive SRE metrics and KPIs with actionable insights
- Proficient in Python, Java, or Golang
- Solid grasp of YAML, containers, and cloud-native infrastructure
- Familiarity with modern DevOps practices and scalable system design
Key Responsibilities
- Develop and maintain monitoring dashboards using Dynatrace, Splunk, and CloudWatch
- Optimize reliability across AWS services, including EKS, ECS, EC2, DynamoDB, and RDS
- Define and implement SLIs, SLOs, and SLAs that align with business goals
- Drive key KPIs for availability, performance, and incident response
- Apply and promote engineering best practices for code quality, deployment, and scalability
- Build and manage containerized applications using Docker and Kubernetes
- Design automation scripts and tools in Python, Java, or Golang
- Configure system infrastructure using YAML-based declarative configurations
- Collaborate with cross-functional teams to troubleshoot and mitigate incidents and to improve time to resolution