Site Reliability Engineer (SRE)

Cleo Consulting

Job Location:

New York City, NY - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1

Job Summary

Job Title: Site Reliability Engineer (SRE)
Location: New York/EST area, Hybrid/Remote
Duration: 6 Months (extendable)
Experience Required: 10 Years
Certification: AWS Certification is Mandatory (e.g., AWS Certified DevOps Engineer, Solutions Architect, or SysOps Administrator)

Job Summary:

  • We are seeking an experienced Senior Site Reliability Engineer (SRE) with a strong background in AWS Cloud, DevOps, automation, and system reliability engineering. The ideal candidate brings hands-on expertise in cloud infrastructure, CI/CD, monitoring, and automation, along with proven experience supporting large-scale, high-availability systems in the Telecom, Banking, or Retail industries.
  • This role is responsible for ensuring platform stability, reliability, and scalability, and for continuously improving infrastructure through automation and DevOps best practices.

Key Responsibilities:

  • Design, implement, and maintain highly available and scalable cloud infrastructure on AWS.
  • Build and manage end-to-end CI/CD pipelines to enable efficient and reliable software delivery.
  • Develop and maintain Infrastructure as Code (IaC) using Terraform, CloudFormation, or Ansible.
  • Monitor, automate, and enhance system reliability, performance, and incident response processes.
  • Implement observability solutions (Prometheus, Grafana, ELK/EFK, Splunk, or Datadog).
  • Collaborate with development teams to improve application reliability and deployment processes.
  • Participate in on-call rotations, incident management, and root cause analysis (RCA).
  • Optimize infrastructure costs and ensure cloud security and compliance with enterprise standards.
  • Develop automation scripts and tools using Python, Go, or Shell to eliminate manual tasks.
  • Prepare and maintain documentation, including architecture diagrams and operational runbooks.

Primary Skills:

  • Cloud Platform: AWS (EC2, S3, EKS, Lambda, CloudWatch, RDS, IAM, etc.)
  • DevOps & SRE Practices: CI/CD, automation, monitoring, incident response, performance tuning
  • Infrastructure as Code (IaC): Terraform, AWS CloudFormation, Ansible
  • CI/CD Tools: Jenkins, GitLab CI, GitHub Actions, Argo CD, or Spinnaker
  • Containers & Orchestration: Docker, Kubernetes, Helm, EKS, OpenShift
  • Monitoring & Logging: Prometheus, Grafana, ELK/EFK, Splunk, Datadog, CloudWatch
  • Scripting / Programming: Python, Go, Bash, or Shell
  • Version Control: Git, GitHub, Bitbucket
  • Networking & Security: VPC, VPN, Load Balancers, DNS, SSL, Security Groups, IAM

Required Qualifications:

  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
  • 10 years of hands-on experience in SRE / DevOps / Cloud Infrastructure roles.
  • Mandatory AWS Certification (e.g., AWS Certified DevOps Engineer, Solutions Architect Associate/Professional, or SysOps Administrator).
  • Proven experience in Telecom, Banking, or Retail domain infrastructure and platform operations.
  • Strong expertise in microservices, distributed systems, and containerized environments.
  • Experience in monitoring, alerting, observability, and automated remediation.
  • Excellent problem-solving, incident management, and communication skills.

Preferred / Nice-to-Have:

  • Experience with Kafka, RabbitMQ, or other messaging platforms.
  • Familiarity with service mesh (Istio, Linkerd, Consul) and API Gateway solutions.
  • Exposure to data pipeline management and streaming frameworks.
  • Knowledge of FinOps and cost optimization strategies on AWS.
  • Experience with security compliance frameworks (ISO, PCI-DSS, GDPR, etc.).

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting