Cloud Operations Lead - SRE / DevOps / Platform Engineering
Experience
9-12 Years
Shift
Overlap with US & EU Business Hours
Role Summary
We are seeking an experienced Cloud Operations Lead with a strong background in Site Reliability Engineering (SRE), DevOps, and Platform Engineering. The ideal candidate will be responsible for ensuring the reliability, security, and operational excellence of cloud-based platforms and services while leading a small team of engineers.
This is a hands-on role with approximately 80% focus on Cloud Operations, Production Support, Reliability, and Platform Ownership, combined with leadership responsibilities.
Key Responsibilities
Lead cloud operations and production support activities across AWS-based platforms.
Manage and troubleshoot Linux systems, cloud infrastructure, networking, and Kubernetes environments.
Drive operational excellence through monitoring, observability, automation, and incident management.
Build and maintain Infrastructure as Code (IaC) using Terraform, Ansible, and Helm.
Support and optimize CI/CD pipelines using GitHub Actions, Jenkins, and deployment automation tools.
Design and implement monitoring, alerting, dashboards, runbooks, and operational standards.
Lead vulnerability remediation, secrets management, access governance, and platform hardening initiatives.
Automate infrastructure provisioning, OS/AMI upgrades, and day-2 operational activities.
Support production deployments, release management, and change control processes.
Collaborate with engineering teams on onboarding, platform readiness, access management, and operational best practices.
Mentor and guide junior engineers while driving continuous service improvement.
Required Skills (Non-Negotiable)
Strong Linux Administration and Troubleshooting
AWS Cloud Operations (IAM, EC2, Networking, EKS)
Kubernetes Administration and Production Support
Terraform and Infrastructure as Code
CI/CD Tools (GitHub Actions, Jenkins)
Monitoring & Observability (Datadog, Prometheus, Grafana, SignalFx, Nagios, or similar)
Incident Management, Root Cause Analysis, and Production Support
Security Operations, including vulnerability remediation, access management, and secrets rotation
Experience working in enterprise environments with formal change management processes
Preferred Skills
DNS, Proxy, Edge Services, and Networking Platforms
Teleport, Bastion Hosts, Service Accounts, and Access Management Solutions
Container Security and Supply Chain Security
AMI/Image Lifecycle Management
AI-enabled Operations, Custom Agentic AI, or Hyperscaler AI Services
Leadership Expectations
Lead a team of cloud/platform engineers.
Drive operational governance, service reliability, and process standardization.
Promote automation-first and reliability-first engineering practices.
Partner with stakeholders across Cloud Infrastructure, Security, and Application teams.
Nice to Have
Experience in SRE, Platform Engineering, or Managed Services environments.
Exposure to AI-powered operations, observability, or automation solutions.
Experience supporting large-scale distributed systems and cloud-native applications.
Required Experience:
IC