The Cloud Engineer will build, operate, and improve Kubernetes-based platforms supporting Rhapsody's cloud services. You will own cluster reliability, GitOps-driven deployments (Argo CD or similar), infrastructure-as-code with Terraform modules, and production monitoring using Grafana-style dashboards. You'll collaborate with SRE, Security, and Engineering to deliver resilient, observable, and cost-aware services in a 24x7 environment.
Key Responsibilities
- Operate and harden Kubernetes clusters: upgrade/patch node pools, CNI, ingress, certificates, autoscaling, quotas, RBAC, and multi-env promotion.
- Implement and maintain GitOps workflows using Argo CD (or Flux): app definitions, health policies, sync strategies, drift detection, and rollback.
- Standardize platform add-ons via Helm/Kustomize (ingress, cert manager, secrets, log/metrics/traces agents).
- Build reusable Terraform modules (networking, cluster, storage, identity, observability) and enforce plan/apply and code-review workflows.
- Create Python/Shell automation for cluster operations, validations, drift remediation, image promotion, capacity, and cost hygiene.
- Develop and tune Grafana-style dashboards and alerts; reduce noise, improve MTTR, and document RCAs.
- Apply least-privilege, secrets hygiene, image provenance, and policy controls; execute maintenance windows, patching, and upgrades.
- Keep runbooks/diagrams/SOPs current; contribute to knowledge base and mentor junior engineers.
- Collaborate with internal/external stakeholders during deployments, cutovers, and incidents; communicate trade-offs and status clearly.
Qualifications :
Required Qualifications
- 3-5 years in Cloud/SRE/Platform Engineering supporting production systems.
- Hands-on Kubernetes operations (cluster lifecycle, ingress, certs, autoscaling, RBAC, Helm/Kustomize).
- Experience with GitOps (Argo CD or Flux) and declarative release practices.
- Strong Terraform skills including authoring and maintaining modules across environments.
- Monitoring experience with Grafana-style dashboards and alerting; ability to define meaningful SLO/SLA signals.
- Proficient in Python and Shell; comfortable with Git and code reviews.
- Solid understanding of networking, security, containers, and Linux fundamentals.
- Experience in follow-the-sun/24x7 support with on-call participation.
- Excellent written and verbal communication for global and customer-facing work.
Preferred (Good to Have)
- Understanding of UI design and common UI tools for simple internal portals or operational views.
- Experience working with databases (performance basics, HA/failover, backups/restores).
- Fluency using AI tools as a companion for research, code review, documentation, or incident triage.
Shift & On-Call Expectations
- Assigned shift aligned with global operations; occasional adjustments for maintenance/projects.
- Participation in rotational on-call for P1/P2 events per local policy; precise handoffs and status updates.
Education
- College degree in Computer Science, Information Technology, or a related field preferred
- Demonstrated relevant experience may be substituted for a degree
- Kubernetes certification (e.g., CKA/CKAD/CKS) a plus
Additional Information :
We champion flexibility and hybrid work options to support varying lifestyles and personal needs. At the same time, we value the power of in-person collaboration to build community, spark innovation, and strengthen connections. Our approach ensures you can work in ways that suit you best while still engaging with colleagues to share ideas and grow together. #LI-Hybrid #LI-DNP
Remote Work :
No
Employment Type :
Full-time