About You
This is a high-impact, high-expectation senior DevOps/SRE role supporting a Private Equity Group (PEG) platform. The team is new, the infrastructure is new, and the customer is extremely selective and private.
You must operate with strong autonomy, influence engineering decisions, challenge assumptions, and serve as a thought partner rather than an order-taker. You will work directly with VP-level stakeholders in a technical environment where reliability, deployment safety, and clarity in communication are essential.
This position requires someone who can own DevOps architecture end-to-end, enable safe and frequent deployments, and establish operational excellence from day zero.
Note: This position is offered under a contractor model for a period of 6 months.
You Bring to Applaudo the Following Competencies
- Proven ownership of production-grade CI/CD pipelines using GitHub Actions reusable workflows and GitOps automation with ArgoCD.
- Expert-level Kubernetes and EKS operations, including node group management, Karpenter autoscaling, RBAC, PDBs, and topology constraints.
- Production-scale Terraform expertise, including module design, S3/DynamoDB remote state, and PR-driven workflows via Atlantis.
- Strong reliability engineering experience, including SLO/SLI design, alerting strategies, dashboards, incident response, and post-incident reviews.
- Hands-on experience operating HashiCorp Vault, including auth backends, PKI, dynamic secrets, and audit logging.
- Experience implementing supply-chain security controls, including image scanning and signing, SBOM generation, and policy enforcement with OPA/Gatekeeper.
- Strong experience with observability stacks, including Prometheus, Grafana, Loki, Tempo, and Alertmanager.
- Experience with service mesh technologies such as Istio, including traffic management, mTLS, AuthorizationPolicies, and circuit breaking.
- Scripting ability using Python and Bash for automation and operational tooling.
- Active use of AI-assisted engineering tools such as Cursor, GitHub Copilot, or Cloud Code to accelerate IaC development, incident response, and runbook generation.
- Strong communication skills, including the ability to speak clearly and confidently with VP-level stakeholders during operational incidents.
- Advanced English proficiency, as you will work directly with US-based clients.
You Will Be Accountable for the Following Responsibilities
- Design and maintain GitHub Actions reusable workflows across a multi-repository ecosystem.
- Own GitOps deployments through ArgoCD, including promotion workflows, sync policies, drift detection, and automated rollback strategies.
- Implement deployment safety mechanisms such as environment protections, concurrency rules, and verification gates.
- Operate and upgrade EKS clusters, including Karpenter provisioning, node groups, and critical cluster add-ons.
- Maintain Terraform-driven infrastructure and enforce PR-driven workflows through Atlantis.
- Define and maintain SLOs, SLIs, alerting rules, and monitoring dashboards across platform services.
- Lead incident response, coordinate recovery efforts, and execute structured post-incident reviews.
- Participate in an on-call rotation and contribute to improving operational processes.
- Operate and maintain HashiCorp Vault, including policies, authentication backends, and secret engines.
- Implement supply-chain security controls, including Trivy scanning, Cosign signing, SBOM generation, and OPA/Gatekeeper enforcement.
- Partner with Security Engineering on network policies, egress controls, and compliance standards.
- Automate repetitive tasks and maintain proactive runbooks to reduce operational risk.
- Use AI tools to improve infrastructure automation, documentation, and deployment safety validation.
- Collaborate with product teams to strengthen SLOs and deployment safety practices.
- Challenge technical assumptions and advocate for scalable, secure DevOps architectures.
Qualifications:
- Proven end-to-end ownership of production-grade Kubernetes/EKS environments, including Karpenter and Atlantis-driven Terraform workflows.
- Demonstrated expertise with ArgoCD GitOps patterns.
- Hands-on experience with HashiCorp Vault, supply-chain security controls, and structured incident response, including on-call rotations and post-incident reviews.
- Active use of AI-assisted tools such as Cursor, GitHub Copilot, or Cloud Code as part of daily engineering workflow.
Additional Information:
About Us
We Are Engineered Different.
At Applaudo, talented people design, build, and scale meaningful AI-powered solutions that create real business impact. As an AI-native organization, we collaborate across design, development, cloud, data, and artificial intelligence to turn ideas into scalable products that transform how companies operate, make decisions, and grow.
We are building a high-performance culture grounded in five values: Empowering Excellence, Collaborative Teamwork, Unsolicited Respect, Consistent Transparency, and Efficient Communication. These define how we work, how we support one another, and how we hold ourselves accountable.
Applaudo is a place for people who want to learn fast, take ownership, and work alongside strong teams they are proud to belong to. Joining us means being part of an organization that is evolving intentionally, investing in modern ways of working, and leading AI-native transformation at scale.
Remote Work:
Yes
Employment Type:
Full-time