We are seeking a hands-on Platform Engineer to support a high-performance computing platform used by computational scientists in Research & Development. This role focuses on AWS infrastructure, DevOps automation, container platforms, and high-throughput storage, with heavy use of infrastructure as code. You will own cloud and HPC infrastructure end-to-end and work closely with scientists and engineers to deliver scalable, reliable, and automated platform solutions.
Key Responsibilities:
- Design, build, and operate scalable, high-performance cloud infrastructure on AWS
- Manage infrastructure as code using Terraform, Terragrunt, and CloudFormation
- Build immutable infrastructure with Packer
- Develop and maintain CI/CD pipelines using GitLab CI/CD
- Operate containerized workloads across:
  - Amazon EKS
  - Docker on EC2
  - Singularity (Apptainer) for HPC workloads
- Configure systems using Ansible
- Design and operate high-throughput cloud and HPC storage solutions
- Monitor, troubleshoot, and optimize platforms for performance, reliability, and cost
- Document architectures and operational best practices
Qualifications:
- Strong hands-on experience with AWS
- Deep experience with Terraform / Terragrunt; working knowledge of CloudFormation
- Experience with GitLab CI/CD
- Containers: Kubernetes (EKS), Docker, and familiarity with Singularity
- High-performance storage experience (e.g., FSx for Lustre, Weka, or similar)
- Image Builds: Experience with Packer for AMI and image creation
- Strong Linux and networking experience, with working knowledge of Ansible for configuration management
- Python/Bash scripting plus strong communication and documentation skills
- Proficiency with Git
Nice to Have / Preferred Skills:
- Experience with HPC, scientific computing, or data-intensive platforms
- Familiarity with Go or other languages for automation
- Cloud security best practices
- Observability tools (Prometheus, Grafana, CloudWatch)
Additional Qualifications:
- 5 years of experience in DevOps, Platform Engineering, or SRE
- Bachelor's degree or equivalent practical experience
- Strong problem-solving and communication skills
- Comfortable working independently in a fast-moving, collaborative environment
Additional Information:
This role is 100% remote.
Remote Work: Yes
Employment Type: Full-time