We are looking for a Lead, Middle, or Senior DevOps Engineer to join a research infrastructure team building an on-demand GPU platform for advanced compute workflows. The role focuses on enabling secure, scalable, and user-friendly access to high-performance GPU resources through automation, scheduling, and modern platform tooling.
Locations: Serbia, Georgia, Armenia, Kazakhstan, Poland, Croatia, Portugal, Egypt.
Requirements
- Strong hands-on experience with Kubernetes and platform orchestration;
- Solid understanding of scheduling, reservation, or namespace-based resource management systems;
- Experience with GPU infrastructure, virtualization, slicing, or containerized workstation environments;
- Strong scripting and automation skills;
- Practical Azure experience and familiarity with secure infrastructure operations.
Responsibilities
- Build and improve an on-demand GPU workstation platform with lightweight containerization or virtualization;
- Implement scheduling, reservation, registration, image management, storage mounting, SSH with SSO, and developer-friendly access flows;
- Automate cluster namespace configuration across CPU, GPU, memory, and storage allocations;
- Support hierarchical capacity allocation models with RBAC-based administration;
- Automate storage import, export, and archival workflows as allocations change;
- Build monitoring, alerts, and automated incident ticket creation for large-scale cluster environments;
- Improve integrations between source control, CI/CD, package distribution, and GPU-connected development workflows;
- Contribute automation scripts and agentic tooling that improve infrastructure and day-to-day research workflows.
Nice to Have:
- Experience with Prometheus, Grafana, incident automation, or on-call paging workflows;
- Experience with developer platforms, devcontainers, or remote development tooling such as VS Code integrations;
- Exposure to AI-assisted monitoring, trend analysis, or agentic infrastructure tooling.
Engagement Type
Location / Timezone
- Remote work from Serbia, Georgia, Armenia, Kazakhstan, Poland, Croatia, Portugal, or Egypt.
- European working hours.
- Occasional availability for meetings until 10:00 AM PST (US overlap).