Job Title: Software Engineer, Batch Compute
Duration: 12 Months
Location: Dallas, TX (Hybrid)
Key responsibilities of the role include:
- Designing and developing high-quality software solutions using procedural programming languages, with a focus on Golang
- Building and maintaining highly scalable, highly available and globally distributed systems to support large-scale research workloads
- Managing and optimising data interactions across relational and non-relational databases, particularly PostgreSQL
- Developing and operating containerised applications within Kubernetes, ensuring effective orchestration and workload scheduling
- Supporting, tuning and troubleshooting Linux-based systems as part of our core compute platform
- Applying core networking knowledge to help debug, optimise and enhance platform connectivity and performance
- Independently diagnosing and resolving complex technical issues across infrastructure and software layers
- Applying solid software architecture principles, computer science fundamentals and data structure knowledge to guide design decisions and code quality
- Driving continuous improvement by contributing to CI/CD pipelines and engineering best practices
- Staying up to date with emerging technologies and approaches and applying new knowledge across disciplines
Who are we looking for?
The ideal candidate will have the following skills and experience:
- Experience with developing Kubernetes components such as controllers and operators
- Experience with event-driven programming and message queues such as Apache Kafka and Pulsar
- Experience of high-performance computing, Kubernetes or DAG (directed acyclic graph) workflows
- Experience of operating or using job scheduling systems, e.g. SLURM
- Use of operational and runtime tools and practices, including monitoring and logging with systems such as Prometheus and Grafana