Senior Software Engineer, Distributed Systems & Kubernetes Scheduling
Location: Dallas, TX (Hybrid)
Competitive base salary and performance bonus
100% company-paid benefits
Overview
We are seeking a Senior Software Engineer to support the development of a large-scale distributed compute platform designed to run complex, high-throughput workloads.
This role focuses on building and optimizing systems that manage workload orchestration across containerized environments, with an emphasis on scheduling, scalability, and performance. You will solve challenges related to distributed job execution, resource allocation, and multi-cluster coordination.
The ideal candidate brings a strong foundation in software engineering, experience working with Kubernetes-based systems, and a passion for building reliable, high-scale infrastructure.
Key Responsibilities
Distributed Systems & Platform Development
Design and develop backend services that support large-scale workload orchestration and scheduling
Build systems capable of operating across multiple clusters and environments with high availability and resiliency
Contribute to platform architecture decisions with a focus on scalability and performance
Kubernetes & Workload Orchestration
Develop and maintain Kubernetes-based services, including custom controllers and operators
Optimize workload placement and execution across distributed compute environments
Support containerized applications and ensure efficient orchestration of batch and long-running workloads
Data & System Performance
Design and optimize interactions with both relational and non-relational data systems, including PostgreSQL
Analyze and improve system performance across compute, storage, and networking layers
Infrastructure & Reliability
Support and troubleshoot Linux-based systems that underpin the compute platform
Apply networking fundamentals to diagnose and resolve performance or connectivity issues
Identify and resolve complex issues across both software and infrastructure layers
Engineering Excellence
Apply strong computer science fundamentals, data structures, and system design principles
Contribute to CI/CD pipelines and promote engineering best practices across the team
Continuously evaluate and adopt new tools, frameworks, and approaches to improve platform capabilities
Required Experience
Strong software engineering experience with a focus on backend or systems development (Go preferred)
Experience building or extending Kubernetes components such as controllers or operators
Familiarity with event-driven architectures and messaging systems (e.g., Kafka, Pulsar, or similar)
Experience working with large-scale distributed systems or compute platforms
Experience operating in cloud environments, preferably AWS
Familiarity with monitoring, logging, and observability tools (e.g., Prometheus, Grafana)
Experience with workload orchestration or scheduling systems (e.g., SLURM or similar frameworks)
Preferred Experience
Exposure to high-throughput or batch processing systems
Experience supporting data-intensive or compute-heavy workloads
Familiarity with DAG-based workflows or pipeline orchestration concepts
Experience working in environments supporting AI/ML or large-scale data processing