Lead Engineer Support Linux Engineer
Job Summary
At Graphcore, we're building the future of AI compute.
We're a team of semiconductor, software and AI experts with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale.
As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.
We are looking for a highly experienced Lead Engineer Support Linux Engineer to guide and develop a small team supporting engineering systems in a fast-paced, AI-centered environment.
The position requires strong Linux skills combined with leadership, automation and DevOps approaches to maintain systems that are reliable, scalable and easy to support. An important responsibility is developing and managing a configuration-as-code approach. In this setup, system configuration and operations are handled through automation pipelines and source control rather than manual intervention.
You will be responsible for leading incident response, driving operational improvements and setting standards for how Linux systems are managed and supported across the organization.
While the role includes leadership responsibilities, it will initially require a hands-on approach, including direct involvement in troubleshooting, system support and automation efforts, while building team capability and scaling processes.
Working closely with engineering groups, platform engineers and infrastructure experts, you will ensure systems stay stable, efficient and aligned with changing business and product delivery requirements.
The Team
You'll be joining a multi-disciplinary team with strong technical skills and a very supportive culture. We work closely together and regularly share knowledge, and your skills will make a direct impact on our business. It's an exciting and pivotal moment for us right now, with plenty of new projects ahead. If you're looking to solve interesting problems and see your work deliver real-world results, this is the team for you!
Responsibilities and Duties
- Guide, mentor and develop a team of Linux Engineering Support Engineers, defining clear roles, responsibilities and methods of collaboration
- Own and oversee support for Linux-based systems and engineering environments, ensuring stability, performance and availability
- Act as a point of contact for complex technical issues and outages, providing hands-on support when a customer concern arises
- Diagnose and resolve high-impact system and interoperability issues across mixed and distributed environments
- Perform hands-on investigation and troubleshooting to understand issues and drive effective solutions
- Direct incident response efforts, encompassing triage, coordination and resolution
- Own and lead Root Cause Analysis (RCA) processes, ensuring preventative improvements are identified and applied
- Establish and improve incident management processes, driving operational maturity and reliability
- Drive adoption of automation and configuration-as-code practices across Linux systems
- Ensure system changes are delivered through controlled, auditable processes wherever possible
- Oversee development and implementation of automation solutions for system management and operational tasks
- Promote and support use of workflows based on Git and CI/CD pipelines for configuration and operational processes
- Identify and prioritize opportunities to reduce manual effort through automation and improved tooling
- Collaborate with engineering teams to support development environments and system requirements
- Act as a senior technical liaison between engineering teams and infrastructure/platform functions
- Support onboarding of new systems, services and environments using standardized and automated approaches
- Ensure system configurations stay consistent and aligned with established standards and governance
- Oversee integration points (e.g. identity, CI/CD, tooling) and ensure issues are resolved effectively
- Identify and drive improvements in system performance, scalability and maintainability
- Contribute to and enforce documentation standards and operational guidelines
- Ensure systems meet audit, compliance and governance requirements, with full traceability of changes
Candidate Profile
Essential
- Extensive experience managing and maintaining Linux-based systems in complex technical or engineering environments
- Strong troubleshooting skills across operating systems, networking, storage and application layers
- Demonstrated ability to identify and solve intricate technical problems, including within diverse or distributed settings
- Demonstrated experience managing significant incidents and outages, including directing resolution efforts and participating in Root Cause Analysis (RCA)
- Extensive background in automation and scripting (e.g. Bash, Python or similar)
- Extensive background in configuration management or infrastructure-as-code tools (e.g. Ansible, Terraform, Puppet or similar)
- Experience working with configuration-as-code practices and workflows managed through Git
- Experience building, managing or assisting with CI/CD pipelines for configuration and operational processes
- Strong understanding of system interoperability across distributed environments
- Experience working within defined standards, governance frameworks and controlled processes
- Strong communication skills and ability to collaborate closely with engineering, platform and infrastructure teams
- Experience mentoring or supporting the development of other engineers
- Ability to work efficiently across different time zones within a dispersed organization
- Demonstrated ability to work autonomously, establish goals and achieve results
Desirable
- Experience managing or coordinating incident response activities
- Experience working alongside DevOps, platform or infrastructure engineering teams
- Experience with monitoring, observability and logging systems
- Experience supporting AI/ML or high-performance computing environments
- Understanding of identity and access management concepts
- Experience building or scaling operational processes or support functions
- Experience managing and maintaining Linux-based systems in a technical or engineering environment
We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interviews and encourage you to chat to us if you require any reasonable adjustments.