As a Compute Site Reliability Engineer, you will be responsible for maintaining, monitoring, and improving the reliability, scalability, and performance of our Kubernetes-based infrastructure. You'll work closely with senior SREs, developers, and other engineers to ensure high availability and optimize our containerized applications. This is a fantastic opportunity for someone eager to grow their expertise in Kubernetes and cloud-native technologies.

As an SRE at Apple, you will:
* Operate, monitor, and triage all aspects of our production and non-production environments.
* Design, build, and implement innovative solutions for previous, present, and future issues.
* Prepare alert-handling procedures and runbooks, and collaborate with other SRE teams.
* Participate in on-call rotations to troubleshoot and resolve production issues, minimizing downtime.
* Automate deployment and orchestration of services into the cloud environment, as well as other routine processes.
* Actively participate in capacity planning, scale testing, and disaster recovery exercises.

- Bachelor's degree in Computer Science, an engineering-related field, or equivalent related experience.
- Basic understanding of Kubernetes architecture, including Pods, Deployments, Services, and ConfigMaps.
- Familiarity with Linux systems administration and command-line tools.
- Experience with scripting languages like Bash, Python, or Go.
- Knowledge of monitoring tools such as Prometheus, Grafana, or similar.
- Exposure to CI/CD pipelines and DevOps practices.
- Awareness of containerization.
- Strong problem-solving skills and a willingness to learn new technologies.
- Outstanding organizational and communication skills.
- Strong verbal and written communication skills.
- Automation advocate - you truly believe in removing operational load via software.
- Familiarity with Infrastructure as Code (IaC) tools like Puppet.
- A strong sense of ownership. At the same time, you're a great teammate who communicates clearly and transparently.
- Self-motivated, inquisitive, and always looking to learn more.
- Experience managing, scaling, and troubleshooting Java and Go applications.
- CNCF Kubernetes Administration certification.