Site Reliability Engineer — Human Engineering

Apple

Job Location:

Austin, TX - USA

Monthly Salary: Not Disclosed

Posted on: 14 hours ago

Vacancies: 1 Vacancy

Job Summary

At Apple, new ideas have a way of becoming phenomenal products, services, and customer experiences very quickly. Imagine what you could do here. Bring passion and dedication to your job and there's no telling what you could accomplish. We are a team of software engineers developing web-based tools and native applications for Apple teams. Our work empowers Apple engineers and researchers to build the products that inspire and delight millions every day.

We're looking for a Site Reliability Engineer who thinks like a systems engineer first and an operator second. You won't just keep things running; you'll shape how our platform evolves. Our team operates 50 services across Kubernetes and AWS, handles sensitive health and research data, and is ramping up many architectural shifts: new service-to-service auth patterns, event-driven pipelines, and a move from on-prem to cloud-native infrastructure. We need someone who gets excited about that kind of work, can reason about distributed systems at the design level, and is a strong enough communicator to bring the rest of the team along.

The Human Engineering Software team builds tools used across Apple for user studies, research participant management, health data collection, and privacy-preserving analytics. Our infrastructure spans Django backends, Kubernetes clusters (self-hosted and AWS), PostgreSQL, Redis, Kafka, Elasticsearch, and a growing set of internal services.

This role is engineering-forward SRE. You'll spend as much time designing systems as operating them. You'll work closely with our full-stack engineers to improve how services communicate, how we observe production behavior, and how we ship changes safely. You'll have a seat at the architecture table; we want you proposing solutions, not just implementing them.

Platform & Reliability Engineering - Own the reliability of our Kubernetes-hosted services across AWS and self-hosted clusters: deployments, scaling, capacity planning, certificate management, and secrets rotation. Design and implement SLO-driven observability: define meaningful SLIs and build dashboards that answer "is the system healthy?", not just "is the pod running?". Drive incident response and blameless postmortems.

Distributed Systems & Architecture - Partner with the architecture team on system design: service-to-service authentication (OIDC gateway auth), event-driven messaging (Kafka), and API gateway patterns. Design the infrastructure layer to make architecture proposals real in production. Evaluate and recommend new tools, patterns, and platforms, and write code when it's the right tool, whether that's a deployment operator, a health check service, or a data pipeline component. This isn't a YAML-only role.

Engineering Enablement - Make the team efficient: own CI/CD pipelines and GitOps practices, own tests that verify our production tools are functioning correctly, build self-service automation, evolve our observability and security posture, and communicate infrastructure decisions clearly across technical and non-technical stakeholders.

Minimum Qualifications

BS in Computer Science, Engineering, or equivalent practical experience, with 3 years of experience in distributed systems

Deep experience with Kubernetes in production: cluster operations, networking, storage, troubleshooting

Strong proficiency designing and operating services in AWS (EC2, EKS, RDS, S3, IAM, VPC)

Hands-on infrastructure-as-code experience (Terraform, Helm, or equivalent)

Proficiency in at least one backend language (Python, Go, or similar); you can write production services, not just scripts

Experience with CI/CD pipeline design and GitOps workflows

Strong understanding of networking fundamentals: DNS, load balancing, TLS, firewall rules, service discovery

Excellent communication skills. You can explain a complex system to a room of engineers who didn't build it

Experience building internal automation or self-service tooling (Slack bots, CLI tools, workflow orchestration) that reduced manual operational work

Preferred Qualifications

BS in Computer Science, Engineering, or equivalent practical experience, with 5 years of experience in distributed systems

Experience with event-driven architectures (Kafka, RabbitMQ, or similar messaging systems)

Experience with service mesh or API gateway patterns (Istio, Envoy, Kong, or similar)

Familiarity with Django/Python web applications and their operational characteristics (Celery, Gunicorn, PostgreSQL)

Experience with observability tooling beyond basic monitoring: distributed tracing, SLO frameworks, structured logging

Background working with sensitive data (health data, PII) and associated compliance requirements

Experience leading incident response and building on-call culture

Contributions to internal or open-source infrastructure tooling

About Company

Apple

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...
