## Key Responsibilities
### Reliability and Performance Management
Design, implement, and maintain highly available, scalable, and resilient cloud-native architectures for mission-critical SaaS products.
Develop and implement SLOs, SLIs, and SLAs to measure and improve service reliability.
Continuously optimize system performance and resource utilization across multiple cloud platforms.
Fine-tune and optimize application performance by analyzing code traces and database queries.
### Incident Management and Troubleshooting
Lead incident response efforts, effectively troubleshooting complex issues to minimize downtime and impact.
Reduce Mean Time to Recover (MTTR) through proactive monitoring, automated alerting, and efficient problem-solving techniques.
Conduct thorough Root Cause Analysis (RCA) for all major incidents and implement preventive measures.
### Observability and Monitoring
Design and implement end-to-end observability solutions across our distributed systems.
Develop and maintain comprehensive monitoring strategies using tools like the ELK Stack, Prometheus, and Grafana.
Create and optimize product status dashboards to provide real-time visibility into system health and performance.
### Automation and Infrastructure as Code (IaC)
Implement Infrastructure as Code practices using tools like Terraform.
Develop and maintain automated deployment pipelines and CI/CD workflows.
Create self-healing systems and automate routine operational tasks to reduce manual intervention.
### Cloud-Agnostic Architecture
Design and implement cloud-agnostic solutions that can operate efficiently across multiple cloud providers.
Develop expertise in event-driven architectures and related technologies (e.g., Apache Kafka/Event Hubs, Redis, MongoDB Atlas, IoT Hub).
Implement and manage containerized applications using Kubernetes across different cloud environments.
### Continuous Improvement
Regularly review and refine operational practices to enhance efficiency and reliability.
Stay updated with the latest industry trends and technologies in SRE, cloud computing, and DevOps.
Contribute to the development of internal tools and frameworks to support SRE practices.
## Requirements
Strong knowledge of cloud platforms such as Azure and their associated services.
Expertise in observability tools (ELK Stack, Dynatrace, Prometheus).
Expertise in containerization technologies such as Docker and Kubernetes.
Understanding of event-driven architecture and database technologies (MongoDB Atlas, Azure SQL, PostgreSQL).
Proficiency in IaC tools such as Terraform and GitHub Actions.
Proficiency in one or more programming languages: Python, .NET, or Java.
Strong understanding of networking concepts, load balancing, and security practices.
HARMAN is proud to be an Equal Opportunity / Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.
Required Experience:
Senior IC
Full-Time