Lead Site Reliability Engineer (GTAM)
Job Summary
Take on a critical role in defining the future of a globally recognized firm and make a direct, significant impact in a space designed for top achievers in site reliability.
If you are excited about shaping the future of technology and driving significant business impact in financial services, we are looking for people just like you. Join our team and help us develop game-changing, high-quality solutions.
As a Lead Site Reliability Engineer at JPMorganChase within the Corporate sector Enterprise Technology team, you are an integral part of a team that develops high-quality architecture solutions for critical software applications and platforms. You will lead resiliency design reviews, break down complex problems, and mentor engineers, driving significant business impact and shaping the target state architecture through your expertise in multiple architecture domains.
Job responsibilities
- Demonstrate and champion site reliability culture and practices, exerting technical influence across your team
- Lead initiatives to improve reliability and stability of applications and platforms using data-driven analytics
- Collaborate with team members to define service level indicators and work with stakeholders to establish service level objectives and error budgets
- Provide technical leadership and guidance for medium to large-sized products
- Proactively identify and resolve technology-related bottlenecks in your areas of expertise
- Act as the main point of contact during major incidents, quickly identifying and resolving issues to avoid financial losses
- Document and share knowledge within the organization through internal forums and communities of practice
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 5 years of applied experience
- At least 5 years as an SRE and at least 10 years in a highly regulated industry such as Banking
- Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and site reliability best practices, with the ability to implement these practices within an application or platform
- Demonstrated experience designing, deploying, and supporting highly available services in a public cloud environment (AWS, Azure, or GCP); familiarity with cloud-native observability, auto-scaling, and infrastructure-as-code is essential
- Fluency in at least one programming language (e.g., Python, Java, Spring)
- Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
- Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
- Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
- Experience troubleshooting common networking technologies and issues
Preferred qualifications, capabilities, and skills
- Ability to identify and solve problems related to complex data structures and algorithms
- Drive to self-educate and evaluate new technology
- Ability to teach new programming languages to team members
- Ability to extend your influence and collaborate across different levels and stakeholder groups
Required Experience:
IC
About Company
JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world’s most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans ov ...