Description
Guide and shape the future of technology at a globally recognized firm driven by pride in ownership.
As a Senior Manager of Site Reliability Engineering at JPMorgan Chase within the Commercial & Investment Bank Merchant and Commercial Card Production Management, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team's strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.
Job responsibilities
- Demonstrates expertise in site reliability principles and understands the fine balance between features, efficiency, and stability
- Effectively negotiates with peers and executive partners to ensure optimal outcomes for all
- Drives the adoption of site reliability practices throughout the organization
- Ensures your teams follow site reliability best practices and can demonstrate this empirically through stability and reliability metrics
- Drives a culture of continual improvement and solicits real-time feedback to improve the customer experience
- Ensures your team collaborates with other teams within your group's specialization and avoids duplication of work where possible
- Follows blameless, data-driven post-mortem strategies and conducts regular team debriefs to enable learning from both successes and mistakes
- Provides personalized coaching for entry to mid-level team members
- Ensures your team documents and shares their knowledge and innovations via internal forums, communities of practice, guilds, and conferences
Key Responsibilities:
- Leadership and Team Management:
  - Lead and mentor a global team of site reliability engineers, fostering a culture of innovation, collaboration, and continuous improvement.
  - Provide leadership training to enhance strategic thought leadership capabilities and understanding of SRE tenets.
- Operational Efficiency:
  - Implement enhanced communication protocols and structured prioritization processes to streamline operations and reduce missed deliverables.
  - Develop a dashboard for resource capacity monitoring to enable real-time adjustments and better capacity planning.
- Strategic Service Improvement:
  - Introduce automation and AI-driven solutions to improve monitoring, telemetry, and communication processes.
  - Deploy AI algorithms for pattern and trend detection to proactively address potential issues and optimize system performance.
- Onboarding and Recruitment:
  - Develop a comprehensive onboarding program for new SREs to accelerate integration and productivity.
  - Expedite the hiring process to fill open positions and ensure the team is adequately staffed to meet demands.
- Alignment with SRE Tenets:
  - Implement training programs focused on the five key SRE tenets, ensuring all team members understand them.
Required qualifications, capabilities, and skills
- 15 years of experience in site reliability engineering with a focus on financial services.
- Advanced proficiency in site reliability culture and principles, and the ability to demonstrate how to implement site reliability across application and platform teams while avoiding common pitfalls
- Experience leading technologists to manage and solve complex technological issues at a firmwide level
- Ability to influence the team's culture by championing innovation and change for success
- Experience hiring, developing, and recognizing talent
- Strong communication skills and a desire to mentor and educate others on SRE principles and practices.
- Technical proficiency in Python, Java, AWS Cloud, Jenkins, Terraform, Kubernetes, and Docker, and in monitoring tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
- Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile, etc.)
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform, etc.)
- Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker, etc.)
- Experience with troubleshooting common networking technologies and issues
- Formal training or certification on software engineering concepts and 5 years applied experience
- 5 years of experience leading technologists to manage and solve complex technical items within your domain of expertise
Preferred qualifications, capabilities, and skills
- Ability to code and demonstrate data fluency
- AWS Certified Cloud Practitioner or equivalent certifications
- Bachelor of Engineering or equivalent experience
Required Experience:
Manager