Lead Site Reliability Engineer, AIML Platform

JPMorganChase

Job Location:

Jersey, NJ - USA

Monthly Salary: $152,000 - $215,000
Posted on: 11 hours ago
Vacancies: 1 Vacancy

Job Summary

Description

Responsibilities:

  • Design and implement solutions to enhance the reliability and scalability of AI/ML platforms and applications to accommodate fast-growing demand.
  • Partner with product engineering teams to ensure the AI/ML systems are reliable and high-performing.
  • Develop observability, security, automation, and FinOps tools and orchestration.
  • Provide strategic technology leadership by defining and evaluating standards and architecture for reliability, observability, and automation frameworks.
  • Build strong cross-functional relationships that foster engagements across the organization and deliver solutions to user problems.
  • Debug and solve issues in a production environment, identify root causes, and remediate.
  • Participate in on-call rotations, incident management, and escalation workflows.
  • Take full ownership of problems, develop solutions, and acquire new knowledge to complete the task.
  • Mentor and guide junior engineers.

Required Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or an equivalent technical qualification, with 5 years of professional experience.
  • Expertise in SRE principles, reliability, scalability, and performance of applications and infrastructure.
  • Hands-on experience with cloud platforms (AWS, GCP, Azure) and IaC tools (Terraform, Ansible).
  • Extensive experience implementing advanced observability using tools like OpenTelemetry, Dynatrace, Grafana, and/or cloud-native services.
  • Experience in architecting distributed systems and cloud-native architecture in AWS.
  • Systematic problem-solving and troubleshooting skills in complex systems.
  • Excellent communication skills and the ability to present business and technical concepts to stakeholders.
  • Self-managed and self-motivated, with a strong sense of ownership, urgency, and drive.

Good to have:

  • Prior experience working in AI, ML, or data engineering.
  • Prior experience developing AIOps/AI agents.
  • Multi-cloud experience (AWS, GCP, Azure) is a plus.



Required Experience:

IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world’s most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans ov ...