Description
We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.
As a Lead Software Engineer at JPMorgan Chase within the Cybersecurity Technology and Controls team, you are an integral part of an agile team that works to enhance, build, and deliver trusted, market-leading technology products in a secure, stable, and scalable way. As a core technical contributor, you are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm's business objectives.
We are seeking a highly skilled ML Ops Engineer with expertise in deploying, monitoring, and managing machine learning models in production environments. This role involves working with cutting-edge technologies to ensure scalable, reliable, and efficient AI solutions. The ideal candidate will be adept at building robust infrastructure and processes to support the seamless operation of machine learning models. In this role, you will be responsible for automating model deployment, optimizing infrastructure, and ensuring the continuous performance of AI systems. Your ability to collaborate with cross-functional teams and address operational challenges will be crucial to driving innovation and delivering impactful AI solutions.
Job responsibilities:
- Collaborate with cross-functional teams, including data scientists and software engineers, to understand model requirements and integrate models into applications.
- Develop and implement strategies for deploying machine learning models into production, ensuring scalability, reliability, and efficiency.
- Design and maintain continuous integration and continuous deployment (CI/CD) pipelines to automate the testing, deployment, and updating of machine learning models.
- Manage and optimize the infrastructure required for running machine learning models, including cloud services, containerization (e.g., Docker), and orchestration tools (e.g., Kubernetes).
- Implement monitoring and logging solutions to track model performance, detect anomalies, and ensure models are operating as expected in production.
- Maintain version control for models and data, ensuring traceability and compliance with governance policies. Ensure that deployed models adhere to security best practices and comply with relevant regulations and standards.
- Execute creative software solutions, design, development, and technical troubleshooting, with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems.
- Develop secure, high-quality production code, and review and debug code written by others.
- Identify opportunities to eliminate or automate remediation of recurring issues to improve the overall operational stability of software applications and systems.
- Lead communities of practice across Software Engineering to drive awareness and use of new and leading-edge technologies.
- Add to a team culture of diversity, equity, inclusion, and respect.
Required qualifications, capabilities, and skills:
- Formal training or certification in machine learning concepts and 5 years of applied experience.
- Strong expertise in deploying and managing machine learning models in production environments.
- Advanced Python programming skills, including pandas, NumPy, and scikit-learn.
- Proficiency in building and maintaining CI/CD pipelines for machine learning workflows.
- Proficient in all aspects of the Software Development Life Cycle.
- Advanced understanding of agile methodologies such as CI/CD, Application Resiliency, and Security.
- Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile, etc.).
- Expertise in cloud platforms (e.g., AWS, Google Cloud, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Excellent problem-solving skills and attention to detail, and strong communication skills to collaborate effectively with cross-functional teams.
- Hands-on practical experience delivering system design, application development, testing, and operational stability.
Preferred qualifications, capabilities, and skills:
- Proven experience in deploying and managing large-scale machine learning models in production environments.
- Strong ability to monitor ML models in production, addressing model performance and data quality issues effectively.
- Working knowledge of security best practices and compliance standards for Machine Learning systems.
- Experience with infrastructure optimization techniques to enhance performance and efficiency.
- Experience developing REST APIs using frameworks such as Flask or FastAPI for seamless integration into business solutions.
- Familiarity with creating and utilizing synthetic datasets to improve model training and evaluation.
- Bachelor's degree in Computer Science, Engineering, or a related field, with relevant experience in ML Ops or related roles.