Engineering-L2-Bengaluru-Vice President-AI ML Engineering

Goldman Sachs

Job Location:

Bengaluru - India

Monthly Salary: $150,000 - $250,000
Posted on: 15 hours ago
Vacancies: 1 Vacancy

Job Summary

Description

BUSINESS UNIT OVERVIEW

Enterprise Technology Operations (ETO) is a Business Unit within Core Engineering focused on running scalable production management services with a mandate of operational excellence and operational risk reduction, achieved through large-scale automation, best-in-class engineering, and the application of data science and machine learning. The Production Runtime Experience (PRX) team in ETO applies software engineering and machine learning to production management services, processes, and activities to streamline monitoring, alerting, automation, and workflows.

TEAM OVERVIEW

The Machine Learning and Artificial Intelligence team in PRX applies advanced ML and GenAI to reduce the risk and cost of operating the firm's large-scale compute infrastructure and extensive application estate. Building on strengths in statistical modelling, anomaly detection, predictive modelling, and time-series forecasting, we leverage foundational LLMs to orchestrate multi-agent systems for automated production management services. By unifying classical ML with agentic AI, we deliver reliable, explainable, and cost-efficient operations at scale.

ROLE AND RESPONSIBILITIES

In this role, you will be responsible for launching and implementing GenAI agentic solutions aimed at reducing the risk and cost of managing large-scale production environments of varying complexity. You will address production runtime challenges by developing agentic AI solutions that can diagnose, reason, and take actions in production environments to improve productivity and resolve production support issues.

What you'll do:

Build agentic AI systems: Design and implement tool-calling agents that combine retrieval, structured reasoning, and secure action execution (function calling, change orchestration, policy enforcement) following the Model Context Protocol (MCP). Engineer robust guardrails for safety, compliance, and least-privilege access.

Productionize LLMs: Build an evaluation framework for open-source and foundational LLMs; implement retrieval pipelines, prompt synthesis, response validation, and self-correction loops tailored to production operations.

Integrate with runtime ecosystems: Connect agents to observability, incident management, and deployment systems to enable automated diagnostics, runbook execution, remediation, and post-incident summarization with full traceability.

Collaborate directly with users: Partner with production engineers and application teams to translate production pain points into agentic AI roadmaps; define objective functions linked to reliability, risk reduction, and cost; and deliver auditable, business-aligned outcomes.

Safety, reliability, and governance: Build validator models, adversarial prompts, and policy checks into the stack; enforce deterministic fallbacks, circuit breakers, and rollback strategies; instrument continuous evaluations for usefulness, correctness, and risk.

Scale and performance: Optimize cost and latency via prompt engineering, context management, caching, model routing, and distillation; leverage batching, streaming, and parallel tool calls to meet stringent SLOs under real-world load.

Build a RAG pipeline: Curate domain knowledge; build a data-quality validation framework; establish feedback loops and a milestone framework to maintain knowledge freshness.

Raise the bar: Drive design reviews, experiment rigor, and high-quality engineering practices; mentor peers on agent architectures, evaluation methodologies, and safe deployment patterns.

QUALIFICATIONS

A Bachelor's degree (Master's/PhD preferred) in a computational field (Computer Science, Applied Mathematics, Engineering, or a related quantitative discipline) with 7+ years of experience as an applied data scientist / machine learning engineer.

ESSENTIAL SKILLS

7+ years of software development in one or more languages (Python, C/C++, Go, Java); strong hands-on experience building and maintaining large-scale Python applications preferred.

3+ years designing, architecting, testing, and launching production ML systems, including model deployment/serving, evaluation and monitoring, data processing pipelines, and model fine-tuning workflows.

Practical experience with Large Language Models (LLMs): API integration, prompt engineering, fine-tuning/adaptation, and building applications using RAG and tool-using agents (vector retrieval, function calling, secure tool execution).

Understanding of different LLMs, both commercial and open source, and their capabilities (e.g., OpenAI, Gemini, Llama, Qwen, Claude).

Solid grasp of applied statistics, core ML concepts, algorithms, and data structures to deliver efficient and reliable solutions.

Strong analytical problem-solving, ownership, and urgency; ability to communicate complex ideas simply and collaborate effectively across global teams with a focus on measurable business impact.

Preferred:

Proficiency building and operating on cloud infrastructure (ideally AWS), including containerized services (ECS/EKS), serverless (Lambda), data services (S3, DynamoDB, Redshift), orchestration (Step Functions), model serving (SageMaker), and infra-as-code (Terraform/CloudFormation).

YOUR CAREER

Goldman Sachs is a meritocracy where you will be given all the tools to advance your career. At Goldman Sachs, you will have access to excellent training programmes designed to improve multiple facets of your skill portfolio. Our in-house training programme, Goldman Sachs University, offers a comprehensive series of courses that you will have access to as your career progresses. Goldman Sachs University has an impressive catalogue of courses which span technical, business, and leadership skills.

Salary Range
The expected base salary for this New York, New York, United States-based position is $150,000-$250,000. In addition, you may be eligible for a discretionary bonus if you are an active employee as of fiscal year-end.

Benefits
Goldman Sachs is committed to providing our people with valuable and competitive benefits and wellness offerings, as it is a core part of providing a strong overall employee experience. A summary of these offerings, which are generally available to active, non-temporary, full-time and part-time US employees who work at least 20 hours per week, can be found here.




Required Experience:

Exec

Key Skills

  • React Native
  • AI
  • Enterprise Software
  • React
  • Node.js
  • Redis
  • AWS
  • Software Development
  • iOS
  • Team Management
  • Product Development
  • Mobile Applications

About Company

The Goldman Sachs Group, Inc. is a leading global investment banking, securities, and asset and wealth management firm that provides a wide range of financial services.
