Cloud GenAI Governance & Observability Consultant
Charlotte, NC - USA
Job Summary
Description:
Local candidates only.
Must be onsite at the client in Charlotte, NC at least 3 days per week.
Note: This role is focused entirely on generative LLMs; traditional predictive ML or data science backgrounds are not a fit.
Role Overview:
We are seeking a Senior Cloud GenAI Governance Engineer to lead enterprise-scale GenAI platform engineering. The role centers on the critical gateway and security layers that govern the request flow from user API requests through Model Armor to the inference endpoint, and on into telemetry and Arize AI. At its core, this position builds responsible AI infrastructure and AI runtime governance.
Key Responsibilities
Enterprise AI Security: Act as the primary owner of AI security guardrails. Implement and manage Model Armor to handle prompt filtering, data leakage prevention, request/response inspection, and enterprise-safe inference.
Advanced API Ecosystem Debugging: Manage complex API lifecycle tracing rather than simple API integration. Troubleshoot endpoint bottlenecks, perform container orchestration failure analysis, and trace the exact origin of 429 errors across ingress, gateway load balancing, and GPU saturation.
AI Observability & Telemetry: Utilize Arize AI for runtime monitoring, inference tracing, and model behavior analysis.
Traffic & Gateway Operations: Manage distributed inference debugging, token queueing, and upstream/downstream API flows.
Cloud Infrastructure: Maintain GCP/Azure landing zones, Terraform deployments, and Kubernetes clusters to support these governance pipelines.
Required Qualifications
8 years of experience in API lifecycle tracing, rate limiting, and token throughput management.
8 years of hands-on expertise with AI security tools, specifically Model Armor.
Deep understanding of enterprise AI observability platforms, particularly Arize AI.
Strong background in cloud networking (GCP/Azure) and Kubernetes infrastructure.