GCP Agentic Platform Support Lead

Job Location:

New York City, NY - USA

Monthly Salary: Not Disclosed
Posted on: 7 hours ago
Vacancies: 1 Vacancy

Job Summary

Role: GCP Agentic Platform Support Lead

Location: New York, NY 10019 (local candidates needed; hybrid)
Client: Persistent


Detailed JD:

The platform support lead will set the foundation and requirements for support on the GCP Data & AI platform. They will define standards for platform health, manage incident resolution, and execute routine maintenance to support the platform. They will also develop GCP Cloud Logging and Cloud Monitoring reports to provide visibility across the platform.

Activities comprise:

1. SLA & Reliability Reporting

   1. Establish the initial framework for tracking Mean Time to Repair (MTTR) and Mean Time Between Failures (MTBF)

   2. Configure self-service billing and uptime dashboards for Con Edison stakeholders

2. Foundation Maintenance & Optimization

   1. Develop and deploy the initial suite of Cloud Logging and Monitoring reports to establish platform visibility

   2. Monitor GCP billing for anomalies (e.g. BigQuery slot spikes) and implement tactical fixes to ensure budget adherence

   3. Build and maintain the Golden Path runbooks so that operational procedures are documented as they are established

3. Platform Monitoring & Incident Management

   1. Conduct solo reviews of overnight batch processing logs (e.g. Cloud Composer/Dataflow) to verify completion and identify failures before business hours begin

   2. Receive and prioritize platform-related tickets; determine whether issues stem from infrastructure, pipelines, or upstream sources

   3. Execute root cause analysis (RCA) and apply fixes for code-based failures, IAM errors, or configuration drift

   4. Act as the primary technical point of contact for Google Cloud Support or Con Edison Source System teams (SAP, GIS) when issues are external to the platform

4. Minor Enhancements (Capacity-Based)

   1. Maintain a prioritized backlog of minor requests to be addressed only after platform stability and incidents are managed

   2. Within available bandwidth, execute minor schema updates, ingestion schedule tweaks, or IAM modifications
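
As an illustration of the MTTR and MTBF tracking named under SLA & Reliability Reporting, the sketch below shows one common way to derive both figures from incident records. It is a minimal example under stated assumptions, not the client's actual framework: the function name `mttr_mtbf`, the (start, resolved) tuple shape, and the sample timestamps are all hypothetical, and MTBF is taken here as uptime within the reporting window divided by the number of incidents, which is only one of several conventions in use.

```python
from datetime import datetime, timedelta


def mttr_mtbf(incidents, window_start, window_end):
    """Return (MTTR, MTBF) as timedeltas for one reporting window.

    incidents: list of (start, resolved) datetime pairs that fall inside
    the window and do not overlap (hypothetical record shape).
    Returns (None, None) when there were no incidents.
    """
    if not incidents:
        return None, None

    # Total time spent in a failed state across all incidents.
    downtime = sum(
        (resolved - start for start, resolved in incidents), timedelta()
    )

    # MTTR: average time from failure start to resolution.
    mttr = downtime / len(incidents)

    # MTBF (assumed convention): operating time divided by failure count.
    uptime = (window_end - window_start) - downtime
    mtbf = uptime / len(incidents)
    return mttr, mtbf


# Hypothetical sample data: a 10-day window with a 2-hour and a 4-hour incident.
sample_incidents = [
    (datetime(2024, 1, 2, 1, 0), datetime(2024, 1, 2, 3, 0)),
    (datetime(2024, 1, 6, 22, 0), datetime(2024, 1, 7, 2, 0)),
]
sample_mttr, sample_mtbf = mttr_mtbf(
    sample_incidents, datetime(2024, 1, 1), datetime(2024, 1, 11)
)
```

In practice the incident pairs would likely be exported from the agreed ticketing tool rather than hard-coded, and the window would match the reporting period agreed with stakeholders.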

Workstream Deliverables:

1. Operations Runbook: The definitive resource reflecting current operational procedures and recovery steps (MS Word)

2. Integrated Health & Cost Reporting: Automated tracking of service uptime and GCP spend via Cloud Monitoring (Cloud Monitoring Reports)

3. Unified Incident & RCA Logs: A centralized record of Critical/High severity incidents and their resolutions, stored in the agreed management tool (ServiceNow, Jira, or similar)

4. Recovery & Maintenance Code: Validated code merged into the repository for bug fixes and configuration updates, including detailed release notes (GCP Code)

Key Skills

  • Administrative Skills
  • Facilities Management
  • Biotechnology
  • Creative Production
  • Design And Estimation
  • Architecture