Principal SaaS Capacity Engineer

Oracle

Job Location:

Zapopan - Mexico

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Description

Required Qualifications

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, Cloud/Systems Engineering, or a related field.
  • 5 years of experience in cloud infrastructure, SaaS operations, or capacity engineering roles.
  • Hands-on experience with large-scale distributed systems, OCI (or AWS, Azure, GCP), and SaaS production environments.
  • Strong programming and scripting experience (Python, Go, Shell, SQL) for automation and AI/ML model deployment.
  • Proven experience deploying AI/ML solutions for capacity forecasting, anomaly detection, and intelligent workload tuning.
  • Deep understanding of cloud capacity topology and distributed service dependencies.
  • Proficiency with infrastructure-as-code (Terraform, Ansible, Helm, Kubernetes).
  • Familiarity with AIOps tools and AI-driven observability platforms (Datadog, Dynatrace, Splunk, or similar).
  • Knowledge of resiliency and disaster recovery strategies, including AI-simulated failure modeling.

Preferred Qualifications

  • Advanced degree (Master's/PhD) with specialization in AI, ML, Data Science, or distributed systems engineering.
  • Experience building and deploying self-healing, AI-driven automation at scale in a SaaS environment.
  • Domain expertise in reinforcement learning applications for automated resource optimization.
  • Direct exposure to Oracle Cloud Infrastructure (OCI) systems and tools.
  • Experience with cloud-native AI/ML services, MLOps, and continuous model monitoring.

Competencies and Skills

  • Expertise in designing, developing, and deploying AI/ML models for cloud infrastructure use cases (demand forecasting, anomaly detection, workload optimization).
  • Advanced proficiency in automation, orchestration, and self-healing system architectures.
  • Skilled in communicating technical concepts, AI-powered analytics, and strategic insights to engineering and executive audiences.
  • Strong analytical and critical thinking skills with a deep data-driven mindset.
  • Curiosity and initiative to explore APIs, system profiles, and operational anomalies, translating technical findings into impactful business outcomes.
  • Highly collaborative, adaptive, and passionate about operational excellence and continuous learning.
  • Ability to influence cross-team priorities and drive best practices in AI-enhanced capacity engineering.


Responsibilities

Key Responsibilities

  • Service Accountability: Ensure SaaS production capacity availability, optimization, scaling automation, reserve management, and quota governance.
  • AI/ML Integration: Apply AI/ML models for predictive capacity forecasting, anomaly detection, and workload auto-tuning to anticipate demand spikes and prevent outages.
  • Observability & AIOps: Leverage AI-powered observability and AIOps platforms for end-to-end system monitoring, intelligent alerting, and automated incident mitigation.
  • Strategic Partnership: Collaborate with Product and Development teams to design, validate, and align AI-driven scaling and capacity planning strategies with new launches and initiatives.
  • Automation & Orchestration: Design, implement, and optimize automation and orchestration pipelines, including self-healing systems, policy-driven provisioning, and disaster recovery simulations, using AI to enhance reliability and operational resilience.
  • Data-Driven Decision Support: Deliver advanced instrumentation, AI-powered analytics, and actionable dashboards to inform executives, engineering teams, and stakeholders.
  • Technical Leadership: Translate complex OCI stack and cloud platform resources (compute, storage, DB, networking) into business-ready, AI-enhanced capacity solutions and performance profiles.
  • Simulation & Resiliency: Use AI/ML models to simulate, validate, and improve resiliency and disaster recovery scenarios for faster, more robust recovery response.
  • Collaboration & Communication: Present AI-driven insights, risks, and recommendations to engineering teams, ICs, and executives to illuminate capacity trends and data-driven priorities.
  • Continuous Innovation: Assess new AI/ML techniques, AIOps platforms, and automation tools for ongoing improvements in infrastructure reliability, scalability, and cost optimization.


Qualifications

Career Level - IC4




Required Experience:

Staff IC

Key Skills

  • Design
  • Academics
  • AutoCAD 3D
  • Cafe
  • Fabrication
  • Java

About Company

Oracle provides the world's most complete, open, and integrated business software and hardware systems, with more than 370,000 customers, including 100 of the Fortune 100, representing a variety of sizes and industries in more than 145 countries around the globe. And Oracle's 110,000 gl ...