Site Reliability Developer 4

Oracle

Job Location:

Noida - India

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1 Vacancy

Job Summary

Job Description

About Oracle Cloud:
Oracle Cloud is a comprehensive suite of cloud services, including infrastructure, platform, and applications, designed to help organizations build, deploy, and manage workloads securely at scale. At Oracle, we are building the most intelligent future of cloud computing. Our team is composed of talented, motivated, and diverse individuals committed to empowering our customers to accomplish their most important missions using Oracle Cloud Fusion Applications. We center our work around our customers' needs, striving to continuously enhance our cloud capabilities based on their challenges.

About the Team:
Join the Fusion Site Reliability Engineering Middleware (FSRE-MW) team, a critical group dedicated to maintaining the high availability of Oracle's Cloud Fusion Applications. We minimize the frequency and duration of customer-impacting events through large-scale incident management and automation. As a team, we combine the agility of a start-up with the scale and customer focus of a leading enterprise software company.

As a Principal Site Reliability Engineer, you will be a key member of a high-impact team focused on the availability, performance, and operational excellence of Fusion SRE Middleware. You will take ownership of production environments, including systems and the Fusion Middleware stack, and support mission-critical business operations for Cloud Fusion. The role will emphasize automation and optimization of operations across multiple production environments, recommending AI-driven solutions to enhance availability, performance, and supportability. You will harness AI-based tools and predictive analytics to proactively identify issues, automate incident responses, and continuously improve system resilience. Additionally, you will provide escalation support for complex production problems, guide junior engineers, participate in major incident bridges, and help build and refine processes and procedures, using AI-powered insights to drive smarter, data-driven decisions.

Our team is front and center in reducing event duration, leveraging operational experience, best practices, and tool development to automate incident management and drive continual improvement.

About the Role:
We seek a Principal SRE to join our globally distributed team responsible for detecting, triaging, and mitigating service-impacting events rapidly and effectively through automation and AI-powered insights. You will be part of a regional team minimizing Fusion services downtime through exceptional incident management and system operations, with a strong, AI-driven emphasis on scalability, performance, and security. In this dynamic role, you will gain deep insight into the inner workings of Oracle Cloud Fusion Apps, using AI tools to predict, identify, and address potential issues before they impact services. You'll influence cross-functional leaders and drive programs that boost service availability while leveraging AI to enhance real-time decision-making and improve operational efficiency.

If you're passionate about leveraging AI to break new ground as part of an agile team, we want to speak with you!

Our Values:
Oracle's values (equity, inclusion, respect, and commitment to the greater good) are foundational to our success. We foster opportunities for learning and growth and challenge one another to build the future together. As part of our team, you'll join a group of hardworking and diverse professionals given the autonomy and support to do your best work in a flexible and dynamic environment.

Career Level: IC4

Employer Description:
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. For over 40 years, we've thrived by operating with integrity and partnering with industry leaders across nearly every sector.

Innovation begins with empowering everyone to contribute, which is why we are committed to building an inclusive, equitable workforce. Oracle careers offer global opportunities and a healthy work-life balance, with competitive benefits including medical, life, and retirement options. We support our communities through volunteer programs and encourage a spirit of giving back.

Oracle is committed to including people with disabilities at all stages of the employment process. If you need assistance or accommodation for a disability, email or call 1 in the United States.

We are an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law. Oracle will also consider qualified applicants with arrest and conviction records, as permitted by applicable law.



Responsibilities

Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Take responsibility for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Hold authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas, and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, and performance attributes and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.

Key Responsibilities:

  • Automation:
    Develop and optimize operations through AI-powered automation. Apply machine learning and orchestration principles wherever possible, reducing manual intervention and technical debt. Enhance operational outcomes with scalable, AI-driven automation solutions that anticipate issues and optimize system performance proactively.
  • Middleware Technology Expert:
    Lead L3 WebLogic administration: managing server lifecycle, configuring and deploying applications, and monitoring server and application resources. Leverage AI-driven monitoring tools to proactively detect and resolve issues across application and infrastructure layers, ensuring efficient and automated troubleshooting.
  • Service Ownership:
    Act as a Service Owner for Fusion Apps customers, sharing full-stack ownership of critical services in partnership with Service Development and Operations. Utilize AI-based analytics to predict potential service disruptions and optimize service delivery to improve customer satisfaction and minimize downtime.
  • Technical Expertise:
    Provide deep technical guidance and serve as the ultimate escalation point for complex issues not documented in SOPs. Participate in major incident management as a subject matter expert, leveraging your understanding of service topologies, AI-driven insights, and dependencies to troubleshoot and resolve issues quickly and effectively.
  • Ownership Scope:
    Understand end-to-end configuration, dependencies, and behavioral characteristics of production services. Use AI-powered telemetry and monitoring systems to ensure mission-critical delivery, with a focus on system health, security, resiliency, scale, and performance.
  • Service Requirements:
    Provide strategic direction and prioritization to Product Management and Service Development teams, guiding the addition of AI-enhanced capabilities to Oracle SaaS/ERP services. Act as an escalation point for undocumented or critical issues, leveraging AI tools to aid in faster resolution and proactive service improvements.

Professional Skills Requirements:

  • Excellent written and verbal communication, facilitation, and interpersonal skills.
  • Strong collaboration, customer service, empathy, flexibility, and conflict resolution abilities.
  • Ability to communicate clearly with technical and non-technical stakeholders.
  • Effective at working independently and managing multiple projects or responsibilities.
  • Highly motivated, with the ability to thrive in fast-paced, team-oriented environments.
  • Strong analytical and problem-solving skills.
  • Adaptability to evolving priorities and deadlines.
  • Strong global teamwork skills.
  • Proven ability to handle multiple competing priorities.

Required Qualifications:

  • Bachelor's degree in Computer Science or a related field, or equivalent experience.
  • 12 years of overall experience in the IT industry.
  • 8 years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering.
  • Design, develop, and support AI-based enterprise applications using modern AI tools and technologies (e.g., agentic AI, AI agents, Retrieval-Augmented Generation (RAG), Model Context Protocol (MCP), Large Language Models).
  • Apply classical AI techniques such as clustering, classification, regression, Monte Carlo simulations, and Bayesian blending.
  • Participate in system planning, scheduling, and implementation for enterprise-scale AI applications.
  • Collaborate with cross-functional teams to understand requirements and deliver innovative AI-driven solutions.
  • Ensure application scalability, reliability, and performance in a cloud environment.
  • Contribute to best practices for development, testing, and deployment of AI applications.
  • Work with Oracle Vector Database and other retrieval systems to optimize AI performance.
  • Manage security and bias in large language models when implementing and developing AI agents.
  • Proficiency in programming languages relevant to AI (e.g., Python, Java).
  • Familiarity with modern AI/ML frameworks and platforms (e.g., TensorFlow, PyTorch, Oracle Cloud AI services).
  • Practical experience with enterprise software development, including REST APIs, microservices, and cloud-native architectures.
  • Use and contribute to the Continuous Integration and Continuous Delivery (CI/CD) process for building and delivering security tools.
  • Experience with CI/CD tools (Kubernetes, Jenkins, Maven, Gradle, Ant, or similar).
  • Productionize AI services with CI/CD pipelines, containerization, orchestration, and autoscaling.
  • Instrument traces, metrics, and logs across prompts, retrieval, tools, agents, and model outputs.
  • Enforce SLAs through canary and blue-green rollouts with safe rollback procedures.
  • Collaborate with cross-functional teams to scale AI offerings across enterprise environments.
  • Mentor engineers and foster a culture of engineering excellence.
  • Develop and maintain robust software toolkits in Python, React, and Java to support applied scientists in building, testing, and deploying ML models and agent frameworks.
  • Design and implement cloud-based services and APIs for model execution, orchestration, asynchronous communication, and multimodal workflows.
  • Apply deep knowledge of algorithms, data structures, concurrent programming, and distributed systems to build high-performance and maintainable software.
  • Participate in code reviews, provide mentorship, incorporate feedback, and help shape engineering standards.
  • Stay current with emerging trends in AI infrastructure, agent frameworks, HPC systems, and cloud-native technologies; evaluate and integrate them where appropriate.

  • Experience with AI-driven monitoring and predictive analytics.

If you're ready to shape the future of cloud services at Oracle, we want to connect!
Apply today to join our innovative team.



Qualifications

Career Level - IC4




Required Experience:

IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when eve ...
