Site Reliability Production Engineer (AI/ML Ops) / DevOps Consultant
Smithfield, NC - USA
Job Summary
Hope you are doing well.
Please find the job description given below and let me know your interest.
Position: Site Reliability Production Engineer (AI/ML Ops) / DevOps Consultant (hybrid in Smithfield, RI; Merrimack, NH; Westlake, TX; W2 only)
Location: Hybrid in Smithfield, RI; Merrimack, NH; Westlake, TX
Duration: 6-month project
Interview: Webcam
Note: A LinkedIn profile is a must
Job Description:
Technical Skills:
- Strong scripting experience with PowerShell and Python
- Automation experience using AI/LLM technologies
- Experience with CI/CD pipelines and cloud platforms (AWS/Azure)
- Observability tooling (e.g., Dynatrace, Nexthink)
Team
The Business Unit-aligned functions include Infrastructure Support, Cloud Enablement, Platform Engineering, Environment Management, Incident Support, and Deployments.
We are looking for a strong Site Reliability Production Engineer to join our Production Support Engineering team. In this role, you will help enhance service reliability, reduce operational toil, and drive continuous improvement through automation, observability, and emerging AI/LLM-driven capabilities.
The Expertise & Skills You Bring
5 years in SRE, DevOps, or production engineering, supporting distributed systems in fast-paced environments.
Strong scripting experience with PowerShell and Python; practical knowledge of SQL, XML, and data integration.
Hands-on experience with observability tooling (e.g., Dynatrace, Nexthink) and with monitoring, logging, and metrics systems.
Knowledge of ITSM and ITIL frameworks and experience with ServiceNow or similar platforms.
Strong understanding of DevOps/SRE principles, including SLIs, SLOs, error budgets, automation, and resiliency patterns.
Proven experience with CI/CD pipelines, cloud platforms (AWS/Azure), and modern SaaS solutions.
Technical depth in Windows and Linux systems and enterprise end-user computing environments.
Ability to translate analytical insights into actions, automations, and operational improvements.
Familiarity with AI/LLM technologies and how to use them to improve workflows, automation, observability, or troubleshooting (preferred).
Strong problem-solving, communication, and cross-team collaboration skills.
Bachelor's degree or equivalent experience in Computer Science or a related field (preferred).
Proven experience supporting financial systems or working in financial services is a plus.
What You'll Do
Analyze system and application metrics to improve performance, reliability, and fault detection.
Partner closely with engineering teams to design, build, deploy, and support resilient services.
Contribute to system design reviews, platform management, and capacity planning.
Build sustainable automation to reduce manual effort and operational overhead.
Develop and refine SLI/SLO/SLA frameworks to balance speed, reliability, and customer experience.
Improve observability across environments using modern tools and practices.
Identify, prototype, and implement automation using scripting, infrastructure tooling, and AI/LLM-based solutions.
Diagnose and tackle complex issues across distributed systems and end-user computing environments.
Evaluate new technologies, patterns, and tools to drive continuous improvement.
Create and deliver high-quality technical content, remote actions, and workflows to enable self-service and operational efficiency.
What We're Looking For in You
A proactive approach focused on reliability, performance, and continuous improvement.
Curiosity and the ability to quickly learn complex systems and processes.
Passion for automation, reducing toil, and improving operational perfection.
Excitement for working in a fast-paced, collaborative, globally distributed environment.