NationsBenefits is at the forefront of transforming the insurance industry by developing innovative benefits management solutions. We specialize in modernizing complex back-office systems to build scalable, secure, and high-performing platforms that streamline operations for our clients.
Our strategic focus is on platform modernization: transitioning legacy systems into modern cloud-native architectures to enhance scalability, reliability, and performance in core back-office insurance and fintech functions.
Role Overview
The Site Reliability Engineering (SRE) team is instrumental in maintaining the health, performance, and availability of our platforms. As a Site Reliability Engineer, you will play a crucial role in ensuring system reliability by monitoring metrics, managing incidents, and collaborating with Development, DevSecOps, and Engineering teams.
You will work with monitoring tools like Datadog, troubleshoot incidents in Kubernetes and cloud environments, and contribute to automation initiatives using C#, Java, or scripting languages. Your focus will be on maintaining high availability, ensuring security and compliance in fintech environments, and driving continuous service improvement.
Key Responsibilities
Incident Triage & Resolution
Act as the first line of defense in identifying, triaging, and resolving production incidents.
Respond to and troubleshoot alerts from monitoring tools such as Datadog.
Perform initial root cause analysis and escalate per service level agreements (SLAs).
Collaborate with senior engineers to resolve escalated issues and provide timely communication to stakeholders.
Monitoring & Alerting
Proactively monitor system health, performance metrics, and service uptime using Datadog.
Manage and optimize alerting thresholds to detect anomalies while reducing false positives.
Monitor and troubleshoot workloads in Kubernetes, including pod restarts, log analysis, and deployment rollbacks.
24/7 Support
Participate in a rotational shift schedule (24/7), including weekends and holidays, to ensure continuous production support.
Collaboration & Communication
Work closely with development, operations, and engineering teams to diagnose and resolve issues.
Provide feedback on recurring issues and recommend process or tooling improvements.
Partner with global teams, demonstrating strong cross-cultural collaboration skills.
Automation & Continuous Improvement
Develop and maintain automation scripts or small tools using C#, Java, Python, PowerShell, or Bash.
Contribute to CI/CD pipeline monitoring and reliability.
Assist in building self-healing and automated recovery solutions to minimize manual intervention.
Documentation & Compliance
Maintain comprehensive documentation of incidents, triage steps, and post-mortem analyses.
Ensure all processes adhere to fintech compliance standards such as PCI DSS or ISO 27001.
Required Qualifications
4 years of experience in Site Reliability Engineering, DevOps, or a related role.
Experience with incident triage, resolution, and escalation processes.
Proficiency with Datadog or similar monitoring/observability tools.
Strong scripting or programming skills in C#, Java, Python, PowerShell, or Bash.
Experience with Kubernetes (monitoring, troubleshooting, and scaling workloads) and containerized environments such as Docker.
Familiarity with SQL, MySQL, or NoSQL databases.
Ability to work effectively in high-pressure, high-transaction fintech environments.
Strong written and verbal communication skills for both technical and non-technical audiences.
Ability to work in rotational 24/7 shifts, including weekends and holidays.
Desired Skills
Knowledge of cloud platforms such as Azure, AWS, or GCP.
Familiarity with CI/CD pipelines, Helm charts, and deployment automation.
Awareness of ITIL processes and agile methodologies.
Understanding of regulatory and security compliance requirements in fintech (e.g., PCI DSS, ISO 27001).
Why Join Us
Competitive salary and benefits.
Collaborative, inclusive, and growth-focused work environment.
Opportunities for career advancement and professional development.
Hands-on experience with cutting-edge fintech, cloud, and Kubernetes technologies.
Required Experience:
IC
NationsBenefits is recognized as one of the fastest-growing companies in America and a Healthcare Fintech provider of supplemental benefits, flex cards, and member engagement solutions. We partner with managed care organizations to provide innovative healthcare solutions that drive gr ...