Senior Java Site Reliability Engineer (SRE)

Staffxpert LLC


Job Location:

McLean, VA - USA

Monthly Salary: Not Disclosed
Posted on: 3 hours ago
Vacancies: 1 Vacancy

Job Summary

Job Title: Senior Java Site Reliability Engineer (SRE)
Location: McLean, VA (Hybrid)

STAFFXPERT LLC is seeking a Senior Java Site Reliability Engineer (SRE) on behalf of our client in McLean, VA (Hybrid). This role focuses on supporting and enhancing highly available, mission-critical enterprise platforms within a large-scale financial services environment. The ideal candidate will bring deep expertise in production support, reliability engineering, cloud platforms, automation, observability, and incident management, along with strong experience in enterprise Java-based systems.

Key Responsibilities
  • Support and maintain highly available production systems across cloud and distributed environments
  • Lead incident management, problem management, root cause analysis (RCA), and platform stability initiatives
  • Monitor and ensure uptime, performance, and reliability of Java applications and microservices
  • Identify, troubleshoot, and resolve application and system performance bottlenecks
  • Design and implement resiliency patterns, including circuit breakers, retries, failover, and high-availability architectures
  • Improve observability through monitoring, logging, alerting, and automation of incident response
  • Collaborate with development, infrastructure, platform, and cloud engineering teams to enhance deployment reliability
  • Support cloud transformation, infrastructure modernization, and automation initiatives
  • Coordinate disaster recovery testing, resiliency validation, capacity planning, and production readiness reviews
  • Drive operational excellence and continuous service improvement initiatives
  • Provide technical leadership and mentor distributed engineering teams
Required Qualifications
  • 16-20 years of experience in Site Reliability Engineering, Production Engineering, Platform Engineering, or Application Support roles
  • Strong experience supporting large-scale enterprise production environments
  • Proven expertise in incident management, problem management, and operational support
  • Experience working in Banking, Financial Services, FinTech, or other highly regulated environments
  • Hands-on experience with mission-critical applications requiring high availability, scalability, and performance
  • Strong troubleshooting, analytical, and problem-solving skills
Technical Skills
  • Java
  • Linux / Unix Administration
  • Kubernetes, Docker
  • Cloud Platforms: AWS / Azure / GCP
  • CI/CD Tools: Jenkins, GitHub Actions, GitLab CI/CD, ArgoCD
  • Infrastructure as Code: Terraform, Ansible
  • Monitoring & Observability: Splunk, Datadog, Grafana, Prometheus, Moogsoft
  • ITSM Tools: ServiceNow, JIRA, Confluence
  • Scripting: Python, Bash/Shell
  • SQL and database troubleshooting
  • Application Performance Monitoring (APM) tools
  • Production release management
  • Disaster recovery and high availability architectures
Education
  • Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field
Preferred Qualifications
  • Strong cloud-native and microservices architecture experience
  • Ability to lead critical production incidents and drive long-term reliability improvements
  • Excellent communication and stakeholder management skills
  • Experience mentoring and leading global engineering teams