Senior Java Site Reliability Engineer (SRE)

Staffxpert LLC

Job Location:

McLean, VA - USA

Monthly Salary: Not Disclosed

Posted on: 3 hours ago

Vacancies: 1 Vacancy

Job Summary

Job Title: Senior Java Site Reliability Engineer (SRE)
Location: McLean, VA (Hybrid)

STAFFXPERT LLC is seeking a Senior Java Site Reliability Engineer (SRE) on behalf of our client in McLean, VA (Hybrid). This role focuses on supporting and enhancing highly available, mission-critical enterprise platforms within a large-scale financial services environment. The ideal candidate will bring deep expertise in production support, reliability engineering, cloud platforms, automation, observability, and incident management, along with strong experience in enterprise Java-based systems.

Key Responsibilities

Support and maintain highly available production systems across cloud and distributed environments
Lead incident management, problem management, root cause analysis (RCA), and platform stability initiatives
Monitor and ensure the uptime, performance, and reliability of Java applications and microservices
Identify, troubleshoot, and resolve application and system performance bottlenecks
Design and implement resiliency patterns, including circuit breakers, retries, failover, and high-availability architectures
Improve observability through monitoring, logging, alerting, and automation of incident response
Collaborate with development, infrastructure, platform, and cloud engineering teams to enhance deployment reliability
Support cloud transformation, infrastructure modernization, and automation initiatives
Coordinate disaster recovery testing, resiliency validation, capacity planning, and production readiness reviews
Drive operational excellence and continuous service improvement initiatives
Provide technical leadership and mentor distributed engineering teams

Required Qualifications

16-20 years of experience in Site Reliability Engineering, Production Engineering, Platform Engineering, or Application Support roles
Strong experience supporting large-scale enterprise production environments
Proven expertise in incident management, problem management, and operational support
Experience working in Banking, Financial Services, FinTech, or other highly regulated environments
Hands-on experience with mission-critical applications requiring high availability, scalability, and performance
Strong troubleshooting, analytical, and problem-solving skills

Technical Skills

Java
Linux / Unix Administration
Kubernetes, Docker
Cloud Platforms: AWS / Azure / GCP
CI/CD Tools: Jenkins, GitHub Actions, GitLab CI/CD, ArgoCD
Infrastructure as Code: Terraform, Ansible
Monitoring & Observability: Splunk, Datadog, Grafana, Prometheus, Moogsoft
ITSM Tools: ServiceNow, JIRA, Confluence
Scripting: Python, Bash/Shell
SQL and database troubleshooting
Application Performance Monitoring (APM) tools
Production release management
Disaster recovery and high availability architectures

Education

Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field

Preferred Qualifications

Strong cloud-native and microservices architecture experience
Ability to lead critical production incidents and drive long-term reliability improvements
Excellent communication and stakeholder management skills
Experience mentoring and leading global engineering teams

Job Title: Senior Java Site Reliability Engineer (SRE) Location: McLean VA (Hybrid) Job Summary STAFFXPERT LLC is seeking a Senior Java Site Reliability Engineer (SRE) on behalf of our client in McLean VA (Hybrid). This role is focused on supporting and enhancing highly available mission-critical en...