Job Title: Senior Java Site Reliability Engineer (SRE)
Location: McLean, VA (Hybrid)
Job Summary
STAFFXPERT LLC is seeking a Senior Java Site Reliability Engineer (SRE) on behalf of our client in McLean, VA (Hybrid). This role focuses on supporting and enhancing highly available, mission-critical enterprise platforms within a large-scale financial services environment. The ideal candidate will bring deep expertise in production support, reliability engineering, cloud platforms, automation, observability, and incident management, along with strong experience in enterprise Java-based systems.
Key Responsibilities
Support and maintain highly available production systems across cloud and distributed environments
Lead incident management, problem management, root cause analysis (RCA), and platform stability initiatives
Monitor and ensure the uptime, performance, and reliability of Java applications and microservices
Identify, troubleshoot, and resolve application and system performance bottlenecks
Design and implement resiliency patterns, including circuit breakers, retries, failover, and high-availability architectures
Improve observability through monitoring, logging, alerting, and automation of incident response
Collaborate with development, infrastructure, platform, and cloud engineering teams to enhance deployment reliability
Support cloud transformation, infrastructure modernization, and automation initiatives
Coordinate disaster recovery testing, resiliency validation, capacity planning, and production readiness reviews
Drive operational excellence and continuous service improvement initiatives
Provide technical leadership and mentor distributed engineering teams
Required Qualifications
16-20 years of experience in Site Reliability Engineering, Production Engineering, Platform Engineering, or Application Support roles
Strong experience supporting large-scale enterprise production environments
Proven expertise in incident management, problem management, and operational support
Experience working in Banking, Financial Services, FinTech, or other highly regulated environments
Hands-on experience with mission-critical applications requiring high availability, scalability, and performance
Strong troubleshooting, analytical, and problem-solving skills
Experience with disaster recovery and high-availability architectures
Education
Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field
Preferred Qualifications
Strong cloud-native and microservices architecture experience
Ability to lead critical production incidents and drive long-term reliability improvements
Excellent communication and stakeholder management skills
Experience mentoring and leading global engineering teams