- Site Reliability Engineer
- Vienna, VA (hybrid: onsite 4x/week, Fridays are remote)
- Must obtain a Public Trust clearance
Top Skills: Splunk
Description:
Seeking a Site Reliability Engineer (SRE) to join our SRE Support team. As an SRE at Centennial, you will play a vital role in ensuring the 24x7 monitoring and production support of critical systems. Our team is responsible for meeting service level agreements (SLAs) and following SRE best practices to keep manual remediation (toil) below 50% of your workload. Your primary focus will be on building automated remediation capabilities to enhance system reliability. You will collaborate with the customer's Cloud Architects and DevOps Engineers to maintain and improve reliability.
Key Responsibilities:
- Provide 24x7 monitoring and production support to ensure system availability.
- Meet defined SLAs and service levels in alignment with SRE best practices.
- Minimize manual remediation (toil) by developing and implementing automated remediation solutions.
- Collaborate with the appropriate teams, including application and cloud automation teams, in the event of system overload.
- Administer/Configure Splunk.
- Perform application monitoring, gradual change implementation, and automation for reliability improvement.
- Contribute to Business Continuity and Disaster Recovery (DR) efforts, particularly in cloud-based business continuity.
- Assist in designing Reliability, Maintainability, and Availability (RAM/ARM) for systems through fault tolerance, redundancy, distributed/parallel processing, and five-nines (i.e., 99.999%) availability.
- Perform Business Continuity, Continuity of Operations (COOP), DR, and readiness planning exercises and testing.
- Perform switchover/failover with cold, warm, or hot start.
- Monitor and remediate system data synchronization processes.
- Administer Splunk: platform performance and stability, resource usage, and infrastructure monitoring.
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field, and 6 years of SRE experience.
- Proven experience as a Site Reliability Engineer (SRE).
- Strong knowledge of cloud-based Business Continuity, COOP, DR, and readiness planning exercises and testing.
- Proficiency in Splunk administration and configuration.
- Ability to work collaboratively and efficiently in a team.
- Exceptional problem-solving and troubleshooting skills.
- Excellent communication and documentation skills.