cFocus Software seeks a Site Reliability Engineer to join our program supporting the United States Secret Service (USSS). This position is remote and requires the ability to obtain a TS/SCI clearance.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
- Minimum of 2 years of experience in systems engineering, DevOps, or Site Reliability Engineering roles.
- Strong proficiency with Linux/Unix operating systems.
- Experience with scripting and automation using Python, Bash, or similar languages.
- Experience with monitoring and logging tools such as Prometheus, Grafana, the ELK Stack, or equivalent.
- Experience supporting CI/CD tools such as GitLab, Jenkins, or ArgoCD.
- Experience with containerization and orchestration platforms including Docker and Kubernetes.
- Understanding of SRE principles, including SLIs, SLOs, and error budgets.
- Strong troubleshooting, problem-solving, and documentation skills.
Duties:
- Monitor system health, availability, and performance using centralized monitoring and logging tools.
- Respond to, troubleshoot, and resolve incidents in production environments, and provide root cause analysis.
- Conduct after-action reporting and post-incident reviews to improve system resilience.
- Automate repetitive operational tasks, including deployments, monitoring, and incident response.
- Administer user accounts, access controls, and authentication mechanisms.
- Maintain and configure workflow templates, user fields, and application configurations.
- Maintain test environments that mirror production and support pre-deployment testing.
- Design and maintain backup, high availability (HA), and disaster recovery (DR) solutions.
- Develop and maintain incident response and disaster recovery plans for supported applications.
- Configure and support integrations with complementary enterprise systems.
- Architect, build, and maintain on-premises and cloud infrastructure supporting applications.
- Administer production, staging, and development environments.
- Manage system logs and monitor for security and operational events.
- Maintain and improve CI/CD pipelines and DevSecOps processes.
- Apply configuration management disciplines, including patching, hardening, and documentation.
- Create and maintain dashboards, SLIs, SLOs, and service health metrics.
- Support operational readiness boards and weekly service reviews.
- Provide on-call support for outages, upgrades, and emergency maintenance as required.
- Support surge activities, including Presidential Transition-related data analysis, if required.
Required Experience:
Senior IC