Required Candidate Location: Hybrid, Wilmington, DE, 3 days a week - NO RELOCATION
Job Description:
As a Senior DevOps Platform Engineer, you will play a critical role in ensuring the reliability, scalability, security, and performance of Berkley's software systems. You will collaborate closely with product, engineering, infrastructure, and architecture teams to build, mature, and operate an enterprise DevOps platform that enables teams to deliver software safely, efficiently, and at scale.
This role blends DevOps, platform engineering, and SRE practices, with a focus on CI/CD, observability, automation, and reliability across both cloud and on-premises environments.
- Maintain a strong understanding of the entire technology stack (networking, storage, OS, virtualization, databases, development frameworks, and applications) to design, observe, troubleshoot, and automate systems across the Berkley environment.
- Design, build, and mature enterprise CI/CD pipelines and shared DevOps platform services, enabling secure, reliable, and scalable software delivery for multiple teams.
- Define, implement, and track reliability and observability OKRs, including SLIs and SLOs, to guide reliability engineering, deployment practices, and operational decision-making.
- Implement and evolve monitoring, alerting, and observability solutions, including AIOps capabilities, to proactively assess system health, detect anomalies, enable self-healing, and support rapid incident response.
- Drive automation initiatives to eliminate operational toil, streamline platform and pipeline workflows, reduce manual intervention, and improve efficiency for product, engineering, and SRE teams.
- Identify and address performance, scalability, and reliability bottlenecks across applications, infrastructure, and delivery pipelines to improve system efficiency and user experience.
- Partner with incident management and operations teams to respond to, resolve, and prevent system outages or degradation, minimizing downtime and customer impact.
- Collaborate actively with development, operations, and platform teams to embed resiliency, observability, security, and reliability requirements into system design, CI/CD pipelines, and runtime environments.
- Lead cross-functional coordination with product, development, infrastructure, and architecture teams to perform capacity planning, anticipate growth, and ensure systems scale reliably with business demand.
- Continuously improve platform resilience by identifying and closing gaps in architecture, tooling, processes, and operational practices.
- Modernize and strengthen disaster recovery capabilities for both on-premises and cloud-based Berkley solutions, ensuring recoverability, resilience, and compliance with enterprise standards.
Qualifications
- 8 years of experience in DevOps and Site Reliability Engineering, with hands-on ownership of infrastructure, CI/CD platforms, and software delivery in enterprise environments.
- Strong software engineering and automation skills, including proficiency in Python, Go, Bash, or JavaScript, and experience building production-grade automation.
- Proven expertise in enterprise CI/CD, GitOps, and containerized platforms, including Kubernetes, Helm, and cloud-native delivery patterns.
- Deep experience with reliability and observability, including monitoring, alerting, logging, and tracing platforms (e.g., Dynatrace, Datadog, ELK), and defining SLIs, SLOs, and reliability metrics.
- Strong understanding of cloud, on-prem, and hybrid architectures, including high availability, disaster recovery, capacity planning, and scalability.
- Hands-on experience with infrastructure as code and configuration management (e.g., Terraform, Ansible, GitHub Actions) to reduce operational toil and enable self-service.
- Solid knowledge of security and networking fundamentals, including applying industry-standard security frameworks in enterprise environments.
- Demonstrated ability to lead technical initiatives, influence system design decisions, mentor engineers, and collaborate effectively across product, engineering, infrastructure, and security teams.
- Bachelor's degree with emphasis in a related field, or equivalent experience.