Site Reliability Engineer

Employer Active

1 Vacancy

Job Location

Wilmington - USA

Monthly Salary

Not Disclosed

Job Description

Company Details

Company URL:

Berkley Technology Services (BTS) is the dynamic technology solution for W. R. Berkley Corporation, a Fortune 500 commercial lines insurance company. With key locations in Urbandale, IA and Wilmington, DE, BTS provides innovative and customer-focused IT solutions to the majority of WRBC's 60 operating units across the globe. BTS's wide reach ensures that ideas and opinions are considered at every level of the organization to guarantee we find the best solutions possible.

Driven by a commitment to collaboration, BTS acts as a consultant to our customers and Operating Units by providing comprehensive solutions that not only address the challenge at hand but also proactively plan for the "What's Next" in our industry and beyond.

With a culture centered on innovation and entrepreneurial spirit, BTS stands as a community of technology leaders with eyes toward the future: leaders who genuinely care about growing not only their team members but themselves, and who take pride in their employees who shine. BTS offers endless ways to get involved and the chance to grow your career into a wide range of roles you never knew existed. Come join us as we push forward into the future of the industry's leading technological solutions.

Berkley Technology Services: Right Team, Right Technology, Simple and Secure.

Responsibilities

As a Site Reliability Engineer (SRE), you will play a crucial role in ensuring the reliability, scalability, and performance of our software systems. Collaborating closely with cross-functional teams, you will set and enforce SRE best practices, ensuring the scalability, reliability, and security of our cloud and on-premises environments. This technically broad role requires a strong understanding of the entire technology stack (network, storage, OS, virtualization, database, development, applications) to observe, monitor, troubleshoot, and automate activities within the Berkley environment.

  • Define and Track OKRs: Establish and monitor reliability and observability OKRs, including Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Monitoring and Alerting: Implement robust monitoring and alerting systems to proactively monitor health, identify potential issues, analyze system performance, and facilitate quick incident response.
  • AIOps Implementation: Enable auto-response, self-healing, and anomaly trend analysis through AIOps functionality.
  • Automation Solutions: Develop and implement automation solutions to eliminate toil, streamline processes, reduce manual interventions, and enhance overall efficiency.
  • Performance Optimization: Identify and address performance bottlenecks in applications and infrastructure to improve efficiency and user experience.
  • Incident Management: Work closely with incident management to quickly resolve system outages or performance issues, minimizing downtime and user impact.
  • Collaboration: Collaborate actively with development and operations teams to implement observability and resiliency requirements for smooth software deployment and operation.
  • Reliability Improvement: Enhance reliability by identifying and addressing gaps in our architecture, services, and tooling.
  • Disaster Recovery: Modernize disaster recovery programs for both on-premises and cloud-based Berkley solutions.

Qualifications

  • Experience: 5 years of IT experience in Development, Operations, and Infrastructure support; 3 years in Site Reliability Engineering and DevOps.
  • Scripting Languages: Proficiency in Python, Go, Bash, JavaScript, and shell scripting.
  • Observability Tools: Strong expertise in Dynatrace, Datadog, and the ELK Stack.
  • Logging and Monitoring: Practical expertise in creating and implementing logging and monitoring architectures.
  • Resiliency Solutions: Expertise in designing and implementing on-premises, cloud, and hybrid resiliency solutions (HA, AA, AP), disaster recovery, and business continuity planning.
  • Cloud Computing: Deep understanding of cloud computing principles (IaaS, PaaS, SaaS).
  • Kubernetes: Experience with Kubernetes and autoscaling tools, including Helm and Prometheus.
  • GitOps and CI/CD: Proficient in leveraging GitOps with containerization technologies and CI/CD pipelines.
  • Automation Tools: Experience with infrastructure automation and configuration management tools (GitHub Actions, Terraform, Ansible, Chef, Puppet).
  • Security Best Practices: Solid understanding of security best practices in on-premises, cloud, and hybrid environments.
  • Industry Standards: Understanding of industry-standard security frameworks and ability to interpret them for Berkley environments.
  • Problem-Solving: Excellent problem-solving skills and ability to troubleshoot complex issues in a distributed hybrid environment.
  • Communication: Strong communication skills to collaborate effectively with cross-functional teams and convey technical concepts to non-technical stakeholders.
  • Education: Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).

The Company is an equal employment opportunity employer.

Employment Type

Unclear
