A Senior Site Reliability Engineer (Observability) is needed for a global pioneer in Cloud and Internet Intelligence. They give organizations visibility and insight into a borderless network, arming their clients with a precise understanding of how the network impacts their applications, users, and customers.
This role is a unique opportunity for an experienced SRE to provide the tools, services, and infrastructure to monitor and observe the Platform, leveraging cloud-native tools and enabling developers to instrument, analyse, and monitor the application.
Permanent position, hybrid in London.
Responsibilities
Responsibilities involve designing, deploying, and maintaining cloud-native monitoring services that are both elastic and resilient to failure across AWS. It is also fundamental to establish standards and best practices for the instrumentation of container-based services and cloud-managed services. Maintaining their pipeline is key to ensuring that notifications are well-timed, accurate, and directed to the appropriate channels. Automation is a priority, as it allows the monitoring platforms to scale smoothly and promotes a self-service approach.
Requirements
Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes.
Strong knowledge of modern logging toolsets, including Logstash or Fluentd.
Understanding of Prometheus and its ecosystem, including Alertmanager.
Good knowledge of Application Performance Monitoring tools and crash reporting tools such as Sentry.
Good knowledge of cloud provider managed services and how they can be leveraged in our context.
Ability to write high-quality code in Python, Go, or equivalent languages.
This is an exciting opportunity for a Senior SRE to join an expanding global business. If you are interested, please apply with your CV.
Required Experience:
Senior IC
Full Time