We are looking for a Senior Site Reliability Engineer to join our Bluebeam team.
YOUR DAILY CHALLENGES
- Design and evolve reliable, secure, and cost-efficient AWS architectures, including EKS, Lambda, EC2, RDS/Aurora, DynamoDB, S3, MSK/Kinesis, and OpenSearch;
- Build and mature observability practices: define SLIs/SLOs, standardise metrics/traces/logs, create dashboards, and implement proactive alerting strategies;
- Develop reliability tooling and automation using Go, contribute performance improvements, and create operational runbooks with development teams;
- Implement and maintain Terraform for multi-account AWS environments with reusable modules, policy-as-code, and infrastructure drift detection;
- Improve CI/CD pipelines in GitHub/GitLab, enabling blue/green and canary deployments with automated quality gates and safe rollback mechanisms;
- Lead incident response, including triage, root cause analysis, blameless postmortems, and implementing preventive fixes to reduce MTTR;
- Coach and mentor engineers on observability, alerting, incident response, and reliability automation to raise team operational maturity.
OUR EXPECTATIONS
- 7 years of professional experience in SRE, Platform Engineering, Production Engineering, or DevOps roles;
- Hands-on experience with Go or any other programming language for building platform automation services and reliability tooling;
- Strong experience with Kubernetes (EKS), serverless (Lambda), container orchestration, and progressive delivery strategies;
- Strong experience with Terraform in production, including multi-account patterns and infrastructure-as-code best practices;
- Experience designing and maintaining CI/CD pipelines (GitHub/GitLab) with secure release practices and progressive deployment;
- Proven experience monitoring and improving API reliability at scale, including tracking SLIs/SLOs, error rates, latency, and throughput;
- Strong incident management skills focused on MTTR reduction and preventive engineering;
- Excellent communication, collaboration, and mentoring abilities;
- Excellent English proficiency (written and verbal).
CONSIDERED A PLUS
- Experience with observability platforms (Grafana, Prometheus, Datadog, New Relic, CloudWatch) for metrics, logs, and distributed tracing;
- Experience with API management platforms (MuleSoft, Apigee, Kong, AWS API Gateway), including API versioning, rate limiting, and SLA enforcement;
- Knowledge of the Kubernetes ecosystem (operators, admission controllers, service mesh) and container security;
- Understanding of security operations (IAM least privilege, WAF, Shield) and compliance frameworks (SOC 2, FedRAMP).
WHAT YOU WILL GET
- Drive reliability and performance for systems that enable millions of construction professionals to advance the way the world is built;
- Hybrid work model;
- Access to conferences, training programs, and self-learning platforms;
- Supportive environment where your ideas matter and technical excellence is valued;
- Shape your role and grow within a 450-person organisation with 25 years of software excellence;
- Diverse internal events and activities to build relationships across teams;
- Comprehensive benefits and financial compensation.
We're looking for people with creative minds and enthusiasm to join us in developing what's new, what's next, and what best serves our customers' needs.
Ready to make an impact in software development? We'd be happy to welcome you to our team.
Required Experience:
Senior IC