DevOps Developer

Prague - Czech Republic

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Location: Praha, Czechia

Thales people architect identity management and data protection solutions at the heart of digital security. Businesses and governments rely on us to bring trust to the billions of digital interactions they have with people. Our technologies and services help banks exchange funds, people cross borders, energy become smarter, and much more. More than 30,000 organizations already rely on us to verify the identities of people and things, grant access to digital services, analyze vast quantities of information, and encrypt data to make the connected world more secure.

Thales in the Czech Republic employs over 400 people from 45 different nationalities. A total of 15 teams work on projects for government agencies, banking, mobile services, and Internet of Things (IoT) technology. At the core of our business is the development of software, which we configure and embed in a multitude of different devices and form factors. These include many kinds of payment cards, SIM cards, travel passes, secure eBanking devices, authentication tokens, machine identification modules (MIM), and secure ID documents, including ePassports, eID and eHealth cards, as well as eDriving licenses. Because of the international environment surrounding us every day, it comes as no surprise that English is our official corporate language.

We are looking for a Site Reliability Engineer to join us at Thales and work with our Payment Solutions. The Site Reliability Engineer empowers product delivery and SRE teams to implement a holistic observability approach across AWS and GCP. We design observability standards, build reusable frameworks, and partner with teams to achieve end-to-end visibility, from [...] and Java services to business outcomes. Our mission: make service performance measurable, detect incidents proactively, and accelerate investigations with trustworthy telemetry.

A day in the life of an SRE:

Build and maintain observability frameworks for AWS/GCP

Create reusable Datadog instrumentation for [...] and Java
Provide auto-instrumentation templates and enforce observability quality standards
Publish Terraform modules for Datadog resources and cloud integrations

Own Datadog dashboards and measurement standards

Define and curate source-of-truth dashboards and KPIs
Establish golden signals and semantic conventions across services
Manage observability-as-code repos in GitLab

Improve monitoring, alerting, and incident readiness

Design precise, low-noise Datadog monitors and routing
Implement synthetics for critical flows and correlate with traces/logs
Partner with SREs on SLOs, error budgets, and incident triggers

Drive continuous learning and adoption

Turn post-incident learnings into improved monitors, dashboards, and CI/CD checks
Deliver training, documentation, and hands-on support for developers and SREs

Consult, enable, and optimize

Coach teams on instrumentation and APM best practices
Strengthen AWS/GCP observability integrations and tagging strategy
Optimize Datadog cost, sampling, retention, and cardinality; rationalize monitors

Typical interactions:

SRE: alert quality, troubleshooting, SLOs, post-incident reviews
Product/Dev: instrumentation, trace propagation, business KPIs
Platform/Infra: cloud integrations, Terraform, RBAC, cost/performance
Security/Compliance: telemetry governance, PII controls, retention policies
Leadership: service health roll-ups, reliability and adoption metrics

Skills & experience:

Strong engineering background in [...] and/or Java (Datadog dd-trace, async context propagation, middleware patterns)
Cloud expertise in AWS: serverless, containers, managed services, and integrating cloud telemetry with Datadog
Automation skills with GitLab CI/CD and Terraform (Datadog resources, modules, workflows)
Datadog proficiency: APM, logs, metrics, synthetics, monitors, SLOs, and observability-as-code practices
Observability mindset: defining SLIs/SLOs, improving alert quality, and supporting the full incident lifecycle
Strong communication skills: clear documentation, training delivery, and confident English communication with distributed teams

At Thales, we provide CAREERS and not only jobs. With Thales employing 80,000 employees in 68 countries, our mobility policy enables thousands of employees each year to develop their careers at home and abroad, in their existing areas of expertise or by branching out into new fields. Together we believe that embracing flexibility is a smarter way of working. Great journeys start here. Apply now!

About Company

Thales

In all critical environments - air, land, sea, space and cyberspace - decision-makers, operators, crews and members of our armed services and security forces are faced with millions of important decisions every day. It is in supporting these people that Thales in the United States ha ...
