DevOps Production Support Engineer
Job Summary
For this project, you will be responsible for ensuring the stability, availability and performance of applications within the team's scope. You will play a key role in incident management, ensuring timely resolution by collaborating with internal teams (Development and Infrastructure) and external stakeholders (Service Providers) while driving sustainable long-term solutions.
Key Responsibilities
Application Stability & Availability
- Monitor, maintain and support in-scope applications to ensure high availability, reliability and performance.
- Actively participate in incident management activities, including:
  - Situation Rooms for P1/P2 incidents
  - Root Cause Analysis (RCA)
  - Identification of incident trends and contribution to permanent solutions
- Ensure compliance with ITIL governance within IT Production, including SLA management.
- Execute change requests and deployments in accordance with ITIL and DevOps processes and tools.
- Proactively identify and resolve technical issues to ensure smooth business operations.
- Participate in on-call rotations and provide 24/7 support for critical applications when required.
Technical Support & Cross-Team Collaboration
- Serve as a primary point of contact for Development teams, supporting troubleshooting activities and coordinating fixes.
- Work closely with Scrum and Agile teams to design, deploy and continuously improve systems.
- Implement upgrades, patches and new functionality while ensuring minimal impact on end users.
Platform Monitoring & Observability
- Implement, configure and optimize monitoring solutions within the production environment (e.g. Dynatrace).
- Collaborate with Development teams and Centers of Expertise to define effective monitoring and observability practices.
- Promote observability awareness to enable early detection and proactive resolution of potential issues.
- Use distributed tracing, logging and metrics tools (e.g. Jaeger, Grafana, Prometheus, ELK).
Documentation & Knowledge Sharing
- Create, maintain and update technical documentation, including processes, configurations and troubleshooting guides.
- Share best practices and technical knowledge with global support teams to improve service quality and operational efficiency.
Qualifications
API, Application Servers & Kubernetes
- Strong experience with Java application servers, particularly Red Hat JBoss EAP.
- Solid Java knowledge, including:
  - Heap and thread dump analysis
  - Performance tuning and optimization
- Experience with OpenShift and Kubernetes-based platforms, including cloud-native environments.
- API Gateway integration and support.
- Strong knowledge of RHEL Linux operating systems.
Monitoring, Automation & DevOps
- Hands-on experience with observability and monitoring tools such as:
  - Dynatrace
  - Jaeger
  - Grafana
  - Prometheus
  - ELK Stack
- Experience setting up and optimizing CI/CD pipelines using tools such as:
  - GitLab
  - ArgoCD
  - Jenkins
  - Sonatype Nexus
- Experience with infrastructure and automation tools such as Ansible and/or Terraform.
Soft Skills
- Strong problem-solving and critical-thinking abilities.
- Excellent collaboration and teamwork skills.
- Clear and effective communication skills.
- Resilience and adaptability in fast-paced environments.
- Ability to manage stress and perform effectively during critical incidents.
- Strong sense of accountability, ownership and autonomy.
- Effective time management and prioritization skills.
- Goal-oriented mindset with strong attention to detail.
Language Skills
- Fluent in Portuguese and English.
Remote Work: No
Employment Type: Full-time
About Company
Inetum is a European leader in digital services. Inetum's team of 28,000 consultants and specialists strives every day to make a digital impact for businesses, public sector entities and society. Inetum's solutions aim at contributing to its clients' performance and innovation as well ...