Site Reliability Engineer

N-iX

Job Location:

Bucharest - Romania

Monthly Salary: Not Disclosed
Posted on: 12 hours ago
Vacancies: 1 Vacancy

Job Summary

We are seeking experienced Site Reliability Engineers (SREs) to help monitor, maintain, and scale software production environments, with a primary focus on onboarding new microservices.

You will work closely with development and platform teams to automate and program-manage the onboarding lifecycle, from initial requirements and environment setup through deployment, testing, documentation, and handover, ensuring reliability, scalability, performance, and compliance at every step.

Key Responsibilities

1. Service Onboarding & Automation

  • Lead and support the end-to-end onboarding process for new microservices into production environments.
  • Identify gaps in the current onboarding workflow (deployment, configuration, monitoring, scaling, etc.) and close them through automation.
  • Provide program management for onboarding activities, including timelines, dependencies, and stakeholder communication.
  • Collaborate with development and operations/platform teams to ensure smooth and consistent rollout of new services.

2. Monitoring, Logging & Observability

  • Design and implement monitoring, logging, and alerting for all onboarded services.
  • Ensure comprehensive metrics collection (e.g., availability, latency, error rates, throughput) to support SLOs/SLIs.
  • Tune alerts to minimize noise while ensuring rapid detection and response to production issues.

3. Scalability, Load & Performance

  • Perform load and stress testing to validate that services can scale to meet current and projected demand.
  • Implement and refine autoscaling mechanisms and capacity planning practices.
  • Conduct ongoing performance tuning and optimization to achieve minimal latency and high throughput.

4. Reliability, Resilience & Uptime

  • Drive high service reliability and uptime for all onboarded microservices.
  • Help teams design and implement fault-tolerant architectures, including failover and redundancy mechanisms.
  • Work with teams to adopt SRE best practices (e.g., error budgets, post-incident reviews, runbooks).

5. Security & Compliance

  • Ensure all onboarded services meet security and compliance requirements.
  • Integrate security best practices into deployment, monitoring, and operational processes.
  • Maintain audit trails and documentation for onboarding activities to support regulatory and internal compliance.

6. Documentation, Training & Knowledge Transfer

  • Create detailed documentation for the service onboarding process, including standards, patterns, and templates.
  • Develop and maintain runbooks, playbooks, and SOPs for ongoing operations.
  • Conduct training sessions and workshops for internal teams to enable self-service onboarding and long-term maintainability.

7. Planning, Testing & Post-Onboarding Support

  • Participate in requirements analysis for new services; define onboarding success criteria and KPIs.
  • Develop onboarding plans outlining steps, timelines, responsibilities, and acceptance criteria; present plans to stakeholders for review and approval.
  • Prepare and validate environments, ensuring appropriate access permissions and tooling are in place.
  • Conduct comprehensive functional, performance, reliability, and security testing prior to go-live.
  • Provide post-onboarding support, monitoring services to ensure continued reliability and quickly addressing any issues that arise.

Required Qualifications

  • Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role in microservices-based environments.
  • Strong understanding of microservices architecture, distributed systems, and cloud-native concepts.
  • Hands-on experience with:
    • Production monitoring, logging, and alerting (e.g., metrics, tracing, log aggregation tools).
    • Automation of deployment and operational workflows (e.g., scripts, pipelines, IaC, or similar).
    • Load/performance testing and capacity planning.
  • Demonstrated ability to improve service reliability, scalability, and performance in production.
  • Familiarity with security best practices related to service deployment, monitoring, and operations.
  • Experience working across cross-functional teams (development, operations, security, compliance) to deliver complex initiatives.
  • Excellent documentation, communication, and stakeholder management skills.

Preferred Qualifications

  • Experience defining and tracking SRE KPIs/SLOs/SLIs for onboarding and production services.
  • Background in program or project management of technical initiatives (especially service onboarding or platform rollouts).
  • Prior experience in high-availability, regulated, or large-scale SaaS environments.

We offer*:

  • Flexible working format - remote, office-based, or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

*not applicable for freelancers


Required Experience:

IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

N-iX is a global software development company that helps the world’s leading organizations achieve lasting business value using advanced technology.
