Site Reliability Engineer (FLEEM)
Posted on:
10 days ago
Vacancies:
1 Vacancy
Job Summary
Summary:
- Implements and operates monitoring, logging, alerting, and dashboards.
- Works closely with developers on build & run topics.
- Monitors and coordinates security-related topics (vulnerabilities, findings) and stability-related topics (query patterns, indexing).
- First responder for production incidents.
Mandatory Skills (in order of importance):
- Monitoring & alerting (CloudWatch logs, metrics, dashboards, alarms)
- Distributed tracing (AWS X-Ray, Lambda Insights)
- Incident management (root-cause analysis, runbook authoring)
- Database performance analysis (MongoDB, PostgreSQL)
- Security operations
- AWS services (Lambda, S3, SQS, IAM, VPC)
- Bash / shell scripting & automation
Advantageous Skills:
- GitHub Actions, AWS CodePipeline (CI/CD understanding)
- Docker, ECS, Fargate (container health troubleshooting)
- AWS Cost Explorer, resource tagging
- Release coordination (hotfix processes, rollback procedures)
- BMW (Integrate) platform operational knowledge
- TypeScript (reading application code for
Required Experience:
Manager