Responsibilities:
- Be part of the on-call rotation responsible for site reliability, uptime, and seamless operations.
- Monitor alerts from PagerDuty and other dashboards, and escalate incidents to the right stakeholders in the Dev, DevOps, DBA, and InfoSec teams.
- Update runbooks, documentation, tickets, etc. promptly with the correct root cause.
- Be the champion of incident and problem management at FreeCharge.
- Set up and maintain infrastructure in AWS to host multiple software services that support product development.
- You will troubleshoot and resolve issues related to networking and DNS using Route 53 and other system tools.
Requirements:
- Experience: 2-5 years.
- Educational Qualification: Bachelor's/Master's degree in CS/ME/IT.
- Ability to correlate multiple alerts to pinpoint the root cause of an incident.
- Knowledge of incident and problem management in a working environment.
- Hands-on experience with Unix/Linux, and comfortable coding or scripting in Python, shell, or Bash for deployment and management.
- Hands-on experience with AWS core services, including VPC, EC2, S3, Route 53, and RDS.
- Experience with automation/configuration management using Puppet, Chef, or Jenkins.
- Experience in Continuous Integration (CI)/Continuous Delivery (CD).
- In-depth knowledge of scalability and reliability engineering.
- Experience managing and configuring monitoring systems such as Nagios, Zabbix, Grafana, and Dynatrace.
- Hands-on experience with Kubernetes.
- Experience with SIEM/ELK and serverless architecture.
- Basic working understanding of traffic request flow (front end and back end).
- Comfort with collaboration, open communication, and reaching across functional borders.
cd,elk,shell scripting,ci,kubernetes,sre,continuous integration,s3,puppet,chef,siem,zabbix,python,nagios,dynatrace,monitoring,aws core services,serverless architecture,bash scripting,aws,unix/linux,jenkins,grafana,ec2,continuous delivery