Associate Site Reliability Engineer

1 Vacancy
Job Location

Hyderabad - India

Monthly Salary

Not Disclosed

Job Description

About the Role:

We are seeking a highly skilled Associate Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a pivotal role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. You will collaborate closely with development, operations, and other teams to implement and maintain efficient and resilient systems.

We are the SRE Frontline Team of CyberArk. Our group uses monitoring tools and dashboards to keep the health and performance of our systems and services optimal. Our goal is to maintain a scalable, fault-tolerant, high-load distributed system. We are searching for an outstanding SRE expert to drive and improve the Incident Management processes and goals for the Site Reliability teams, with a focus on triaging and ensuring the reliability, performance, and scalability of CyberArk's SaaS services and underlying AWS infrastructure. This role involves a combination of technical expertise, documentation, and collaboration to meet the organization's reliability and availability goals.

Responsibilities:

  • Incident Management, Monitoring, and Alerting: Drive incident response processes and troubleshoot complex issues, ensuring timely resolution of outages. Establish monitoring, logging, and alerting best practices using tools such as Datadog and Site24x7.
  • Tooling and Automation: Build essential tooling to improve system reliability and automate the remediation of issues.
  • On-Call: Be part of the on-call rotation, which runs 24x7, 365 days a year.
  • SOP Documentation: Create and maintain documentation for infrastructure processes and incident management protocols.
  • Infrastructure as Code: Understanding of Infrastructure as Code (IaC) tools such as Terraform and Ansible to automate provisioning, configuration, and deployment processes.
  • Attend all training programs and complete all tasks set by the supervisor and assist other trainees wherever possible. 
  • Cloud Platform Expertise: Hands-on experience with AWS cloud services, including EC2, S3, VPC, RDS, EKS, ECS, CloudFormation, and more.
  • CI/CD Pipelines: Fair understanding of CI/CD pipelines using tools like Jenkins.
  • Monitoring and Alerting: Hands-on experience with monitoring and alerting tools such as ELK, Datadog, CloudWatch, and Grafana to proactively identify and resolve issues.
  • Performance Tuning: Continuously optimize system performance, identify bottlenecks, and implement strategies to improve scalability and efficiency.
  • Cost Optimization: Identify and implement strategies to reduce cloud costs while maintaining performance and reliability.
  • Security Best Practices: Adhere to security best practices and implement measures to protect infrastructure and data from vulnerabilities and threats.
  • Collaboration and Communication: Work effectively with cross-functional teams to understand business requirements and provide technical guidance.

#IL-MP01


Qualifications:

Required Skills and Experience:

 

  • 2-3 years of experience as a Site Reliability Engineer.
  • Strong proficiency in AWS cloud services such as EC2, S3, VPC, RDS, EKS, ECS, CloudFormation, and more. AWS certification is a plus.
  • Good logical, analytical, and problem-solving skills.
  • Strong communication skills and the ability to work in shifts (24x7).
  • Strong scripting skills (Python, PowerShell, CDK, shell scripting).
  • Understanding of Infrastructure as Code tools (Terraform, Ansible) and AWX Tower for Ansible automation.
  • Knowledge of containerization (Docker) and orchestration platforms (Kubernetes).
  • Expertise in CI/CD pipelines and automation tools (Jenkins, GitHub).
  • Exposure to monitoring and alerting tools (CloudWatch, Datadog, ELK, Grafana, Site24x7).
  • Experience documenting SOPs and RCAs.
  • Understanding of security best practices and compliance standards. Security certification is a plus.


Remote Work:

No


Employment Type:

Full-time
