Sr Observability Engineer (SRE)

Employer Active

1 Vacancy

Job Location

Pune - India

Monthly Salary

Not Disclosed

Job Description

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our software development team. This is an embedded role, meaning you will be a full member of the development team, working closely with software engineers, infrastructure platform services, engineering managers, and other stakeholders to ensure the reliability, scalability, and performance of the team's services. You will be responsible for leveraging the infrastructure, tooling, and processes that support our applications in dev and production, as well as participating in on-call rotations. This role offers a unique opportunity to directly influence the design and architecture of our systems from a reliability and performance perspective.

Responsibilities:

Work with developers and service owners at the intersection of development and operations to solve performance issues and ensure system scalability.

  • Reliability Engineering: Design, develop, and implement solutions to improve the reliability, availability, performance, and scalability of our systems. Work with technical leaders and infrastructure platform services to develop alerts and dashboards.
  • Operational Excellence: Own and improve key operational metrics (SLIs, SLOs, error budgets, monitoring, and alerting) for team-related services, and drive continuous improvement through post-incident reviews and blameless postmortems of non-functional issues. Create and maintain dashboards, and conduct ongoing reviews to identify and address gaps. Improve operational processes and team practices, working with technical leaders and the NOC team.
  • Monitoring and Alerting: Develop and maintain comprehensive monitoring and alerting to proactively identify and resolve issues.
  • Capacity Planning: Collaborate with technical leads, DevOps/SRE, and infra teams to forecast capacity needs and ensure sufficient resources are available to support growth.
  • Performance Optimization: Collaborate with performance SMEs to identify and address production performance bottlenecks through profiling, tuning, and optimization of services and infrastructure.
  • Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual intervention.
  • Collaboration: Work closely with Software, Performance, and Test Engineers to influence system design and architecture for operability and reliability.
  • Documentation: Create and maintain clear and concise documentation for systems, processes, runbooks, and procedures.
  • On-Call: Participate in the on-call rotation.
  • Incident Management: Lead incident response efforts, ensuring timely resolution and effective communication. Conduct in-depth incident analysis and help drive completion of post-incident actions.
  • Troubleshooting Skills: Excellent diagnostic and problem-solving skills, with the ability to analyze complex systems and data.

Qualifications:

  • Bachelor's degree in computer science or a related field, or equivalent practical experience.
  • 5 years of proven SRE experience.
  • Strong understanding of SRE principles and practices.
  • Experience with cloud platforms (AWS, GCP, or Azure).
  • Proficiency in at least one scripting language (e.g. Python, Bash, Go).
  • Experience with monitoring and logging tools (e.g. Prometheus, Grafana).
  • Coding experience beyond simple scripts in a programming language such as Go, Java, or Python, to help build out reliability engineering.
  • Experience with containerization and orchestration technologies (e.g. Docker, Kubernetes).
  • Understanding of network protocols and security best practices.
  • Familiarity with DevOps culture and practices, and experience with CI/CD toolchains.
  • Experience with incident response processes and configuration management tools (PagerDuty, Git).
  • Strong problem-solving and troubleshooting skills.
  • Excellent communication and collaboration skills.
  • Ability to work independently and as part of a team to achieve the SRE agenda.

Preferred Qualifications:

  • Technology experience with: Kafka, relational databases, performance tuning (JVM, Go).
  • Experience with the Grafana k6 continuous performance testing tool.
  • Infrastructure as Code (IaC) tools (e.g. Terraform, CloudFormation, Ansible).

What success looks like in the role:

Within the first 30 days you will:

  • Onboard into your new role, get familiar with our product offering and technology, proactively meet peers and stakeholders, and set up your test and development environment.
  • Seek to deeply understand business problems or common engineering challenges, and propose software architecture designs that solve them elegantly by abstracting useful common patterns.

By 90 days:

  • Proactively collaborate on, discuss, debate, and refine ideas, problem statements, and software designs with different (sometimes many) stakeholders, architects, and members of your team.
  • Take a committed approach to prototyping and co-implementing systems alongside less experienced engineers on your team; there's no room for ivory towers here.

By 6 months:

  • Share support of critical team systems by participating in on-call, learning the characteristics of currently running systems, and participating in improvements.
  • Occasionally serve as a debugging and implementation expert during escalations of system issues that less experienced engineers have been unable to solve in a timely manner.
  • Collaborate with Support Management and the Engineering Manager toward quick resolution of escalations.

SailPoint is an equal opportunity employer and we welcome all qualified candidates to apply to join our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other category protected by applicable law.

Alternative methods of applying for employment are available to individuals unable to submit an application through this site because of a disability. Contact or mail to 11120 Four Points Dr, Suite 100, Austin, TX 78726 to discuss reasonable accommodations.


Required Experience:

Senior IC

Employment Type

Full-Time
