Senior Site Reliability Engineer – AWS & Kubernetes

Job Location

Bengaluru - India

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

athenahealth is a progressive, innovation-driven software product company. We partner with healthcare organizations across the care continuum to drive clinical and financial results. Our expert teams build modern technology on an open, connected ecosystem, yielding insights that make a difference for our customers and their patients. We maintain a unique, values-driven employee culture and offer a flexible work-life balance. As evidence of our rapid growth and industry leadership, we were acquired by Bain Capital, the world's leading private equity firm, in 2021 for $17bn, and we have many new strategic product initiatives.

We are headquartered in Boston, and our other offices are located in Atlanta, Austin, Belfast, and Burlington. In India, we have offices in Bangalore, Pune, and Chennai.

Position Summary: We are looking for a Senior Site Reliability Engineer (SMTS) to join our Cloud Infrastructure Engineering division in Bangalore. Cloud Infrastructure Engineering ensures the continuous availability of the technologies and systems that are the foundation of athenahealth's services. We are directly responsible for thousands of servers and petabytes of storage, and we handle thousands of web requests per second, all while sustaining growth at a meteoric rate. We enable an operating system for the medical office that abstracts away administrative complexity, leaving doctors free to practice medicine.

The Team: We are a bunch of Site Reliability Engineers who are passionate about reliability, automation, and scalability. We use an agile-based framework to execute our work, ensuring we are always focused on the most important and impactful needs of the business. We support systems in both private and public clouds and make data-driven decisions about which one best suits the needs of the business. We are relentless in automating away manual, repetitive work so we can focus on projects that help move the business forward.

Job Responsibilities:

Reliability and Availability:

  • Define, measure, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for cloud services and infrastructure components.
  • Lead efforts to continuously improve system availability, fault tolerance, and disaster recovery capabilities.
  • Ensure proactive incident detection, efficient root cause analysis, and timely resolution of production incidents.
  • Participate in a 24x7 on-call rotation.

Automation and Infrastructure as Code (IaC):

  • Drive automation efforts to reduce manual intervention and streamline cloud infrastructure management.
  • Implement Infrastructure as Code (IaC) using tools like Terraform, AWS CloudFormation, and Ansible to provision, manage, and scale cloud resources.
  • Automate deployment, scaling, and monitoring processes to improve efficiency and reduce operational complexity.

Monitoring, Observability, and Performance Tuning:

  • Design and implement monitoring, logging, and alerting solutions to track cloud infrastructure health, performance, and security.
  • Use observability tools (e.g., Prometheus, Grafana, CloudWatch) to ensure continuous visibility into cloud infrastructure performance and capacity.
  • Identify bottlenecks and performance issues, and propose and implement improvements to ensure optimal resource usage.

Security and Compliance:

  • Ensure that cloud infrastructure is built with security best practices in mind and meets all relevant compliance and regulatory requirements.
  • Collaborate with security teams to implement security controls and risk mitigation strategies across cloud environments.
  • Regularly audit and review cloud infrastructure for security vulnerabilities and compliance gaps.

Collaboration and Cross-Functional Leadership:

  • Work closely with development, DevOps, and operations teams to ensure cloud infrastructure aligns with application and business requirements.
  • Lead and mentor a team of Site Reliability Engineers promoting best practices and fostering a culture of operational excellence.
  • Act as a key technical point of contact for cloud-related infrastructure and operations issues.

Incident Management and Post-Mortem:

  • Lead incident response for cloud infrastructure-related issues, ensuring that all incidents are managed effectively.
  • Conduct post-incident reviews (PIRs) to identify root causes and implement preventive measures.
  • Continuously refine incident management processes to reduce downtime and shorten recovery times.

Qualifications

  • 5-9 years of hands-on experience with cloud automation and configuration management tools (e.g., Terraform, AWS CloudFormation, Ansible) in a hybrid cloud setup.
  • 5 years of experience in a Site Reliability Engineering (SRE), Infrastructure Engineering, or DevOps role, with at least 3 years in a technical leadership capacity.
  • Deep knowledge of cloud services and technologies (e.g., EC2, S3, Lambda, Kubernetes).
  • Proficiency in scripting or programming languages (e.g., Python, Go, Bash).
  • Experience with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
  • Familiarity with Continuous Integration/Continuous Deployment (CI/CD) pipelines and cloud-native development practices.
  • Strong expertise in managing cloud infrastructure (AWS, Google Cloud, Azure) in production environments.
  • Experience with cloud-native architectures, microservices, and containerized environments (Kubernetes, Docker).
  • Proven experience in building and managing highly available, scalable, and fault-tolerant systems in the cloud.
  • Strong understanding of cloud and on-premises networking, storage, and compute services, as well as security best practices.

About athenahealth

Our vision: In an industry that becomes more complex by the day, we stand for simplicity. We offer IT solutions and expert services that eliminate the daily hurdles preventing healthcare providers from focusing entirely on their patients, powered by our vision to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

Our company culture: Our talented employees, or athenistas as we call ourselves, spark the innovation and passion needed to accomplish our vision. We are a diverse group of dreamers and doers with unique knowledge, expertise, backgrounds, and perspectives. We unite as mission-driven problem-solvers with a deep desire to achieve our vision and make our time here count. Our award-winning culture is built around shared values of inclusiveness, accountability, and support.

Our DEI commitment: Our vision of accessible, high-quality, and sustainable healthcare for all requires addressing the inequities that stand in the way. That's one reason we prioritize diversity, equity, and inclusion in every aspect of our business, from attracting and sustaining a diverse workforce to maintaining an inclusive environment for athenistas, our partners, our customers, and the communities where we work and serve.

What we can do for you:

Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces; some offices even welcome dogs.

We also encourage a better work-life balance for athenistas with our flexibility. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done within an office environment full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation.

In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. We provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued.

Learn more about our culture and benefits here:

Experience:

Senior IC

Employment Type

Full-Time