PointClickCare is a leading North American healthcare technology platform enabling meaningful care collaboration and real-time patient insights. For over 20 years, the company has been focused on realizing its vision: to help create a world in which providers and plans can confidently deliver frictionless care. Since its inception, PointClickCare has grown exponentially, with over 2,200 employees working to impact millions across North America. Recognized by Forbes as one of the top 100 private cloud companies and acknowledged by Waterstone Human Capital as one of Canada's Most Admired Corporate Cultures, PointClickCare leads the way in creating cloud-based healthcare software.
At PointClickCare, we offer a wealth of opportunities and a vibrant culture that empowers our employees. Our dynamic environment is the perfect place to advance your career while engaging in meaningful work alongside incredible colleagues. Here, you'll discover a space where your talents can thrive, your career can grow, and your work will have a lasting impact on healthcare across North America. We believe that work becomes profoundly fulfilling when driven by a higher purpose.
Join us and be part of a team that is making a real impact.
Site Reliability Engineer (SRE) Responsibilities:
Lead and implement SRE best practices to foster a strong SRE culture.
Coach and mentor intermediate-level staff to develop their skills and grow into SRE roles.
Lead incident response calls and troubleshoot system- and application-level issues.
Lead root cause analyses (RCAs) to capture lessons learned and implement innovative solutions to prevent future incidents.
Contribute technically to a team focused on applying software engineering practices to operations at scale.
Implement, monitor, and report on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for application services, collaborating with business and product owners to establish key performance indicators (KPIs).
Participate in technical training events, game day scenarios, and engineering spikes.
Provide support for a wide range of applications, with a focus on increasing automation, repeatability, and consistency.
Create and maintain monitoring technologies and processes to enhance visibility into application performance and business metrics, ensuring a manageable operational workload.
Proactively improve application and infrastructure resiliency under various error and performance conditions.
Collaborate with security engineers to develop plans and automation for proactive response to new risks and vulnerabilities.
Work with internal teams to ensure operational development solutions meet business requirements.
Provide architectural guidance to software development teams to enhance resiliency, efficiency, performance, and cost-effectiveness.
Develop, communicate, and monitor standard processes to promote the long-term sustainability and health of operational tasks.
Collaborate with feature/service-oriented development teams.
Implement and improve CI/CD pipelines to facilitate seamless and reliable software releases.
Participate in an on-call rotation to respond to incidents and ensure 24/7 system availability.
Qualifications:
Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, MIS, or a related discipline.
Prior experience as a Site Reliability Engineer (SRE) (minimum 2 years' experience).
Prior relevant software development/architecture/engineering/DevOps experience (minimum 5 years' experience).
Strong experience in building and supporting cloud-based solutions; knowledge of Azure cloud infrastructure and services preferred.
Proficiency in cloud computing concepts.
Experience with virtualization and container solutions such as Docker and Kubernetes.
Familiarity with Databricks, Event Hub, Redis, Azure Service Bus, Azure Functions, and Tomcat.
Understanding of diverse infrastructure platforms and concepts.
Experience with Windows-based systems and Linux administration.
Experience with configuration management and deployment automation tools (e.g., Chef, Terraform, Puppet, Ansible, Jenkins, Spinnaker, ArgoCD, GitHub Actions).
Proficiency in one or more programming languages such as Java, Python, Go, Perl, or Ruby.
Working knowledge of database technologies (e.g., SQL Server, MySQL, PostgreSQL).
Experience with monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack, AppDynamics, Datadog).
Strong debugging and optimization skills with the ability to automate routine tasks.
Systematic problem-solving approach with strong communication skills and a proactive mindset.
Knowledge of the Software Development Life Cycle, with experience working in QA and beta environments.
Understanding of the Agile software development methodology.
Knowledge of standard production practices, including change management and incident management (ITIL).
Nice to Have:
Troubleshooting experience with diverse hosting technologies such as web server platforms, Java application platforms, operating systems, network components, virtualization technologies, and database platforms.
Proficiency in Linux, including experience compiling kernels, tracing syscalls, and understanding TCP.
Knowledge of open-source software and contributions to the open-source community.
Familiarity with Rhapsody and various healthcare messaging standards such as HL7 and FHIR.