| Detailed JD (Roles and Responsibilities) | The Site Reliability Engineer II is responsible for providing continuous feedback on site health, reliability, availability, and user experience to both engineering and product owners. Real-time measurements for production environments will be collected, aggregated, and analyzed using both infrastructure and APM tools, including but not limited to SolarWinds, Dynatrace, and log analytics. In addition to monitoring and insight, a heavy focus will be placed on automation opportunities and on automating operational processes to maintain 99.9% availability of AvidXchange core products.
Performs production SaaS operational and administration duties to maintain the health and reliability of SaaS production systems.
Performs production SaaS support, incident management, problem management, and service restoration as needed to quickly respond to and resolve production issues.
Implements, and trains team members on, tools for measuring core product health in production (with opportunities to extend those capabilities all the way back through the entire DevOps pipeline).
Implements, and trains team members on, methods for calculating system availability SLAs across AvidXchange products.
Implements and executes the tool consolidation strategy to optimize spend versus value for the end-to-end monitoring platform.
Implements rapid and continuous development and application of automated solutions to address reliability issues and automate manual tasks.
Works with the Software DevOps team to implement the DevOps CI/CD, continuous performance testing, monitoring, and reliability strategy using Visual Studio Team Services and other cloud-based tools.
Implements the measurement capability for core product availability across Azure and AvidXchange Cloud using HTTP endpoint testing and synthetic user testing.
Maintains the automated site availability reporting and data platform.
Gathers usability, reliability, incident, and user experience data on AvidXchange products for consumption by executive leadership on a weekly basis.
Influences product delivery teams to implement usability and reliability enhancements, leading to improved user experience index scores and improved availability.
Provides detailed analysis and troubleshooting for system outages, providing feedback to product and software engineering.
| ||
| Total Experience | 5 years total experience | ||
| Relevant Experience | 3 years relevant experience | ||
| Mandatory skills | Site Reliability Engineer
APM tools, including but not limited to SolarWinds, Dynatrace, and log analytics. |