Job Summary
As a Cloud Infrastructure / Site Reliability Engineer, you will operate at the intersection of development and operations. You will engage in and enhance all aspects of the cloud services lifecycle, from design through deployment, operation, and refinement. You will be responsible for maintaining these services by measuring and monitoring their availability, latency, and overall system health, and by building automation for efficient cloud operations management.
You will play a crucial role in sustainably scaling systems through automation and in driving changes that improve reliability and velocity. As part of your responsibilities, you will administer cloud-based environments that support our SaaS/IaaS offerings, implemented on a microservices, container-based architecture (Kubernetes). In addition, you will oversee a portfolio of customer-centric cloud services (SaaS/IaaS), ensuring their overall availability, performance, and security. You will work closely with NetApp and cloud service provider teams (including Azure) from NetApp sites in Research Triangle Park (RTP), NC; Vienna, VA; Waltham, MA; or Pittsburgh, PA.
Due to the critical nature of the services we support, this position involves participation in a rotation-based on-call schedule as part of our global team. This role offers the opportunity to work in a dynamic global environment, ensuring the smooth operation of vital cloud services. To be successful in this role, you should be a motivated self-starter and self-learner, possess strong problem-solving skills, and embrace challenges.
Key Responsibilities
- Automation and Efficiency: Identify tasks and areas where automation can be applied to save time and reduce risk. Develop software for deployment automation, packaging, and monitoring visibility.
- Team Collaboration and Influence: Work in tandem with other Cloud Infrastructure Engineers and developers to ensure maximum performance, reliability, and automation of our deployments and infrastructure. Consult with and influence developers on new feature development and software architecture to ensure scalability.
- Debugging, Troubleshooting, and Advanced Support: Debug and troubleshoot service bottlenecks throughout the entire software stack. Additionally, provide advanced tier 2 and tier 3 support for NetApp's Cloud Data Service solutions.
- Analysis and Infrastructure Maintenance: Continuously monitor, analyze, and measure system health, availability, and latency using tools such as Prometheus, Stackdriver, Elasticsearch, Grafana, and SolarWinds. Develop strategies to enhance system and application performance and availability. In addition, maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure.
- Incident Response and Troubleshooting: Address complex live production incidents and cross-platform issues involving the OS, networking, and database layers in cloud-based SaaS/IaaS environments, and perform Root Cause Analysis (RCA) on them. Apply SRE best practices for effective resolution.
- Document system knowledge as you acquire it, create runbooks, and ensure critical system information is readily accessible.
- Security Management: Stay current on security protocols, and proactively identify, diagnose, and resolve complex security issues.
- Issue Tracking and Resolution: Use Atlassian's toolchain, along with first-party cloud service management tools, to track and resolve issues according to their priority.
- Directly influence decisions and outcomes related to solution implementation by measuring and monitoring availability, latency, and overall system health.
Job Requirements
- 8 years of experience in scripting and infrastructure automation using languages such as PowerShell, Python, or Go.
- Deep working knowledge of containers, Kubernetes, serverless computing implementation, and distributed systems design patterns.
- Knowledge of DevOps/SRE development methodologies.
- Proficiency in Linux/Unix and CoreOS.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Ability to lead a Scrum team, influence stakeholders, effectively maintain a product backlog, and manage sprints.
- Must be a US Citizen or Green Card holder.
- This position includes on-call rotations and may require working outside normal business hours.
- Preferred: an interim Secret clearance (or above), or a recently completed Criminal Justice Information Services (CJIS) background check verifying criminal history, employment history, and financial/credit history.
Education
- A Bachelor of Science degree in Computer Science, a master's degree, or equivalent experience is required.
All internal movements within the Product Group via requisition will be lateral, offering valuable growth opportunities to extend your skills in a new area. Opportunities for promotion will be reviewed in the normal course of business, in line with our promotion process.
Compensation:
The target salary range for this position is 159,800 to 237,600 USD. The salary offered will be determined by the candidate's location, qualifications, experience, and education, and may fall outside this range. Final compensation packages are competitive and in line with industry standards, reflect a variety of factors, and include a comprehensive benefits package. This may cover health insurance, life insurance, retirement or pension plans, paid time off (PTO), various leave options, performance-based incentives, an employee stock purchase plan, and/or restricted stock units (RSUs), with all offerings subject to regional variations and governed by local laws, regulations, and company policies. Benefits may vary by country and region, and further details will be provided as part of the recruitment process.
Required Experience:
Senior IC