Job Title: Infra SRE Consultant/Engineer
Location: Santa Clara, CA (Onsite)
Job Type: Contract
Responsibilities:
- Manage the client's on-prem infrastructure. Maintain uptime, reliability, and readiness of the on-prem engineering cloud spread across multiple data centers.
- Guard service level agreements (SLAs) for critical engineering services. Implement monitoring, alerting, and incident response procedures to ensure adherence to defined performance targets. Perform root cause analysis and post-mortems of incidents for any threshold breaches.
- Set up and manage monitoring and logging tools such as Prometheus, Grafana, or the ELK Stack to oversee system health and performance. Maintain KPI pipelines using Jenkins, Python, and ELK.
- Improve monitoring systems by adding custom alerts based on business needs.
- Help with capacity planning, optimization, and efforts to improve utilization.
- Create and maintain documentation for operational procedures, configurations, and troubleshooting guides.
Skills:
- Hands-on experience with on-prem SRE and infrastructure operations
- Strong in monitoring and observability using Prometheus, Grafana, and ELK, with KPI pipeline integration via Jenkins/Python
- Proficient in automation and scripting using Jenkins, Python, Go, and Bash