Site Reliability Engineer (SRE) 204447


Job Location: North Las Vegas, NV - USA

Monthly Salary: Not Disclosed
Posted on: 8 hours ago
Vacancies: 1

Job Summary

Site Reliability Engineer (SRE)

RTP, NC

Long Term Contract

Responsibilities:

  • Manage AWS/GCP cloud infrastructure and Kubernetes resources; troubleshoot applications in the runtime environment.
  • Manage and performance-tune either databases (Postgres, Redis, Cassandra, Elasticsearch) or streaming data pipelines (Kafka; knowledge of the Flink, Storm, Spark, or Kubeflow frameworks is desirable).
  • Write and maintain runbooks for knowledge-driven automated processes and bots.
  • Collaborate with developers and quality engineering teams to automate the monitoring, alerting, availability, and scalability of our applications and systems.
  • Proactively monitor, diagnose, and resolve issues in a 24x7 multicloud environment (AWS / GCP), including participation in the on-call rotation.
  • Analyze failures and support software engineers in debugging production issues across microservices and distributed platforms.
  • Follow SRE best practices and procedures.

Technical Skills

  • Experience maintaining production systems on AWS and/or GCP.
  • Experience with Linux and Python/shell scripting.
  • Experience maintaining Kubernetes clusters and managing and debugging containerized applications (Golang, Java, Python).
  • Understanding of Kafka, Spark, Storm, Cassandra, Elasticsearch, PostgreSQL, Redis (ElastiCache), ZooKeeper, Nginx, AWS S3/GCP GS.
  • Understanding of infrastructure-as-code software (e.g. Terraform, AWS CloudFormation, Google Cloud Deployment Manager).
  • Experience with continuous integration practices and tools (Jenkins, Travis CI, CircleCI, etc.).
  • Experience with monitoring solutions such as CloudWatch, Stackdriver, Prometheus, Thanos, Graphite, Grafana, ELK, Alert Logic, and Datadog.
  • Experience with logging service solutions.

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting