Site Reliability Engineer (SRE)
RTP, NC
Long-Term Contract
Responsibilities:
- Manage AWS/GCP cloud infrastructure and Kubernetes resources; troubleshoot applications in the runtime environment.
- Manage and performance-tune either databases (Postgres, Redis, Cassandra, Elasticsearch) or streaming data pipelines (Kafka); knowledge of Flink, Storm, Spark, or Kubeflow frameworks is desirable.
- Write and maintain runbooks for knowledge-driven automated processes and bots.
- Collaborate with developers and quality engineering teams to automate the monitoring, alerting, availability, and scalability of our applications and systems.
- Perform proactive monitoring, diagnosis, and resolution of issues as part of an on-call rotation in a 24x7 multicloud environment (AWS/GCP).
- Analyze failures and support software engineers in debugging production issues across microservices and distributed platforms.
- Follow SRE best practices and procedures.
Technical Skills:
- Experience maintaining production systems on AWS and/or GCP.
- Experience with Linux, Python, and shell scripting.
- Experience with Kubernetes cluster maintenance and with managing and debugging containerized applications (Golang, Java, Python).
- Understanding of Kafka, Spark, Storm, Cassandra, Elasticsearch, PostgreSQL, Redis (ElastiCache), ZooKeeper, Nginx, and AWS S3/Google Cloud Storage.
- Understanding of infrastructure-as-code tools (e.g., Terraform, AWS CloudFormation, Google Cloud Deployment Manager).
- Experience with continuous integration practices and tools (Jenkins, Travis CI, CircleCI, etc.).
- Experience with monitoring solutions such as CloudWatch, Stackdriver, Prometheus, Thanos, Graphite, Grafana, ELK, Alert Logic, and Datadog.
- Experience with logging service solutions.