System Reliability Engineer

Zipdev

Job Location:

Mexico City - Mexico

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Department:

Engineering

Job Summary

Description

We're looking for a passionate and experienced System Reliability Engineer to play a key role in designing, implementing, and maintaining our evolving cloud-native platform. You'll be instrumental in shaping our reliability practices, automating operational tasks, and driving continuous improvement across our systems. This is an exciting time to join us as we embark on significant refactoring efforts and continue to leverage cutting-edge technologies.

What You'll Do:

  • Design, build, and maintain highly available, scalable, and resilient systems on Google Cloud Platform (GCP).
  • Proactively monitor system health, performance, and capacity, identifying and resolving issues before they impact users.
  • Develop and implement automation for infrastructure provisioning, deployment, and operational tasks (e.g., CI/CD pipelines, disaster recovery).
  • Collaborate with development teams to ensure new features are designed and implemented with reliability and operational excellence in mind.
  • Manage and optimize our MongoDB Atlas instances, ensuring data integrity, performance, and security.
  • Lead the refactoring effort of our Redis services to a more scalable and resilient Pub/Sub or Kafka-based architecture.
  • Participate in on-call rotations and incident response, conducting thorough post-mortems and implementing preventative measures.
  • Contribute to the development of best practices, runbooks, and documentation for system operations.
  • Identify and implement opportunities for cost optimization without compromising reliability.


Requirements
  • 5 years of experience in a System Reliability Engineering, DevOps, or Site Reliability Engineering role.
  • Strong hands-on experience with Google Cloud Platform (GCP) services (e.g., Compute Engine, Kubernetes Engine, Cloud SQL, Cloud Monitoring, Cloud Functions, Networking).
  • Proven expertise in managing and optimizing MongoDB Atlas (or other cloud-hosted) databases.
  • Solid experience with containerization technologies, particularly Docker and Kubernetes.
  • Demonstrated experience with Infrastructure as Code (e.g., Terraform, Cloud Deployment Manager).
  • Proficiency in scripting languages such as Python, Go, or Bash.
  • Familiarity with message queuing systems like Redis, RabbitMQ, or Kafka; direct experience with Kafka or Google Cloud Pub/Sub is a must.
  • Familiarity with Prometheus, Grafana, or similar monitoring and alerting tools.
  • Experience with service mesh technologies (e.g., Istio).
  • Experience with CI/CD tools and practices.
  • Strong understanding of network protocols, security best practices, and distributed systems.
  • Excellent problem-solving skills with a methodical approach to troubleshooting complex issues.
  • Ability to communicate effectively with both technical and non-technical stakeholders.
  • A proactive mindset with a commitment to continuous learning and improvement.


Benefits
  • Work remotely Monday - Friday, 40 hours a week (no weekends)
  • Vacation: 10 business days a year
  • Holidays: 5 National Holidays a year
  • Company Holidays: 5 Company Holidays a year (Christmas Eve, Christmas Day, New Year's Eve, New Year's Day, Zipdev Day)
  • Parental Leave
  • Health Care Reimbursement
  • Active Lifestyle Reimbursement
  • Quarterly Home Office Reimbursement
  • Payroll Deduction Purchase Plans
  • Longevity Bonus
  • Continuous Learning Bonus
  • Access to Training and Professional Development Platforms
  • Did we mention it's REMOTE!!

One of our core values at Zipdev is "Be authentic." That's why we encourage you to answer the application form in your own words; we are interested in getting to know you, not a digital assistant.

Wondering how our remote environment or our payment method works? We've put together some helpful answers in our FAQs at the bottom of our career site. Take a look and let us know if you have any other questions!

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Zipdev offers the opportunity to work remotely with clients based in the United States. Zipdev recruits and hires the best Developers, Designers, QA Testers, and Project Managers in Latin America. If you have been successful working remotely, work well with remote teams and understand ...
