Sr. Site Reliability Engineer, Live Site Reliability & DevOps

This job posting is outdated and the position may be filled.

Job Location

Mendoza, Argentina

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

The Role:

We are seeking an experienced Senior Site Reliability Engineer (SRE III) to lead Live Site Reliability and DevOps operations for our cloud-hosted, data-driven dashboard platform. This role is responsible for maintaining platform availability, performance, scalability, and incident response while collaborating with development, architecture, and product teams.

As a senior technical leader, you'll proactively monitor production systems, implement automation and CI/CD pipelines, and establish operational best practices to support the project's live environment. Your work ensures that our applications and services are highly available, secure, and reliable while supporting continuous delivery and scalable growth.

    Responsibilities:

    Live Site Reliability and Incident Management:

    Own the uptime, health, and reliability of production and staging environments.

    Design and implement monitoring, alerting, and observability for distributed systems and cloud environments.

    Lead incident response efforts, perform root cause analysis, and ensure post-incident reviews are completed and documented.

    Maintain the on-call rotation (or help establish it) for live site support.

    Automation and Infrastructure as Code (IaC):

    Design, build, and maintain IaC templates (Terraform, CloudFormation, etc.) for repeatable infrastructure deployment.

    Automate operational tasks, deployments, and scaling strategies using tools like Ansible, Chef, Puppet, or similar.

    Optimize cloud infrastructure for cost, performance, and scalability.

    Continuous Integration / Continuous Delivery (CI/CD):

    Create and maintain robust CI/CD pipelines supporting frontend, backend (Laravel/PHP), and data integration deployments.

    Collaborate with developers to ensure smooth code releases and rollback procedures.

    Manage environment promotion strategies and release gating.

    Cloud and Platform Operations:

    Manage and monitor cloud environments (AWS, Azure, or GCP), ensuring scalability, availability, and security.

    Collaborate with data engineering and visualization teams to support data pipelines and dashboard integrations.

    Ensure best practices for API integrations, messaging systems, and third-party service reliability.

    Security, Compliance, and Risk Management:

    Implement cloud security best practices and work with the team to maintain compliance standards (SOC 2, GDPR, etc., if applicable).

    Conduct risk assessments and proactively mitigate vulnerabilities and performance bottlenecks.

    Collaboration and Technical Leadership:

    Partner with architecture, development, QA, and product teams to ensure reliable, maintainable, and scalable system designs.

    Mentor junior team members and establish best practices for live site monitoring and incident response.

    Collaborate on backlog refinement to surface technical tasks that reduce operational risks or address tech debt.

    Documentation and Reporting:

    Maintain runbooks, playbooks, architectural diagrams, and postmortem documentation.

    Provide regular reports on system uptime, performance, and reliability metrics.

    Requirements:

    7 years of experience in DevOps, SRE, Cloud Operations, or Infrastructure Engineering.

    Proven experience in supporting live production environments, incident management, and root cause analysis.

    Strong experience with cloud platforms (AWS preferred; Azure or GCP a plus).

    Expertise in monitoring/observability tools (Datadog, Prometheus, Grafana, New Relic, etc.).

    Proficient in container orchestration (Kubernetes, Docker) and microservice deployments.

    Strong experience with Terraform, CloudFormation, or other IaC tools.

    Deep understanding of CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, etc.).

    Solid scripting skills in Bash, Python, or Go.

    Knowledge of networking, DNS, SSL/TLS, and security best practices.

    Preferred Qualifications:

    Experience supporting data visualization platforms, SaaS dashboards, or reporting systems.

    Familiarity with PHP/Laravel, React.js pipelines, and Node.js-based services.

    Experience working in Agile / Scrum development teams.

    Background in performance tuning, database reliability (PostgreSQL, MySQL), or message queuing systems (Kafka, RabbitMQ).

    Certifications such as AWS Certified DevOps Engineer, Kubernetes CKA, or Google Cloud DevOps Professional.

    Wakapi Web

    Employment Type

    Full Time
