Staff Reliability Engineer

Coupang

Posted on : 10-08-2025

Employer Active

1 Vacancy

Job Location

Bengaluru - India

Monthly Salary

Not Disclosed

Job Description

Company Introduction

We exist to wow our customers. We know we're doing the right thing when we hear our customers say, "How did we ever live without Coupang?" Born out of an obsession to make shopping, eating, and living easier than ever, we are collectively disrupting the multi-billion-dollar commerce industry from the ground up and establishing an unparalleled reputation for being a leading and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to continue our growth and launch new services at the speed we have maintained since our inception. We are all entrepreneurial, surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Our mission to build the future of commerce is real. We push the boundaries of what's possible to solve problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world.

Role Overview:

To keep Coupang's IT services stable, the IT Reliability Engineering team operates monitoring systems and processes for IT infrastructure and applications. The team is responsible for ensuring and improving monitoring. In the case of an event or incident, the team collaborates with the engineering team to resolve it and manages the relevant metrics. To ensure continuity of service, the team regularly conducts DR tests.

Key Responsibilities:

Strategic Vision & Leadership

Define and drive the observability strategy and roadmap, aligning with business and technology goals.
Establish a mature observability framework covering infrastructure, network, applications, and end-user experience.
Advocate for observability best practices across engineering, operations, and product teams.

Monitoring & Tool Implementation

Lead the design, implementation, and optimization of observability platforms (e.g. Prometheus, Grafana, Datadog, New Relic, Splunk).
Evaluate and onboard new tools and technologies to enhance visibility and telemetry across systems.
Ensure scalable and resilient monitoring architectures are in place for hybrid and cloud-native environments.

Gap Analysis & Continuous Improvement

Conduct gap assessments in existing monitoring setups and identify areas for improvement.
Implement automated solutions to address low-hanging fruit and reduce manual overhead.
Continuously refine monitoring configurations to improve signal-to-noise ratio and reduce alert fatigue.

End-to-End Observability

Build and maintain end-to-end visibility across infrastructure, network, applications, and user journeys.
Integrate observability tools with incident management, ticketing, and reporting systems.
Develop and enforce tagging strategies, metrics standards, and log enrichment practices.

Collaboration & Enablement

Partner with DevOps, SRE, and application teams to embed observability into CI/CD pipelines and development workflows.
Provide technical guidance and training to teams on observability tools and practices.
Support incident response and post-mortem analysis with automated diagnostics and telemetry insights.

Data-Driven Insights

Leverage observability data to generate actionable insights for performance tuning, capacity planning, and reliability engineering.
Create dashboards and reports that provide meaningful visibility to stakeholders at all levels.

Qualifications:

Observability & Monitoring Tools

Prometheus, Grafana, Zabbix, SolarWinds
Datadog, New Relic, Dynatrace, Splunk, Helix
OpenTelemetry (for standardized telemetry collection)

Infrastructure & Automation

Terraform, Ansible, Puppet, Chef (IaC tools)
Scripting languages: Python, Bash, PowerShell
REST APIs: Experience integrating and automating observability tools via APIs

Cloud & Container Platforms

AWS, Azure, Google Cloud Platform
Kubernetes and Docker (monitoring containerized environments)
Cloud-native monitoring tools: CloudWatch, Azure Monitor, GCP Operations Suite

CI/CD & DevOps Tooling

Jenkins, GitLab CI, GitHub Actions
Git (version control)
Integration of observability into CI/CD pipelines

Data Analysis & Visualization

Experience with metrics, logs, and traces
Building dashboards and custom visualizations
Familiarity with SQL or time-series databases (e.g. InfluxDB, TimescaleDB)

Alerting & Incident Management

Tools like PagerDuty, xMatters, VictorOps, ServiceNow, Jira, Helix
Knowledge of alert tuning, event correlation, and automated diagnostics

Architecture & Design

Understanding of distributed systems, microservices, and network protocols
Ability to design scalable observability architectures

Preferred Qualifications:

15 years of hands-on experience in monitoring, observability, and infrastructure operations.
Proven track record of designing and implementing observability platforms in complex environments.
Experience in gap analysis and optimization of monitoring setups across infrastructure, network, applications, and end-user layers.
Strong background in DevOps or SRE.

Technical Proficiency

Deep expertise in observability tools (Prometheus, Grafana, Dynatrace, etc.)
Strong skills in Infrastructure as Code, automation, scripting, and API integrations.
Familiarity with cloud-native architectures and microservices.
Experience integrating observability into CI/CD pipelines and incident management workflows.

Soft Skills

Strategic thinker with a vision for mature observability practices.
Excellent communication and collaboration skills to work across teams.
Ability to mentor and guide teams on observability principles and tooling.

Type of work:

Hybrid - Coupang's hybrid work model is designed to enable a culture of collaboration that acts as a catalyst to enrich the experience of employees. Employees are required to work at least 3 days in the office per week, with the flexibility to work from home 2 days a week depending on the role requirement. Some businesses may require more time in the office due to the nature of the work.

Details to consider

Those eligible for employment protection (recipients of veterans' benefits, the disabled, etc.) may receive preferential treatment for employment in accordance with applicable laws.

Privacy Notice

Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below:

Experience

Staff IC

Employment Type

Full-Time

