Tech Infra Engineer

Employer Active

Job Location

Shanghai - China

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

As a Staff Systems Engineer in Developer Platform, you will partner with leaders of multiple platform teams. You will work closely with product to define and implement simple solutions to complex orchestration problems, building a highly scalable, reliable, and efficient platform for our customers. You will engineer and develop Kubernetes controllers, operators, and node-level daemons for the application runtime; drive performance tuning and scaling; and design multi-cluster control-plane capabilities that scale to millions of pods across thousands of clusters.

What You Will Do

  • Engineer and develop a unified application platform for hybrid (multi-cluster, multi-region, multi-cloud) application management, using Kubernetes controllers and feedback-driven control systems to meet SLOs.
  • Deliver end-to-end automation for the application lifecycle (deployments, rollouts, failovers, policy enforcement) to minimize manual work for users.
  • Drive fleet-wide optimization for cost, performance, and latency through data-informed controls and capacity management, improving $/RPS and tail latency.
  • Build resilient multi-tenant control planes and workflows that safely scale to millions of pods across thousands of clusters.
  • Ensure reliability, security, and governance with clear guardrails, safe defaults, and automated remediation.
  • Partner with product and customers to turn complex orchestration problems into simple, reusable platform primitives and great developer experiences.
  • Champion observability and continuous improvement with measurable, outcome-focused metrics.

Basic Qualifications

  • Bachelor's degree in Computer Science, Electrical Engineering, Math, or a closely related field (or equivalent experience)
  • 10 years in backend software development and operations
  • Recent experience designing and operating large-scale distributed systems (last 3 years)
  • Fluency in one or more of Go, C/C++, Python, or Java
  • Proven track record of delivering mission-critical systems
  • Experience with cloud computing using AWS, Azure, or GCP

Preferred Qualifications

  • Kubernetes API machinery and semantics: SSA, SMP, server-side dry-run, watches/informers/listers, rate-limited workqueues, finalizers, owner references, leader election, API Priority and Fairness
  • Controllers/operators and node daemons in Go: client-go/controller-runtime, reconciliation patterns, backoff and retry, idempotency, partitioned/sharded controllers, HA and failover
  • CRDs and webhooks: versioning, conversion functions/webhooks, validating/mutating admission webhooks, policy frameworks, and best practices
  • Pod/runtime semantics: sidecars, init/ephemeral containers, probes (readiness/liveness/startup), lifecycle hooks, termination behavior, PDBs, QoS classes, ResourceQuota/LimitRange, topology spread, affinity/anti-affinity
  • Scaling systems: HPA (resource/custom/external metrics), VPA, cluster autoscaler; multi-dimensional scaling, health-aware/autopilot-style policies; external metrics adapters and SLO-driven scaling
  • Federated and multi-cluster: placement/propagation, failover, drift detection, reconciliation strategies; consistent hashing and partitioning for scale
  • Distributed systems: CRDTs and eventual consistency paradigms; Raft/memberlist/gossip; deep familiarity with etcd, Kafka, Redis, and their operational characteristics (compaction, backpressure, retention, failover)
  • Observability and data: Prometheus (cardinality control, recording rules), tracing; experience with vector databases for search and diagnostics; strong time-series forecasting (classical ML) and statistical modeling for proactive optimization
  • Languages and interfaces: Go (primary), Java/Python as needed; gRPC/protobuf; JSON/YAML/Jsonnet
  • Leadership: ability to handle multiple competing priorities in a fast-paced environment and lead the delivery of large-scale services for complex business offerings

Privacy Notice

  • Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below:

Employment Type

Full Time
