Be Klaviyo's senior IC for scale: you will report into a VP of Engineering and lead performance, reliability, multi-region, and large-tenant readiness. You'll drive platform-wide architectural change, hunt bottlenecks and optimize systems, and partner across teams to productionize improvements. This is an IC role with no direct reports; you will lead via technical depth, hands-on impact, and crisp cross-org alignment.
What You'll Do
- Define enterprise scalability fitness functions (latency/throughput/error rates) and a scorecard; align teams to SLOs and budgets.
- Design/implement sharding and partitioning strategies, caching/backpressure, multi-region readiness, and high-volume migration paths.
- Build lightweight enablement: benchmarks, profiling harnesses, reproducible testbeds; pair with teams to land fixes.
- Lead scalability reviews and readiness gates that accelerate, not block, delivery; drive incident deep dives tied to systemic fixes.
- Communicate clearly to execs and engineers, tying technical work to business impact and customer outcomes.
- Integrate AI into scale and resiliency work, from proactive anomaly detection to synthetic load and guided runbooks, so performance improvements stick and incidents don't repeat.
Who You Are
- Experience: 12 years scaling multi-tenant SaaS, with a reputation for removing major bottlenecks and proving impact with data.
- Technical expertise: Performance engineering, capacity planning, sharding/partitioning, caching/backpressure, multi-region readiness, and high-volume migrations; you turn hotspots into robust patterns.
- AI tools & automation: You apply AI to scale work (profiling assistance, workload modeling, synthetic traffic generation, anomaly detection, and runbook copilots), always with explicit guardrails and observability.
- Cross-org influence: You align teams through fitness functions, scorecards, and readiness gates that accelerate, not block, delivery; you communicate trade-offs crisply to execs and engineers.
- AI fluency: Curious, adaptable, and proactive in exploring AI that responsibly improves scale outcomes.
Nice to Haves
- Scale scorecard: Company-wide fitness functions (latency/throughput/error rates) are adopted and reviewed regularly.
- High-impact wins: 2-3 bottlenecks removed with documented, reproducible testbeds; pXX latencies and error rates improve on top enterprise workloads; repeat P0s trend down.
- AI-assisted scale engineering: AI-driven anomaly detection reduces alert noise while improving signal; generative load testing and copilot runbooks are used in release/readiness checks for the top critical services; time-to-isolate regressions drops 20-30%.
Success in 6-12 Months
- Company-wide scale scorecard in place; 2-3 high-impact bottlenecks removed; top enterprise workloads show improved pXX latencies and error rates; fewer repeat P0s.
We use Covey as part of our hiring and/or promotional process. For jobs or candidates in NYC, certain features may qualify it as an AEDT. As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 3, 2025.
Please see the independent bias audit report covering our use of Covey here
Required Experience:
Staff IC