Engineering Manager, Batch Compute Infrastructure
Job Summary
Who We Are
About Stripe
Stripe is a financial infrastructure platform for businesses. Millions of companies - from the world's largest enterprises to the most ambitious startups - use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone's reach while doing the most important work of your career.
About the Team
The Batch Compute Infrastructure team at Stripe manages the foundational infrastructure, tooling, and distributed systems behind Stripe's massive-scale batch processing environments, currently encompassing over 5,000 computational nodes. Powered primarily by Hadoop, Spark, and Celeborn, these systems are the backbone for several core asynchronous financial, analytical, and regulatory workflows at Stripe, operating at petabyte scale.
What you'll do
You will support a team of engineers focused on building the tooling, infrastructure, and systems for operating Spark and Hadoop. In addition to helping define the roadmap for these systems, you will interact with many other managers and their teams at Stripe who rely on the Data processing Infra team to deliver efficient and scalable services to our customers. You will work with both the finance and engineering organizations (infrastructure & product) to define, measure, and monitor the cost efficiency of these systems.
Responsibilities
- Drive Strategic Vision: Define the multi-year roadmap for Stripe's Batch Compute Infrastructure, leading complex architectural shifts and modernization.
- Lead and Scale: Build, mentor, and aggressively scale a high-performing team of engineers, proactively investing in their career development and fostering a culture of operational excellence.
- Ensure Operational Rigor: Maintain unwavering reliability for a Tier-0 infrastructure processing tens of thousands of daily workloads, proactively mitigating risks and managing complex on-call telemetry.
- Cross-Functional Orchestration: Collaborate deeply with data platform teams, finance, and user groups to define compute efficiency metrics, execute massive-scale cost optimization strategies, and guarantee compliance with global financial regulations.
- Technical Stewardship: Provide technical guidance in architecture reviews, evaluating critical cost, performance, and reliability trade-offs in distributed systems design involving Hadoop, Spark, AWS cloud primitives, and modern metastores.
Who You Are
Minimum requirements
- 10 years of professional software development and engineering experience.
- 3 years of direct engineering management experience, successfully building and operating high-velocity technical teams.
- Deep technical background in building, scaling, and maintaining large-scale distributed data systems or Tier-0 infrastructure using open-source tools (e.g., Hadoop, Spark, Celeborn, Airflow, Kafka).
- Proven track record of driving significant infrastructure efficiency, managing capacity planning, and making data-driven cost-performance trade-offs.
- Experience working effectively in highly cross-functional global organizations.
Preferred requirements
- Experience managing remote or geographically distributed engineering teams.
- Familiarity with managing a massive fleet of Linux servers, on-premise Hadoop clusters, and modern cloud data architectures (e.g., AWS S3, Graviton).
- Demonstrated ability to navigate strategic ambiguity and deliver complex, multi-quarter infrastructure projects from inception to completion.
- Deep empathy for internal data users, with a passion for building robust developer tooling and abstractions.
Required Experience:
Manager
About the Company
Stripe is a suite of APIs powering online payment processing and commerce solutions for internet businesses of all sizes. Accept payments and scale faster with AI.