Please Note:
To provide the best candidate experience amidst our high application volumes, each candidate is limited to 10 applications across all open jobs within a 6-month period.
Advancing the World's Technology Together
Our technology solutions power the tools you use every day, including smartphones, electric vehicles, hyperscale data centers, IoT devices, and so much more. Here you'll have an opportunity to be part of a global leader whose innovative designs are pushing the boundaries of what's possible and powering the future.
We believe innovation and growth are driven by an inclusive culture and a diverse workforce. We're dedicated to empowering people to be their true selves. Together, we're building a better tomorrow for our employees, customers, partners, and communities.
The AGI (Artificial General Intelligence) Computing Lab is dedicated to solving the complex system-level challenges posed by the growing demands of future AI/ML workloads. Our team is committed to designing and developing scalable platforms that can effectively handle the computational and memory requirements of these workloads while minimizing energy consumption and maximizing performance. To achieve this goal, we collaborate closely with both hardware and software engineers to identify and address the unique challenges posed by AI/ML workloads and to explore new computing abstractions that can provide a better balance between the hardware and software components of our systems. Additionally, we continuously conduct research and development in emerging technologies and trends across memory, computing, interconnect, and AI/ML, ensuring that our platforms are always equipped to handle the most demanding workloads of the future. By working together as a dedicated and passionate team, we aim to revolutionize the way AI/ML applications are deployed and executed, ultimately contributing to the advancement of AGI in an affordable and sustainable manner. Join us in our passion to shape the future of computing!
As AI models scale, memory (its capacity, bandwidth, cost, and placement) has become the central architectural constraint. The question is no longer whether to rethink memory system design but how. A broad solution space exists: GPU-side shared memory architectures, DRAM and Flash as capacity tiers, fabric-attached pooling and disaggregation, and new interconnect approaches all represent credible paths. Each carries a different tradeoff profile across workloads, deployment contexts, and cost structures.
This role exists to bring rigor to that question. You will build workload-grounded models that evaluate the full solution space, quantify where each approach wins and why, and translate those findings into architecture decisions that directly shape product strategy and investment. You will work closely with architects across compute, networking, storage, and software, and present directly to senior technical leadership. This is a principal individual contributor role: you personally build the models, own the conclusions, and drive the decisions.
Location: Daily onsite presence at our San Jose, CA office / U.S. headquarters, in alignment with our Flexible Work policy.
What You'll Do
Architecture Strategy & Trade Studies
- Define and evaluate the memory solution space (GPU-side shared memory, DRAM and Flash capacity tiers, pooled/disaggregated memory, and fabric-attached approaches) with quantified value propositions across performance, power, cost/TCO, density, and operability
- Identify break-even conditions and decision criteria across solution approaches; produce architecture briefs and sensitivity analyses ready for executive audiences
Workload-Driven Analysis
- Ground every architectural comparison in real AI behavior: large-model training/inference (including long-context and KV-cache dynamics), MoE and sparse workloads, multi-step agentic pipelines, and recommendation/embedding workloads
- Build and maintain a workload methodology (microbenchmarks, proxy models, traces) tied to throughput, latency, tail latency, utilization, and SLA impact
Memory Hierarchy & Tiered Design
- Architect and compare memory hierarchies spanning local high-bandwidth memory, DRAM capacity tiers, Flash (NVMe/NVMe-oF), pooled/remote memory, and storage-class approaches; evaluate placement, caching, prefetching, eviction, QoS, and contention policies across tiers
- Define the software exposure and operational model (runtime, OS, and library expectations) with deployability and observability as first-class requirements
Connectivity & Pooling Approaches
- Evaluate the connectivity and pooling solution space as complementary or competing answers to the memory capacity and bandwidth problem, including GPU-side shared memory (e.g., NVLink-class, Vera Rubin-style), fabric-attached pooling (e.g., CXL-class), and emerging interconnect directions (UALink/UEth-class)
- Quantify how latency, bandwidth, congestion, topology, and coherency assumptions affect end-to-end AI performance across approaches; drive cross-domain alignment on connectivity trade decisions
Hands-On Modeling & Validation
- Build and extend system simulators and trace-driven models spanning compute, memory, Flash/storage, and IO; write analysis code (Python, C/C++) to automate experiments and process results
- Profile and instrument GPU/CPU/system stacks to validate model assumptions; run disciplined studies with baselines, parameter sweeps, and reproducible documentation
What You Bring
- Cross-domain reasoning. The core requirement. You connect AI workload behavior, memory hierarchy (including DRAM and Flash tiers), connectivity/fabric, and storage/IO into coherent, quantified arguments, evaluating a broad solution space rather than advocating for any single technology.
- Proven impact. 12 years in system architecture, performance engineering, or infrastructure modeling, with a track record of studies that influenced product direction, investment decisions, or platform strategy.
- AI infrastructure fluency. Working knowledge of training and inference bottlenecks, data movement patterns, and memory pressure across transformers, MoE, and recommendation workloads. Engineering literacy is required; researcher depth is not.
- Memory and storage grounding. Solid understanding of memory hierarchy and tiering principles across DRAM and Flash; storage/IO fundamentals including tail latency, QoS, and NVMe/NVMe-oF behavior; and connectivity/fabric options for shared, pooled, and disaggregated memory.
- Analytical rigor. Credible quantitative modeling, clean experimental methodology, and the ability to defend assumptions under scrutiny from both hardware and software engineers.
- Communication that moves decisions. Converts complex, multi-domain analysis into clear recommendations for engineering and executive audiences, written and verbal.
- Hands-on experience with GPU-side shared memory architectures, DRAM/Flash tiering for AI workloads, or fabric-attached memory pooling/disaggregation.
- Familiarity with NVLink-class fabrics, CXL-class pooling, or emerging interconnect standards (UALink/UEth-class).
- Prior ownership of benchmarking strategy for memory-intensive or storage-tiered AI workloads.
- Familiarity with inference caching, KV-cache management, or Flash-backed serving at scale.
- Experience with discrete-event or trace-driven system simulation.
- You're inclusive, adapting your style to the situation and the diverse global norms of our people.
- An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
- You're collaborative, building relationships, humbly offering support, and openly welcoming approaches.
- Innovative and creative, you proactively explore new ideas and adapt quickly to change.
#LI-VL1
What We Offer
The pay range below is for all roles at this level across all US locations and functions. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. We also offer incentive opportunities that reward employees based on individual and company performance.
This is in addition to our diverse package of benefits centered around the wellbeing of our employees and their loved ones. In addition to the usual Medical/Dental/Vision/401k, our inclusive rewards plan empowers our people to care for their whole selves. An investment in your future is an investment in ours.
Give Back: With a charitable giving match and frequent opportunities to get involved, we take an active role in supporting the community.
Enjoy Time Away: You'll start with 4 weeks of paid time off a year, plus holidays and sick leave, to rest and recharge.
Care for Family: Whatever family means to you, we want to support you along the way, including a stipend for fertility care or adoption, medical travel support, and virtual vet care for your fur babies.
Prioritize Emotional Wellness: With on-demand apps and free confidential therapy sessions, you'll have support no matter where you are.
Stay Fit: Eating well and being active are important parts of a healthy life. Our onsite Café and gym, plus virtual classes, make it easier.
Embrace Flexibility: Benefits are best when you have the space to use them. That's why we facilitate a flexible environment so you can find the right balance for you.
Base Pay Range
$219,000 - $351,000 USD
Equal Opportunity Employment Policy
Samsung Semiconductor takes pride in being an equal opportunity workplace dedicated to fostering an environment where all individuals feel valued and empowered to excel, regardless of race, religion, color, age, disability, sex, gender identity, sexual orientation, ancestry, genetic information, marital status, national origin, political affiliation, or veteran status.
When selecting team members, we prioritize talent and qualities such as humility, kindness, and dedication. We extend comprehensive accommodations throughout our recruiting processes for candidates with disabilities, long-term conditions, neurodivergent individuals, or those requiring pregnancy-related support. All candidates scheduled for an interview will receive guidance on requesting accommodations.
Recruiting Agency Policy
We do not accept unsolicited resumes. Only authorized recruitment agencies that have a current and valid agreement with Samsung Semiconductor Inc. are permitted to submit resumes for any job openings.
Applicant AI Use Policy
At Samsung Semiconductor, we support innovation and technology. However, to ensure a fair and authentic assessment, we prohibit the use of generative AI tools to misrepresent a candidate's true skills and qualifications. Permitted uses are limited to basic preparation, grammar, and research, but all submitted content and interview responses must reflect the candidate's genuine abilities and experience. Violation of this policy may result in immediate disqualification from the hiring process.
Applicant Privacy Policy
Experience:
Staff IC