*Hiring location: Beijing, Shanghai, Guangzhou, Shenzhen, Hong Kong (visa sponsorship provided)
Would you like to join one of the fastest-growing teams within Amazon Web Services (AWS) and help shape the future of GPU optimization and high-performance computing? Join us in helping customers across all industries maximize the performance and efficiency of their GPU workloads on AWS while pioneering innovative optimization solutions.
As a Senior Technical Account Manager (Sr. TAM) specializing in GPU Optimization in AWS Enterprise Support, you will play a crucial role in two key missions: guiding customers' GPU acceleration initiatives across AWS's comprehensive compute portfolio, and spearheading the development of optimization strategies that revolutionize customer workload performance.
Key Job Responsibilities
- Build and maintain long-term technical relationships with enterprise customers, focusing on GPU performance optimization and resource allocation efficiency on AWS or similar cloud services.
- Analyze customers' current architecture, models, data pipelines, and deployment patterns; create a GPU bottleneck map and measurable KPIs (e.g., GPU utilization, throughput, P95/P99 latency, cost per unit).
- Design and optimize GPU resource usage on EC2/EKS/SageMaker or equivalent cloud compute, container, and ML services; implement node pool tiering, Karpenter/Cluster Autoscaler tuning, auto scaling, and cost governance (Savings Plans/RI/Spot/ODCR or equivalent).
- Drive GPU partitioning and multi-tenant resource sharing strategies to reduce idle resources and increase overall cluster utilization.
- Guide customers in PyTorch/TensorFlow performance tuning (DataLoader optimization, mixed precision, gradient accumulation, operator fusion) and inference acceleration (ONNX, TensorRT, CUDA Graphs, model compression).
- Build GPU observability and monitoring systems (nvidia-smi, CloudWatch or equivalent monitoring tools, profilers, distributed communication metrics) to align capacity planning with SLOs.
- Ensure compatibility across GPU drivers, CUDA, container runtimes, and frameworks; standardize change management and rollback processes.
- Collaborate with cloud-provider internal teams and external partners (NVIDIA, ISVs) to resolve complex cross-domain issues and deliver repeatable optimization solutions.
About the team
AWS Global Services includes experts from across AWS who help our customers design, build, operate, and secure their cloud environments. Customers innovate with AWS Professional Services, upskill with AWS Training and Certification, optimize with AWS Support and Managed Services, and meet objectives with AWS Security Assurance Services. Our expertise and emerging technologies include AWS Partners, AWS Sovereign Cloud, AWS International Product, and the Generative AI Innovation Center. You'll join a diverse team of technical experts in dozens of countries who help customers achieve more with the AWS cloud.
Why AWS
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that's why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let it stop you from applying.
Inclusive Team Culture
AWS values curiosity and connection. Our employee-led and company-sponsored affinity groups promote inclusion and empower our people to take pride in what makes us unique. Our inclusion events foster stronger, more collaborative teams. Our continual innovation is fueled by the bold ideas, fresh perspectives, and passionate voices our teams bring to everything we do.
Mentorship & Career Growth
We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge-sharing, mentorship, and other career-advancing resources here to help you develop into a better-rounded professional.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there's nothing we can't achieve in the cloud.
Qualifications
- 5 years in cloud technical support, solutions architecture, or customer success management, with at least 3 years of hands-on experience with GPU/accelerated computing platforms.
- In-depth understanding of GPU instance families (e.g., AWS G/P/H series) or similar offerings from other cloud providers, AMI/driver/CUDA/container compatibility management, and cloud storage/network performance tuning (e.g., S3 I/O, EBS/Instance Store equivalents, preprocessing pipelines).
- Proficient in scheduling GPU workloads with EKS or equivalent Kubernetes-based orchestration services, including node pool tiering, resource quotas, elastic scaling, and auto-recovery strategies.
- Experienced in multi-GPU/multi-node distributed computing (NCCL topology awareness, tensor parallelism, pipeline parallelism), with expertise in communication optimization for large-scale AI training and inference.
- Skilled in PyTorch/TensorFlow performance analysis and optimization, including DataLoader tuning, mixed precision, operator fusion, and inference acceleration toolchains (ONNX, TensorRT, CUDA Graphs).
- Experienced in cost and capacity governance; familiar with Savings Plans, RIs, ODCRs, Spot, Capacity Blocks, and right-sizing strategies, or their equivalents on other cloud platforms.
- Demonstrated cross-functional communication and influence skills, capable of driving technical solutions with data and business objectives.
- AWS Solutions Architect Professional, Machine Learning Specialty, or DevOps Professional certification, or equivalent credentials from other cloud providers.
- Hands-on experience with NVIDIA ecosystem software and toolchains (CUDA/cuDNN/NCCL, TensorRT, CUDA Graphs) and proven ability to maintain performance consistency across versions and platforms.
- Delivered quantifiable performance improvements (GPU throughput, latency reduction, cost savings) with a demonstrated benchmarking and regression-testing methodology.
- Proven, repeatable optimization results in LLM inference, batch AI training, real-time video processing, or high-performance computing (HPC).
- Contributions to open-source projects (Run:ai, Ray, vLLM, DeepSpeed, Kubeflow, etc.), or published technical articles, whitepapers, or performance benchmarks.
- Experience with Infrastructure as Code (Terraform, AWS CDK, or equivalent cloud development frameworks), Helm charts, baseline container image management, and DevOps automation.
- Able to present performance-business tradeoffs and results to senior stakeholders using PR/FAQ documents, architecture diagrams, and capacity/cost reports.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit
for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.