Associate Director, Software Engineering (Model Hosting/Inference Optimisation)
Job Summary
Some careers have more impact than others.
If you're looking for a career where you can make a real impression, join HSBC and discover how valued you'll be.
We are currently seeking an experienced professional to join our team in the role of Associate Director, Software Engineering (Model Hosting/Inference Optimisation).
Business: CTO Platforms (AI Platforms)
Location: Shenzhen / Guangzhou
Req ID: 44990
Principal responsibilities
- Design, build and operate scalable, reliable model hosting platforms for LLMs, embeddings and STT/TTS across heterogeneous hardware.
- Drive inference optimisation for latency, throughput and cost (quantisation, KV-cache optimisation, dynamic/continuous batching).
- Evaluate, integrate and tailor inference frameworks (e.g. vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware.
- Own inference health and performance monitoring: latency, throughput, TTFT, memory, availability; troubleshoot bottlenecks and deployment issues.
- Partner with hardware teams to apply hardware-specific optimisations and improve resource utilisation.
- Ensure hosting systems meet production standards for reliability, scalability, security and high availability.
- Build end-to-end, scalable fine-tuning pipelines to adapt foundation models using domain datasets.
- Work with data scientists/domain experts to define objectives and metrics, validate results and integrate fine-tuned models into the hosting/inference stack.
Requirements
- Bachelor's/Master's/PhD in ML/NLP/CS/Data Science/Statistics (or a related field).
- 3 years of experience on AI platforms covering both model hosting/inference optimisation and fine-tuning pipelines; LLM experience strongly preferred.
- Strong engineering skills in Python and CUDA, with a solid understanding of GPU/CPU architecture and HPC fundamentals.
- Deep inference expertise: KV-cache, batching, quantisation (INT4/FP8/GPTQ/AWQ), operator optimisation and framework integration (vLLM, TensorRT-LLM, SGLang); hands-on hosting on Docker/Kubernetes and AWS/GCP/Azure.
- End-to-end fine-tuning expertise: data prep, distributed training, hyperparameter tuning, HF/Accelerate/LoRA/QLoRA; plus benchmarking/monitoring/troubleshooting, an AI-native mindset and effective use of coding assistants.
You'll achieve more when you join HSBC.
HSBC is an equal opportunity employer committed to building a culture where all employees are valued, respected and opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working and opportunities to grow within an inclusive and diverse environment. We encourage applications from all suitably qualified persons irrespective of, but not limited to, their gender or genetic information, sexual orientation, ethnicity, religion, social status, medical care leave requirements, political affiliation, people with disabilities, color, national origin, veteran status, etc. We consider all applications based on merit and suitability to the role.
Personal data held by the Bank relating to employment applications will be used in accordance with our Privacy Statement, which is available on our website.
***Issued By HSBC Software Development (GuangDong) Limited***
Required Experience:
Director
About Company
HSBC Holdings plc is a British multinational investment bank and financial services holding company. It was the 7th largest bank in the world by 2018, and the largest in Europe, with total assets of US$2.558 trillion.