Huawei is a leading global information and communications technology (ICT) solutions provider. Through our constant dedication to customer-centric innovation and strong partnerships, we have established leading end-to-end capabilities and strengths across the carrier networks, enterprise, consumer, and cloud computing fields. Our products and solutions have been deployed in over 170 countries, serving more than one third of the world's population.
For the Computing Systems Laboratory, we are hiring three Postdoctoral Researchers within the scope of an optimization-related high-performance computing platform. Over the coming 1-2 years, we aim exclusively to solve foundational scientific problems related to high-performance distributed- and shared-memory parallel optimization, with the goal of producing publications at top scientific venues. In the longer term, the platform aims to solve industrial optimization problems, either on-premises or as-a-Service.
This research is conducted jointly with a team of leading scientists at Huawei's Theory Lab in Hong Kong, with whom successful candidates are expected to work closely. Researchers in Zurich are expected to work on basic operators that provide functionality at a level similar to BLAS, SparseBLAS, GraphBLAS, LAPACK, LAGraph, and so on, focusing on foundational operators for optimization workloads.
The successful candidate will work on:
- Identifying both existing and novel basic operations relevant to optimization platforms;
- Speed-of-light analyses of both existing and newly identified basic operations that can a) identify fundamental performance bottlenecks, b) accurately predict scalability properties (e.g., iso-efficiency), c) predict trade-off effects (e.g., memory vs. communication), and d) predict which combination of devices (classic CPUs or specialized accelerators, and how many) would lead to the most efficient solves;
- The design and prototyping of highly scalable, highly efficient, and highly productive software systems that lie at the foundation of our next-generation optimization platform.
Responsibilities
Accordingly, successful candidates will:
- Design and implement novel basic operators required for our optimization platform;
- Analyze specific algorithms for basic operators and establish fundamental limits in models of parallel computation that account not only for classic work (flops) and compute power (flop/s), but also for data reuse, memory throughput, and access latencies;
- Follow, as appropriate, cache-aware or cache-oblivious paradigms, as well as standard HPC paradigms for shared- and distributed-memory parallelization, vectorization, and so on;
- Research novel data structures to speed up basic operator execution on traditional CPUs with vector and matrix SIMD, as well as on less traditional xPUs such as AI accelerators;
- Ensure solvers may be easily expressed as data-centric C control flow around calls to basic operators that automatically dispatch the solver over potentially multiple xPUs;
- Use, and if necessary extend, run-time systems and communication layers to achieve higher basic operator efficiency, better scalability, and automated computational trade-offs;
- Ensure the quality and performance of all solvers implemented on top of our basic operators, enabling the solution of next-generation scientific and industrial problems.
Requirements
Successful candidates will have in-depth experience with several of the following:
- Optimization of irregular algorithms, such as graph computations or sparse numerical linear algebra, spanning from high-level data structures and algorithms down to low-level code optimizations such as SIMD and coarse- and fine-grained locking mechanisms;
- Multi-core and many-core programming (e.g., POSIX Threads or OpenMP);
- Distributed-memory programming (e.g., MPI, BSP, or LPF), using both collective communications and raw RDMA;
- Code generation for high-performance computations and/or in-depth knowledge of the underlying methodologies (e.g., ALP, BLIS, DaCe, Spiral, FLAME, Firedrake, et cetera).
Successful candidates master the following common aspects:
- Generic programming in C++11 (or later), with strong knowledge of standard algorithms and data structures as found in the STL and beyond;
- Performance analysis and parallel debugging (e.g., Valgrind, the GNU Debugger, CI testing);
- Excellent written and verbal communication skills, with a proven ability to present complex technical information clearly and concisely to a variety of audiences;
- A track record of publications at top HPC or applied mathematics conferences or journals;
- A collaborative work style, with the ability to work in a multicultural environment.
The following additional experience and in-depth knowledge would be considered a plus:
- GraphBLAS or Algebraic Programming (ALP);
- Any aspect of optimization or its key solvers;
- State-of-the-art fabrics and their programming (e.g., InfiniBand and ibverbs);
- Publications at top venues in physical sciences or theoretical computer science; and
- SIMT or accelerator programming (e.g., CUDA, OpenCL), in particular with Huawei Ascend.