Research Interests
Research on AI systems capable of efficient training and inference is increasingly important, as recent AI models demand ever more computing resources and memory. The theme of our research is to design high-performance AI systems that perform efficient training and inference on a wide range of targets, from servers to embedded systems, and in particular on heterogeneous systems with CPUs, GPUs, and NPUs. The design methodology is grounded in a comprehensive understanding of Transformer algorithms and of various hardware system architectures, including GPUs.
Currently, the lab is focusing on the following research topics.
On-device LLM Inference
- Efficient LLM inference on embedded GPUs or SoCs with a HW/SW co-design approach
- Optimization techniques for model compression, quantization, and activation sparsity (see the sketch after this list)
- Design Space Exploration (DSE) for optimal LLM inference given an LLM and a HW system
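As a concrete illustration of the quantization and activation-sparsity directions above, here is a minimal sketch (not the lab's actual toolchain) of symmetric per-tensor int8 weight quantization and of measuring activation sparsity after ReLU, using standard PyTorch:

```python
# Illustrative sketch (hypothetical, not the lab's toolchain):
# symmetric per-tensor int8 weight quantization and activation-sparsity
# measurement for a single linear layer.
import torch
import torch.nn as nn

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: returns (int8 weights, scale)."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

layer = nn.Linear(4096, 4096)
q, scale = quantize_int8(layer.weight.data)
layer.weight.data = dequantize(q, scale)  # "fake-quantized" weights

x = torch.randn(8, 4096)
act = torch.relu(layer(x))
sparsity = (act == 0).float().mean().item()  # fraction of zero activations
print(f"activation sparsity after ReLU: {sparsity:.2%}")
```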
Neural Architecture Search (NAS)
- Automatic search for DNN structures that satisfy given constraints (see the sketch after this list)
- NAS for enhancing activation sparsity while maintaining accuracy
- Deep learning inference on tiny systems such as microcontrollers
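A toy sketch of the search loop behind constraint-aware NAS follows; the search space, parameter budget, and scoring function are hypothetical placeholders for illustration:

```python
# Toy random-search NAS loop (illustrative only): sample architectures,
# discard those violating a parameter budget, keep the best-scoring one.
import random

SEARCH_SPACE = {"depth": [2, 4, 6], "width": [128, 256, 512]}
PARAM_BUDGET = 2_000_000  # hypothetical constraint for a tiny device

def num_params(arch):
    # Rough parameter count of a plain MLP stack: depth x width^2.
    return arch["depth"] * arch["width"] ** 2

def score(arch):
    # Stand-in for validation accuracy (a real NAS would train/evaluate).
    return random.random()

best, best_score = None, -1.0
for _ in range(100):
    arch = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
    if num_params(arch) > PARAM_BUDGET:
        continue  # violates the deployment constraint
    s = score(arch)
    if s > best_score:
        best, best_score = arch, s
print("best architecture under budget:", best)
```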
AI Accelerators
- HLS-based AI accelerator design and prototyping
- Performance estimation of AI accelerators and Design Space Exploration (DSE) based on it (see the sketch after this list)
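The accelerator DSE above can be illustrated with a back-of-the-envelope roofline-style model; the workload, cost model, and design points below are all hypothetical:

```python
# Illustrative analytical DSE: estimate latency of a GEMM-like workload
# on candidate accelerator configurations and pick the fastest that fits
# an area budget. All numbers are hypothetical.
from itertools import product

FLOPS_NEEDED = 2 * 4096 * 4096 * 4096  # one large matmul
BYTES_MOVED = 3 * 4096 * 4096 * 2      # fp16 operands + result

def latency(pe_count, bw_gbs, clock_ghz=1.0):
    compute_s = FLOPS_NEEDED / (pe_count * 2 * clock_ghz * 1e9)  # 2 FLOPs/cycle/PE
    memory_s = BYTES_MOVED / (bw_gbs * 1e9)
    return max(compute_s, memory_s)    # roofline: bound by the slower side

def area(pe_count, bw_gbs):
    return pe_count * 0.01 + bw_gbs * 0.05  # hypothetical mm^2 cost model

best = min(
    (cfg for cfg in product([256, 1024, 4096], [100, 400, 1600])
     if area(*cfg) <= 120.0),          # area budget in mm^2
    key=lambda cfg: latency(*cfg),
)
print("best (PEs, GB/s):", best, f"-> {latency(*best) * 1e3:.2f} ms")
```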
3D Parallel Deep Learning for LLMs
- Design of efficient multi-GPU systems incorporating Pipeline Parallelism (PP), Tensor Parallelism (TP), and Data Parallelism (DP)
- Fast DSE over the huge design space of heterogeneous GPU systems (see the sketch below)
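A minimal sketch of the kind of search this involves: enumerate (PP, TP, DP) degrees for a fixed GPU count and rank them with a simple analytical model. The penalty weights here are hypothetical stand-ins for measured or modeled costs:

```python
# Illustrative search over 3D-parallel configurations for N GPUs:
# enumerate (pp, tp, dp) factorizations and rank them with a toy cost
# model (per-GPU compute + communication penalties; weights hypothetical).
N_GPUS = 16

def cost(pp, tp, dp):
    compute = 1.0 / (pp * tp * dp)  # work divided across all GPUs
    bubble = (pp - 1) * 0.02        # pipeline-bubble penalty grows with stages
    tp_comm = (tp - 1) * 0.05       # all-reduce cost inside each layer
    dp_comm = (dp - 1) * 0.01       # gradient all-reduce per step
    return compute + bubble + tp_comm + dp_comm

configs = [
    (pp, tp, dp)
    for pp in (1, 2, 4, 8, 16)
    for tp in (1, 2, 4, 8, 16)
    for dp in (1, 2, 4, 8, 16)
    if pp * tp * dp == N_GPUS
]
for pp, tp, dp in sorted(configs, key=lambda c: cost(*c))[:3]:
    print(f"PP={pp} TP={tp} DP={dp} -> cost {cost(pp, tp, dp):.3f}")
```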