Magneto: Accelerating Parallel Structures in DNNs via Co-Optimization of Operators (poster)

Abstract

Deep neural networks (DNNs) increasingly rely on parallel structures to enhance performance and efficiency. However, existing machine learning compilers struggle to optimize these structures due to limited scopes for parallel operator fusion and insufficient use of intra-operator information. This paper introduces Magneto, a framework that accelerates parallel structures in DNNs by co-optimizing parallel operators. By expanding the scope of parallel operator fusion and introducing a dedicated co-tuning algorithm, Magneto unlocks new opportunities for co-optimization. Experimental results demonstrate that Magneto outperforms NVIDIA TensorRT and AMD MIGraphX, achieving speedups of 3.02× and 4.19×, respectively.
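
The core idea, fusing independent parallel operators so they can be tuned as one unit, can be sketched in a few lines. Below is a minimal illustrative NumPy example (not Magneto's actual implementation): two branches that share an input are merged into a single batched matmul, the kind of combined kernel that a co-tuning pass could then optimize as a whole.

```python
# Illustrative sketch of parallel operator fusion (assumed example,
# not Magneto's implementation): two independent branches sharing an
# input are combined into one batched kernel, so work that separate
# launches would serialize can be co-scheduled and co-tuned.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 256)).astype(np.float32)   # shared input
w1 = rng.standard_normal((256, 512)).astype(np.float32)  # branch 1 weights
w2 = rng.standard_normal((256, 512)).astype(np.float32)  # branch 2 weights

# Unfused: each parallel branch runs as its own operator.
y1 = x @ w1
y2 = x @ w2

# Fused: stack the weights and issue one batched matmul covering both
# branches; a compiler in Magneto's vein would additionally co-tune the
# merged kernel (tiling, scheduling) across the fused operators.
w = np.stack([w1, w2])              # shape (2, 256, 512)
y = np.einsum('mk,bkn->bmn', x, w)  # shape (2, 128, 512)

# The fused result matches the unfused branches.
assert np.allclose(y[0], y1, atol=1e-3)
assert np.allclose(y[1], y2, atol=1e-3)
```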

Publication
In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP '25)