Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization

Abstract

Parallel structures have become a key pattern in deep neural networks (DNNs), offering improved efficiency and scalability. However, existing machine learning compilers (MLCs) struggle to optimize these structures due to a limited parallel fusion scope and insufficient analysis of intra-operator characteristics. This article introduces Magneto, a framework designed to accelerate DNN inference by co-optimizing parallel operators. Magneto broadens the fusion scope and applies a specialized co-tuning algorithm to optimize operators jointly. Our approach addresses the challenges unique to optimizing parallel structures, enabling significant performance improvements on both NVIDIA and AMD GPUs. Experimental results show that Magneto outperforms the state-of-the-art compilers NVIDIA TensorRT and AMD MIGraphX, achieving geometric mean speedups of 2.27× and 2.88×, respectively.
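To make the core idea concrete, below is a minimal sketch of one common form of parallel fusion: two independent operators that share an input are merged into a single wider operator, so one kernel launch replaces two. This is an illustration using NumPy matmuls as stand-ins for DNN operators, not Magneto's actual fusion or co-tuning algorithm; all names here are hypothetical.

```python
# Sketch of horizontal (parallel) fusion, NOT Magneto's implementation:
# two branches consuming the same input are merged into one matmul.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 128))    # shared input: batch x features
w_a = rng.standard_normal((128, 64))  # branch A weights (hypothetical)
w_b = rng.standard_normal((128, 64))  # branch B weights (hypothetical)

# Unfused: two separate operators, i.e., two kernel launches on a GPU.
a = x @ w_a
b = x @ w_b

# Fused: concatenate weights along the output dimension, run one larger
# matmul, then split the result back into the per-branch outputs.
w_fused = np.concatenate([w_a, w_b], axis=1)  # shape (128, 128)
ab = x @ w_fused
a_fused, b_fused = ab[:, :64], ab[:, 64:]

# The fused form computes exactly the same results as the two branches.
assert np.allclose(a, a_fused) and np.allclose(b, b_fused)
```

The sketch also hints at why co-tuning matters: once branches are fused, the merged operator has a different shape than either original branch, so tile sizes and other schedule parameters tuned per branch are no longer optimal and must be chosen jointly.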

Publication
ACM Transactions on Architecture and Code Optimization