A Decoupled Analytical Model for Tile Size Selection in Affine Programs

Shihan Yuan, Zuoyan Zhang, Guanghui Song, Junhui Peng, Feng Wang, Zhuo Tang, Kenli Li, Jie Zhao

April 2026

Abstract

Existing tile size selection approaches are tightly coupled with compiler transformation pipelines, often leading to inaccurate modeling of cache behavior and limited effectiveness for non-rectangular tile shapes. This paper presents TileMind, a decoupled analytical model that combines compile-time and runtime information for tile size selection in affine programs. It introduces a transformation-aware pre-tiling step that enables the decoupled selector to remain consistent with compiler transformations while extracting compile-time metadata. The extracted metadata is then combined with profiled runtime characteristics to construct a richer yet tractable feasible domain, within which a nonlinear objective for tile size selection is formulated. This objective is subsequently transformed into a binary product linearization problem, with its nonlinear constraints also linearized for efficient optimization. Finally, an intra-tile optimization aligns computation with data layout to enhance data reuse within tiles. Across two multi-core Intel CPUs, TileMind achieves 1.49x (sequential) and 1.33x (parallel) mean speedups on twenty PolyBench kernels, and 2.08–3.54x speedups on three deep learning workloads over the state-of-the-art analytical model Pluto-tss. Compared with TVM’s latest autotuner MetaSchedule, TileMind delivers 1.35–1.46x mean speedups while reducing tuning overhead by 2–4 orders of magnitude. While demonstrating effectiveness on selecting tile sizes for non-rectangular tile shapes and compatibility with PPCG, Pluto, and TVM, we further provide proof-of-concept results on GPUs, illustrating the potential portability of TileMind across architectures.

Type

Journal article

Publication

ACM Transactions on Architecture and Code Optimization