mirror of
https://github.com/NVIDIA/cutlass.git
synced 2026-05-11 08:50:09 +00:00
With the usual register allocation (producer 40, consumer 232) compiling Gemm with tile shape 256 x 208 (cooperative) or 128 x 208 (pingpong) show lots of register spilling (e.g. ~3000 bytes spill). For this case we can change the register allocation to producer 24, consumer 240, which avoids spills.