Global Memory Coalescing on CDNA

The 64 lanes of one wave each emit an independent byte address; the memory unit collapses these requests into the minimum number of 64 B HBM cache lines. In the grid below, each column is one cache line and each row is one lane. A coloured cell means "lane L's request fell inside cache line C". The number of non-empty columns equals the number of HBM transactions.

Legend: useful byte (lane wanted this part of the line); fetched but unused (same line, different lane's remainder); empty.
Readout fields: lanes (64), per-lane width (B), useful bytes (B), unique cache lines, fetched bytes (B), efficiency.
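
The readout quantities can be reproduced with a small model. The following is a minimal sketch, not the page's actual code: the function name `coalescing_stats`, the strided addressing (lane i reads `width` bytes at `base + i * stride`), and the definition of efficiency as useful bytes divided by fetched bytes are all assumptions made for illustration. Useful bytes are counted as unique bytes requested, so overlapping lanes are not double-counted.

```python
CACHE_LINE_BYTES = 64  # cache line size stated in the text
WAVE_LANES = 64        # lanes per wave stated in the text


def coalescing_stats(base, stride, width,
                     lanes=WAVE_LANES, line=CACHE_LINE_BYTES):
    """Model one wave-wide load (hypothetical helper).

    Assumption: lane i requests `width` bytes starting at base + i * stride.
    """
    requested = set()  # every distinct byte address some lane asked for
    for lane in range(lanes):
        start = base + lane * stride
        requested.update(range(start, start + width))

    # A cache line is fetched if at least one requested byte falls inside it.
    touched_lines = {addr // line for addr in requested}

    useful = len(requested)              # unique bytes the lanes wanted
    fetched = len(touched_lines) * line  # whole lines are transferred
    return {
        "lanes": lanes,
        "width": width,
        "useful": useful,
        "unique_lines": len(touched_lines),
        "fetched": fetched,
        "efficiency": useful / fetched,  # assumed definition
    }


if __name__ == "__main__":
    # Contiguous 4 B loads, aligned: 4 lines, every fetched byte is used.
    print(coalescing_stats(base=0, stride=4, width=4))
    # Same loads shifted by 32 B: the request straddles a fifth line.
    print(coalescing_stats(base=32, stride=4, width=4))
    # One 4 B load per cache line: 64 lines fetched for 256 useful bytes.
    print(coalescing_stats(base=0, stride=64, width=4))
```

Under these assumptions, the aligned contiguous case touches 4 cache lines at efficiency 1.0, the misaligned case touches 5 lines at efficiency 0.8, and the 64 B stride case touches 64 lines at efficiency 0.0625.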