From 23139309e994edd67096b931df88119caa1e86b0 Mon Sep 17 00:00:00 2001
From: Lifu Huang
Date: Sun, 10 Aug 2025 19:40:56 -0700
Subject: [PATCH] Fix incorrect K dim in CuTe MMA Atom doc. (#2544)

---
 media/docs/cpp/cute/0t_mma_atom.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/media/docs/cpp/cute/0t_mma_atom.md b/media/docs/cpp/cute/0t_mma_atom.md
index ab57c92e3..4bb1c84b3 100644
--- a/media/docs/cpp/cute/0t_mma_atom.md
+++ b/media/docs/cpp/cute/0t_mma_atom.md
@@ -426,7 +426,7 @@ where we see 16 copies of the 64x8 tile.
 
 GMMA atoms that consume A and B sources directly from shared memory are a bit interesting. The GMMA Descriptor is constructed on an entire tile of A and/or B data in shared memory rather than being partitioned by threads. That is, every thread sees the entire tile of data and the tile is not reordered so that the descriptor can be constructed on it. In `ALayout` form, this can be expressed
 ```cpp
-// (T128,V64x8) -> (M64,K16)
+// (T128,V64x16) -> (M64,K16)
 using ALayout = Layout<Shape <_128,Shape <_64,_16>>,
                        Stride< _0, Stride< _1,_64>>>;
 ```
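
Review note: the corrected comment can be checked by hand. The sketch below is plain C++, not the CuTe API. It assumes the layout has shape `(128,(64,16))` and stride `(0,(1,64))`, as the comment and the stride line in the hunk suggest. The names `a_layout`, `values_per_thread`, and `covers_tile_once` are hypothetical helpers introduced only for illustration.

```cpp
#include <array>
#include <cassert>

// Assumed extents of the layout in the hunk: 128 threads, a 64x16 (MxK) tile.
constexpr int kThreads = 128;
constexpr int kM = 64;
constexpr int kK = 16;

// Hypothetical stand-in for ALayout: maps (thread, m, k) to a linear offset
// in the column-major MxK tile. The thread stride is 0, so the thread index
// never changes the result.
constexpr int a_layout(int t, int m, int k) {
  return t * 0 + m * 1 + k * 64;
}

// Size of the value mode V: how many elements each thread sees.
constexpr int values_per_thread() { return kM * kK; }

// True when the values seen by thread t hit every tile offset exactly once.
inline bool covers_tile_once(int t) {
  std::array<int, kM * kK> hits{};  // zero-initialized hit counters
  for (int k = 0; k < kK; ++k)
    for (int m = 0; m < kM; ++m)
      ++hits[a_layout(t, m, k)];
  for (int h : hits)
    if (h != 1) return false;
  return true;
}

// The value mode is 64x16 = 1024 elements, not 64x8 = 512.
static_assert(values_per_thread() == 1024, "V mode should be 64x16");
// Every thread maps (m,k) = (0,0) to offset 0.
static_assert(a_layout(0, 0, 0) == a_layout(kThreads - 1, 0, 0), "thread stride is 0");
```

Under these assumptions each thread's value mode covers all 64x16 = 1024 offsets of the tile, which is consistent with the `V64x16` in the `+` line. A `V64x8` value mode would only account for half of the `(M64,K16)` codomain.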