Amir Ghamarian
ea157f6244
Route all prefill to 4-warp kBlockM=128 kernel
An exhaustive sweep over 363 production trace shapes shows that the
4-warp serial pipeline outperforms the 8-warp interleaved pipeline on
all 71 prefill shapes among them (0 exceptions).
The 4-warp kernel has better CU occupancy and the serial pipeline's
async prefetch is sufficient for these workloads.
Dispatch now: tiny (decode) -> small (short decode) -> medium (all prefill).
The 8-warp large tier is no longer used for d64 GQA-8.
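
The resulting dispatch order can be sketched roughly as below. This is an
illustration only, not the actual dispatcher: it assumes tiers are chosen
from the query length alone, and the names select_tier, TINY_MAX_SEQLEN_Q
and SMALL_MAX_SEQLEN_Q, along with their cutoff values, are hypothetical
and do not come from this change.

```python
from enum import Enum


class Tier(Enum):
    TINY = "tiny"      # decode
    SMALL = "small"    # short decode
    MEDIUM = "medium"  # all prefill: 4-warp, kBlockM=128, serial pipeline
    LARGE = "large"    # 8-warp interleaved; no longer selected for d64 GQA-8


# Hypothetical cutoffs, for illustration only. The real thresholds are
# not stated in this commit message.
TINY_MAX_SEQLEN_Q = 1
SMALL_MAX_SEQLEN_Q = 16


def select_tier(seqlen_q: int) -> Tier:
    """Pick a kernel tier for d64 GQA-8 from the query length (assumed key)."""
    if seqlen_q <= TINY_MAX_SEQLEN_Q:
        return Tier.TINY
    if seqlen_q <= SMALL_MAX_SEQLEN_Q:
        return Tier.SMALL
    # Everything longer is treated as prefill and routed to the 4-warp
    # kernel; Tier.LARGE is intentionally never returned.
    return Tier.MEDIUM
```

With these placeholder cutoffs, no query length maps to the large tier,
which mirrors the statement above that the 8-warp tier is no longer used
for this configuration.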
Made-with: Cursor
2026-03-28 13:52:42 +00:00