Jeff Huang
e3556fed04
Optimize batch prefill kernel performance for VECTORIZED_LAYOUT KV cache (#3657)
- Add multi-dimensional page index support (YsGatherDims) in tile_scatter_gather
- Add is_gather_dim() and get_gather_index() for multi-dim page lookup
- Override MakeVDramTileDistribution() for VECTORIZED_LAYOUT to match
  GEMM's BWarpDstrEncoding (K decomposition: {K2, K0, K1})
- Add GetGemmKDecomposition() to retrieve kABKLane and kKPerThread
- Add static_assert for RowMajor VLayout requirement in batch prefill
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2026-01-29 07:18:41 +08:00
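
The two ideas named in the commit message, a three-way K decomposition and a page-indexed lookup into a paged KV cache, can be illustrated with a small host-side sketch. This is not the kernel code from the commit. The names `KCoord`, `decompose_k`, `compose_k`, and `physical_row` are hypothetical, and the ordering `k = (k2 * K0 + k0) * K1 + k1` is an assumption read off the `{K2, K0, K1}` notation, with `K0` standing in for kABKLane and `K1` for kKPerThread. The lookup uses a single page dimension for brevity, whereas the commit adds multi-dimensional page indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coordinate triple for a K index split as {K2, K0, K1}.
struct KCoord
{
    int k2; // outer iteration index (assumed slowest-varying)
    int k0; // lane index along K (assumed to correspond to kABKLane)
    int k1; // per-thread element index (assumed to correspond to kKPerThread)
};

// Split a linear K index, assuming k = (k2 * K0 + k0) * K1 + k1.
constexpr KCoord decompose_k(int k, int K0, int K1)
{
    return KCoord{k / (K0 * K1), (k / K1) % K0, k % K1};
}

// Inverse of decompose_k under the same assumed ordering.
constexpr int compose_k(KCoord c, int K0, int K1)
{
    return (c.k2 * K0 + c.k0) * K1 + c.k1;
}

// Hypothetical single-dimension paged lookup: translate a logical token
// index into a physical row of the KV cache through a page table.
inline std::size_t physical_row(const std::vector<int>& page_table,
                                int page_size,
                                int token)
{
    const std::size_t page   = static_cast<std::size_t>(page_table[token / page_size]);
    const std::size_t offset = static_cast<std::size_t>(token % page_size);
    return page * static_cast<std::size_t>(page_size) + offset;
}
```

With `K0 = 4` and `K1 = 8`, index 37 splits into `(k2, k0, k1) = (1, 0, 5)`, and `compose_k` maps it back to 37. With a page table `{7, 2, 5}` and 16 tokens per page, logical token 20 falls in logical page 1, which maps to physical page 2, giving row 36.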