msaffari-amd
403d99124d
[AITERKER-112] Add PER_TOKEN_HEAD FP8 quant scheme to batch_prefill
- New BlockAttentionQuantScaleEnum::PER_TOKEN_HEAD enum value
- Pipeline overload in block_fmha_batch_prefill_pipeline_qr_ks_vs_async that
  applies the per-token Q/K descale as an outer product after GEMM0 and
  the per-head V descale in the epilogue
- fmha_batch_prefill_kernel kargs + MakeKargs + pipeline dispatch
- fmha_fwd.hpp host-side traits/args wiring
- quant.hpp trait specialization
- Codegen emits PER_TOKEN_HEAD kernel variants
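
The descale placement described above can be illustrated with a small host-side reference for a single head. This is only a sketch under stated assumptions, not the kernel code: plain floats stand in for FP8 values, the softmax scale and masking are omitted, and the names `gemm_nt`, `scores_post_descale`, and `attention_output` are hypothetical helpers that do not exist in the repository. It shows why the per-token Q/K scales can be applied after GEMM0 as the outer product `q_scale[i] * k_scale[j]`, and why a single per-head V scale can be deferred to the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// GEMM with B transposed: C[i][j] = sum_d A[i][d] * B[j][d].
Matrix gemm_nt(const Matrix& a, const Matrix& b)
{
    Matrix c(a.size(), std::vector<float>(b.size(), 0.0f));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            for (std::size_t d = 0; d < a[i].size(); ++d)
                c[i][j] += a[i][d] * b[j][d];
    return c;
}

// Hypothetical helper: run GEMM0 on the quantized values, then apply the
// per-token descale as an outer product, S[i][j] *= q_scale[i] * k_scale[j].
Matrix scores_post_descale(const Matrix& q_quant,
                           const Matrix& k_quant,
                           const std::vector<float>& q_scale,
                           const std::vector<float>& k_scale)
{
    assert(q_quant.size() == q_scale.size());
    assert(k_quant.size() == k_scale.size());
    Matrix s = gemm_nt(q_quant, k_quant);
    for (std::size_t i = 0; i < s.size(); ++i)
        for (std::size_t j = 0; j < s[i].size(); ++j)
            s[i][j] *= q_scale[i] * k_scale[j];
    return s;
}

// Hypothetical helper: row-wise softmax of S, then P * V on the quantized V,
// with the single per-head V descale applied once at the end ("epilogue").
Matrix attention_output(const Matrix& s, const Matrix& v_quant, float v_scale)
{
    assert(!s.empty() && s[0].size() == v_quant.size());
    Matrix o(s.size(), std::vector<float>(v_quant[0].size(), 0.0f));
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        float row_max = s[i][0];
        for (float x : s[i])
            row_max = std::max(row_max, x);

        float row_sum = 0.0f;
        for (std::size_t j = 0; j < s[i].size(); ++j)
        {
            const float p = std::exp(s[i][j] - row_max);
            row_sum += p;
            for (std::size_t d = 0; d < o[i].size(); ++d)
                o[i][d] += p * v_quant[j][d];
        }
        // Softmax normalization and V descale folded into one multiply.
        for (float& x : o[i])
            x *= v_scale / row_sum;
    }
    return o;
}
```

Because each scale is constant along the dimension being reduced (the head dimension for Q/K, the key dimension for V), pulling it out of the sum leaves the result unchanged up to rounding. Calling `attention_output(scores_post_descale(q, k, sq, sk), v, sv)` should therefore match the result of dequantizing Q, K, and V first.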
2026-05-19 15:41:32 +00:00