composable_kernel/include/ck_tile/ops at 01d37b171d3734edd45c94ab06bfdce21247f965 - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 18:17:44 +00:00

assistant-librarian[bot] 9c0d4114ae [CK] Add FP8 KV_BLOCKSCALE support for batch prefill (#4263)

Implement per-page K/V quantization for paged attention:
- Add KV_BLOCKSCALE enum to BlockAttentionQuantScaleEnum
- Use exp2 shift trick to eliminate explicit P scaling overhead
- Prefetch physical page offsets for the KV cache so that the loads
overlap with computation
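
The commit message does not spell out the exp2 shift trick. One plausible reading, sketched here as an assumption rather than as the actual kernel code, is that a constant scale on the softmax numerator P is folded into the exponent using the identity c * 2^x = 2^(x + log2(c)), so no per-element multiply is needed. The function names below are hypothetical and do not come from composable_kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; not part of composable_kernel.

// Reference version: compute 2^(s - row_max), then multiply each element by p_scale.
std::vector<float> softmax_numerator_explicit(const std::vector<float>& s,
                                              float row_max,
                                              float p_scale)
{
    assert(p_scale > 0.0f);
    std::vector<float> p(s.size());
    for(std::size_t i = 0; i < s.size(); ++i)
        p[i] = p_scale * std::exp2(s[i] - row_max);
    return p;
}

// Shifted version: fold the scale into the exponent, so the per-element
// multiply disappears. log2(p_scale) is computed once per row, not per element.
std::vector<float> softmax_numerator_shifted(const std::vector<float>& s,
                                             float row_max,
                                             float p_scale)
{
    assert(p_scale > 0.0f);
    const float bias = row_max - std::log2(p_scale);
    std::vector<float> p(s.size());
    for(std::size_t i = 0; i < s.size(); ++i)
        p[i] = std::exp2(s[i] - bias);
    return p;
}
```

Both functions return the same values up to floating-point rounding. In the shifted form the only per-element work is the subtraction and the `exp2` call that an unscaled softmax would perform anyway.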

---
🔁 Imported from
[ROCm/composable_kernel#3696](https://github.com/ROCm/composable_kernel/pull/3696)
🧑‍💻 Originally authored by @Jeff-Huang

---------

Co-authored-by: Jeff Huang <chiachi.huang@amd.com>
Co-authored-by: Illia Silin <Illia.Silin@amd.com>

2026-02-04 18:25:31 -05:00

..

add_rmsnorm2d_rdquant

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

batched_contraction

[CK Tile] batched contraction kernel generalizing (#3126)

2025-12-02 13:30:27 +01:00

batched_transpose

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

Mx fp6 flatmm (#3601)

2026-02-02 16:04:40 +08:00

[CK_Tile] Support for a4w4 (fp4) in block scale gemm AB quant (#3603)

2026-01-30 04:40:50 -07:00

fix mxfp8-gemm example failure (#3531)

2026-01-13 10:26:45 +08:00

Mx fp6 flatmm (#3601)

2026-02-02 16:04:40 +08:00

[CK] Add FP8 KV_BLOCKSCALE support for batch prefill (#4263)

2026-02-04 18:25:31 -05:00

Shuffle fix for gfx950 (#3491)

2026-01-13 09:21:29 -08:00

[Compiler] Addressing new compiler warnings (#3640)

2026-02-02 09:39:48 -08:00

feat: add split_k support for block scale gemm bquant mode. (#3653)

2026-02-02 14:41:53 -08:00

grouped_convolution

[CK_BUILDER] Add grouped conv fwd ck tile profiler (#3518)

2026-01-19 22:29:01 -07:00

image_to_column

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

Shuffle fix for gfx950 (#3491)

2026-01-13 09:21:29 -08:00

[CK Tile] multi reduce improvements (#3607)

2026-01-27 12:56:09 -08:00

Fix redundant cast in model sensitive rmsnorm (#3681)

2026-01-30 10:52:19 +08:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

Solve the CTAD regression & add up the Shell file for the docker management in testing (#3634)

2026-01-26 10:29:28 -08:00

[CK_TILE][FMHA] Add sparse attention VSA (#3341)

2026-01-31 00:59:47 +08:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

Shuffle fix for gfx950 (#3491)

2026-01-13 09:21:29 -08:00

add_rmsnorm2d_rdquant.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

batched_contraction.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

batched_transpose.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

common.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

elementwise.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

epilogue.hpp

[CK_TILE] Epilogue chaining (Lwpck 3373) (#2773)

2025-12-18 10:02:02 +01:00

flatmm.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

fmha.hpp

[FMHA] Batch Prefill Support Improvements: Change KV Cache Layout & Large Page Size Support (#3442)

2026-01-05 18:41:47 +08:00

fused_moe.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

gemm_quant.hpp

[CK_TILE] add preshuffleB mode for ABQuant GEMM (#3495)

2026-01-06 12:35:01 -08:00

gemm.hpp

Joye/revise wp pipeline (#3493)

2026-01-05 13:49:26 -08:00

grouped_convolution.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

image_to_column.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

layernorm2d.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

moe_flatmm.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

norm_reduce.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

permute.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

pooling.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

reduce.hpp

Dlejeune/ck tile 2d multiple reductions (#3147)

2026-01-09 11:16:37 +01:00

rmsnorm2d.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

smoothquant.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

softmax.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

sparse_attn.hpp

[CK_TILE][FMHA] Add sparse attention VSA (#3341)

2026-01-31 00:59:47 +08:00

topk_softmax.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00

topk.hpp

feat(precommit-hooks): add check for correct copyright header (#3302)

2025-12-10 22:50:43 -08:00