Tianxing Wu
73aed1b57c
remove if statements
2025-12-11 09:21:55 +00:00
Juuso Korhonen
345758971e
fix
2025-12-03 13:36:48 +00:00
Juuso Korhonen
7078de91d8
adding PAGE_BLOCK_SIZE >= BLOCK_SIZE optionality, now it regresses perf when it should improve?
2025-12-03 13:08:29 +00:00
Tianxing Wu
07a0dcd688
Merge branch 'develop' into tianxing/unified-attention
2025-12-02 10:58:31 +00:00
Yi DING
f211156ce6
[CK_Tile] Flatmm MX Cleanup & Explicit Offset Calculation (#3286)
2025-12-02 14:21:12 +08:00
Cong Ma
23fb253c4e
Make CK TILE GEMM Aquant support block tile 128x128x128 (#3325)
...
* [CK TILE GEMM Quant] Rename GemmConfigBQuantPrefill to GemmConfigQuantPrefill in examples
* [CK TILE GEMM Quant] update tile distribution of aquant
* [CK TILE GEMM Quant] update aquant register offset calculation
* [CK TILE GEMM Quant] Reimplement aquant register offset calculation
* [CK TILE GEMM Quant] Add more unit tests of Aquant
- Test M128xN128xK128
* [CK TILE GEMM Quant] Add more comments to Gemm Aquant
2025-12-01 15:04:37 -08:00
Tianxing Wu
6ecbd7c831
Merge branch 'develop' into tianxing/unified-attention
2025-12-01 11:03:33 +00:00
Aviral Goel
004784ef98
chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
...
* chore(copyright) update library wide CMakeLists.txt files copyright header template
* Fix build
---------
Co-authored-by: Sami Remes <samremes@amd.com>
2025-11-28 13:49:54 -08:00
Sami Remes
f981554c39
[CK_TILE] Fix Quant GEMM build (#3320)
...
* Fix build
* Fix ck_tile example 38 & 40
---------
Co-authored-by: Yi DING <yi.ding@amd.com>
2025-11-28 20:33:53 +08:00
Tianxing Wu
8d25f267ad
Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention
2025-11-28 12:04:31 +00:00
Cong Ma
30727c48fc
Tile engine for streamk (#3157)
...
* [CK TILE STREAMK] Introduce initial support for tile engine in streamk GEMM.
- This commit lays the groundwork for integrating the tile engine into streamk GEMM.
It focuses on creating benchmark executables for streamk GEMM.
- Additional scripts like test_benchmark.sh and gemm_benchmark.py will be added once
the streamk implementation reaches stability.
* [CK TILE STREAMK] Enable CI to execute tile engine benchmarks for StreamK GEMM
* [CK TILE STREAMK] Refactor: Extract common utility functions.
* [CK TILE STREAMK] Revise tile engine of streamk to align with the updated implementation
* Add pre-commit
* [CK TILE STREAMK] Add 'dp_persistent' and 'reduction_strategy' in output of CK TILE STREAMK
* [CK TILE STREAMK] Fix a bug about value of 'dp_persistent' of CK TILE STREAMK
* [CK TILE STREAMK] Update Jenkinsfile
* [CK TILE Engine] Update StreamK tile engine help message
Remove default value messages as they are automatically printed
* [CK TILE Engine] Update StreamK tile engine
- Remove namespace reboot
* [CK TILE Engine] Update StreamK tile engine
- Fix merge error
2025-11-27 15:49:57 -07:00
arai713
24d88d2472
[CK_TILE] Move DataTypeTraits into a Common File (#3146)
...
This renames the typeToStr struct in the common utilities to DataTypeTraits and removes all duplication of DataTypeTraits across files in CK Tile.
Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
2025-11-27 09:09:54 -08:00
Tianxing Wu
60ca9484b4
refined benchmarking
2025-11-27 15:07:03 +00:00
Tianxing Wu
eeb419845d
fmha v3 flops calculation
2025-11-27 10:32:28 +00:00
Tianxing Wu
c641d0d42c
non zero calculation fix
2025-11-27 09:24:52 +00:00
Tianxing Wu
6a2ac8f758
causal mask fix
2025-11-27 09:16:30 +00:00
Max Podkorytov
79aae7c7f7
[CK Tile] enable building examples by default (#3259)
...
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch
* fix cpp17 compile error in the ck-tile examples
---------
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
2025-11-26 16:24:44 -08:00
Aviral Goel
35a4b26af0
fix: add dynamic selection of pipelines for aquant mode (#3282)
...
- Add conditional selection to use v3 pipeline when PreshuffleQuant is true
- Add static assertion in memory pipeline to prevent PreshuffleQuant usage
- Restore BaseBQuantGemmPipelineAgBgCrCompV3 for BQuant cases
- Update BaseGemmPipeline selection to handle all quant modes properly
2025-11-26 10:58:09 +04:00
Aviral Goel
cd47293869
chore(copyright): update copyright header for experimental & example directory (#3292)
2025-11-26 03:09:39 +04:00
Bartłomiej Kocot
00dfa2f2ce
[CK TILE] Grouped Conv Explicit Gemm (#3289)
...
* [CK TILE] Grouped Conv Explicit Gemm
* fixes
* apply builder fixes
2025-11-25 23:28:35 +01:00
Khushbu Agarwal
37ea160088
[CK-Tile] fix block scale example for gfx1201 (#3283)
2025-11-25 13:10:28 -08:00
Bartłomiej Kocot
9ac2666d5b
[CK_BUILDER] Add grouped conv bwd ck tile traits (#3281)
...
* [CK_BUILDER] Add grouped conv bwd ck tile traits
* copilot fixes
2025-11-25 14:57:43 +01:00
Tianxing Wu
cc7caf4d7d
correct results
2025-11-25 09:27:40 +00:00
Aviral Goel
d85f065b15
chore(copyright): update copyright header for example directory (#3273)
...
* chore(copyright): update copyright header for codegen directory
* chore(copyright): update copyright header for example directory
2025-11-24 18:02:41 -08:00
rocking
229d43ea0c
Fix batch prefill compile fail in aiter (#3279)
...
* Fix batch prefill aiter compile fail
* Fix compile error
2025-11-25 09:46:32 +08:00
Thomas Ning
de6a9590ab
Reorganize of KPack in GEMM (#3247)
...
* add the reorganize of KPack
* fix the compilation error
* fix the compilation error
2025-11-24 12:38:59 -08:00
Khushbu Agarwal
8111572785
[CK_Tile] Support for preshuffle weight(B) quant tensor for block scale gemm (#3165)
...
* formatted
* formatted
* formatting
* formatting
* formatting
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Split cpp file to reduce building time
- Support multiple GemmConfig
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Update Readme
* enable prefill shapes
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Add support for rowcol and tensor GEMM operations
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Update README
* adding preshuffle quant as new parameter and its associated new files
* remove debugging statements
* adding test
* enable preshuffle quant with permuteN
* updating readme and corresponding gemmconfigs
* updating cmake file
* fixing CI failures for grouped quant gemm
* addressing review comments
* fixing CI issue
* addressing review comments
* formatting
* formatting
* fixing aquant operator overloading
* formatting
---------
Co-authored-by: Cong Ma <congma13@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-11-24 07:48:42 -08:00
Tianxing Wu
b3c5cd0c76
Fixed the block_table
2025-11-24 15:32:33 +00:00
Juuso Korhonen
f2fbc44b7b
fix
2025-11-24 10:20:04 +00:00
rocking
5948dbffe4
Support fp8 dynamic quantization for fmha (#3206)
...
* Support qscale for dynamic quant, remove static quant
* Support hdim=256
* Remove bias test case for fp8
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
2025-11-24 16:28:25 +08:00
Johannes Graner
096f0a3b23
[CK Tile] Fix example for conv fwd + bias + clamp (#3235)
...
* Fix clamp not being applied correctly
* Apply group offsets to D tensors
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-11-24 07:36:26 +01:00
Emily Martins
2e4b8a8fc4
[CK_TILE] Remove Old CK Tile Stream-K Artifacts (#3202)
...
* Remove old CK Tile Stream-K implementation
The original CK Stream-K implementation was based on old CK's Stream-K
block to C tile map. However, this implementation did not align with the
original Stream-K paper. Thus, we implemented a new tile partitioner and
associated Stream-K kernel, which was placed in the reboot namespace.
Now that the new Stream-K implementation is ready, this change removes
all artifacts of the old implementation. Specifically, the following
changes were made:
- Removes old Stream-K tile partitioner from CK Tile
- Removes the reboot namespace such that the new implementation resides
in the ck_tile namespace only.
- Adds tests for bf8 and fp8 using the new implementation
- Removes tests for the old implementation
- Remove the v2 suffix from the new CK Tile Tile Partitioner
derived classes.
- Updates Stream-K Kernel ops file to use /** commenting style.
* Remove v2 from tile partitioner validation function names
2025-11-20 09:32:32 -07:00
asleepzzz
5adaa201ed
Revert "Add attn sink (#2892)" (#3250)
...
This reverts commit 9fa4e8d5ab.
2025-11-20 07:55:15 -08:00
Tianxing Wu
f552cd7841
ref data copying
2025-11-20 11:34:39 +00:00
Linjun-AMD
9fa4e8d5ab
Add attn sink (#2892)
...
* enable attn sink
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* update attn_sink script
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* fix some error
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* clang-format
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* update fmha_bwd mask
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* update fmha_bwd_kernel's mask
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* update block_fmha_pipeline_qr_ks_vs.hpp
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* fix ci error
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* fix format error
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* Update block_fmha_bwd_pipeline_default_policy.hpp
* Update fmha_fwd_runner.hpp
* Update block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp
* Update fmha_fwd_runner.hpp
* Update fmha_fwd_runner.hpp
* Update fmha_fwd_runner.hpp
* update splitkv_pipeline
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* update splitkv&pagedkv pipeline
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* add sink test
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* update attn_sink result log
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* update smoke_test_fwd_sink.sh
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* update test file
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* update test script
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* Update block_fmha_fwd_splitkv_pipeline_qr_ks_vs.hpp
* use constexpr kHasSink for sink in fmha pipeline
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
* update by pre-commit
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update include/ck_tile/ops/fmha/kernel/fmha_fwd_pagedkv_kernel.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update fmha_fwd.py
* Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd_splitkv.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update include/ck_tile/ops/fmha/pipeline/block_fmha_fwd_splitkv_pipeline_nwarp_sshuffle_qr_ks_vs.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Remove causal mask setting logic from mask.hpp
Removed the mask setting logic for causal masks.
* fix CI error: some lambda usages are not supported in C++17
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* Update remod.py
* add smoke sink test
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* Update fmha_pagedkv_prefill.py
* Update FmhaFwdPipeline parameters in fmha_fwd.py
* update block_fmha_pipeline_qr_ks_vs_async_trload.hpp
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* fix C++17 unsupported error
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp
* Fix formatting of sink_seq_end assignment
* Fix indentation for sink_seq_end assignment
* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp
---------
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-20 19:24:05 +08:00
Yi DING
47e2ed838e
[CK_TILE] Add Flatmm MX FP8 (#3208)
...
* Use async for flatmm mxfp4
* Fix preshuffle
* Add flatmm mxfp8
* Thanks, Copilot
* Thanks Copilot again~
2025-11-20 10:35:15 +08:00
Yashvardhan Agarwal
1eb26460aa
[ck_tile] Pooling example - Improved tile sizes (#3233)
...
* improved tile sizes
- modified tile sizes for improved example performance
* Update example/ck_tile/36_pooling/pool3d.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-11-19 15:30:18 +01:00
Aviral Goel
ac70206b2c
feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle… (#3225)
...
* feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle kernel(s) along with unit test
* docs: Update CHANGELOG.MD
2025-11-18 09:32:27 -05:00
Tianxing Wu
de995fea71
Various fixes
2025-11-18 13:04:58 +00:00
Juuso Korhonen
6ef0b9da8c
fixing
2025-11-18 08:57:30 +00:00
Yi DING
b6720531de
[CK_TILE] MX Flatmm Split kernel instances (#3207)
...
* [CK_TILE] MX Flatmm Split kernel instances
* Fix flatmm example compile
2025-11-18 13:46:30 +08:00
Tianxing Wu
8f44fc9593
mem calculation fixed
2025-11-17 14:49:09 +00:00
Tianxing Wu
ff28bd21ba
flops and mem calculation
2025-11-17 13:55:54 +00:00
Tianxing Wu
d3c5faf47e
Assert block_size num_queries_per_kv
2025-11-17 12:40:31 +00:00
Tianxing Wu
9b68bbd425
Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention
2025-11-17 10:06:05 +00:00
Tianxing Wu
5e2fd848b9
remove unneeded args
2025-11-17 10:04:30 +00:00
Juuso Korhonen
5d2a9e5f16
deving the test...
2025-11-17 09:46:31 +00:00
Juuso Korhonen
5e43fd2dfc
refactor to clearer BLOCK Q logic
2025-11-17 08:27:19 +00:00
Juuso Korhonen
57a0ec8cc1
add handling for -1 k heads arg
2025-11-17 07:36:42 +00:00
Juuso Korhonen
4a13749f7f
fix to example
2025-11-17 07:33:10 +00:00