Commit Graph

1143 Commits

Author SHA1 Message Date
Tianxing Wu
0d2a9badba fixed example 2025-10-23 11:17:46 +00:00
Juuso Korhonen
3c0e6d37bf fixing bugs 2025-10-23 09:47:30 +00:00
Juuso Korhonen
e144872308 change to BLOCK_M in shape definitions 2025-10-23 08:11:55 +00:00
Tianxing Wu
f72b994b00 More compilation fixes 2025-10-20 15:53:35 +00:00
Juuso Korhonen
d68a541c19 fixing compile errors... 2025-10-20 15:04:47 +00:00
Juuso Korhonen
97e7527eb1 fixing compile errors... 2025-10-20 14:03:15 +00:00
Tianxing Wu
9fda954253 Compiling fix 2025-10-20 13:16:19 +00:00
Tianxing Wu
995c6701d3 Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention 2025-10-17 09:05:12 +00:00
Tianxing Wu
af9167abad example 2025-10-17 09:05:10 +00:00
Juuso Korhonen
9940bd07f6 fix order in mask caller 2025-10-16 11:23:46 +00:00
Juuso Korhonen
072de3842f comment 2025-10-16 09:23:39 +00:00
Juuso Korhonen
aa4908ac14 fix mask 2025-10-16 09:18:38 +00:00
Juuso Korhonen
62932576c4 use correct mask in kernel 2025-10-16 09:02:08 +00:00
Juuso Korhonen
498a97aa1d merge 2025-10-16 08:57:14 +00:00
Juuso Korhonen
63c17b7236 correct masking by transforming y_idx = y_idx / num_queries_per_kv 2025-10-16 08:54:07 +00:00
Tianxing Wu
853fa21566 Example bootstrap 2025-10-15 11:58:44 +00:00
Juuso Korhonen
72fe8b311c merge 2025-10-14 12:35:33 +00:00
Juuso Korhonen
4d232d59cc fix seq_len -> cur_batch_query_len 2025-10-14 12:34:33 +00:00
Tianxing Wu
b940a75328 Comments 2025-10-14 12:19:20 +00:00
Tianxing Wu
ec29289bb1 kv paging 2025-10-14 12:04:11 +00:00
Tianxing Wu
c87f2e3ca9 o window change 2025-10-14 09:59:47 +00:00
Tianxing Wu
96b208f6c7 Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention 2025-10-14 09:58:30 +00:00
Tianxing Wu
e1120fffb0 pipeline api 2025-10-14 09:58:27 +00:00
Juuso Korhonen
c3d27abfb8 fix q window 2025-10-14 09:49:54 +00:00
Juuso Korhonen
b37c356090 fix q window origin 2025-10-14 09:36:28 +00:00
Tianxing Wu
6a7fa959b7 kv tensor view and initial window 2025-10-13 12:53:43 +00:00
Tianxing Wu
cd354286c1 Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention 2025-10-13 11:32:30 +00:00
Tianxing Wu
be58d51d36 o ptr and window 2025-10-13 11:32:28 +00:00
Juuso Korhonen
6ba25b7e84 add commenting 2025-10-13 10:34:55 +00:00
Juuso Korhonen
81a02ffb40 Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention 2025-10-13 10:30:22 +00:00
Juuso Korhonen
b721f79f99 fix 2025-10-13 10:30:11 +00:00
Tianxing Wu
16129a794a stride fix 2025-10-13 10:30:08 +00:00
Tianxing Wu
96fde33ec4 Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention 2025-10-13 10:29:07 +00:00
Tianxing Wu
55fc6d7151 kv tensor view 2025-10-13 10:28:02 +00:00
Juuso Korhonen
af94aaf1cb refactor the q tensor view transformation 2025-10-13 10:22:52 +00:00
Juuso Korhonen
49ce980c67 Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention 2025-10-13 10:21:27 +00:00
Juuso Korhonen
2d6dab29eb refactor the q tensor view transformation 2025-10-13 10:18:23 +00:00
Tianxing Wu
36a65b1968 refactor 2025-10-13 10:05:23 +00:00
Tianxing Wu
bc6385f389 Some refactor 2025-10-13 10:01:38 +00:00
Tianxing Wu
1f4648dab5 refactor. and fixed q transformation 2025-10-10 15:27:36 +00:00
Tianxing Wu
df60493219 refactor 2025-10-10 13:25:19 +00:00
Juuso Korhonen
436eb3a4f8 transform q tensor view 2025-10-10 12:08:16 +00:00
Tianxing Wu
191f179038 unified attention rename 2025-10-09 08:47:19 +00:00
Tianxing Wu
e54cb5a713 initial commit 2025-10-06 13:02:38 +00:00
Sami Remes
ef43078788 Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe (#2893)
* Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe

* also do the same for amd_buffer_addressing_builtins.hpp

* merge with develop

* fix clang format

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2025-09-30 15:12:30 -07:00
joyeamd
b60af5bde9 [CK_TILE]enhance elementwise test (#2683)
* enhance elementwise

* fix ci issues
2025-09-30 08:29:37 -07:00
Aviral Goel
bebf0e9d15 Extend Grouped GEMM with MultiD (Single & Double Shared Memory) feature to use persistent kernel option (#2933)
* feat(grouped_gemm_multi_d): add new example that integrates grouped_gemm and multi_d_gemm feature

* refactor: grouped_gemm_multi_d relies on grouped_gemm_kernel

* tests(grouped_gemm): grouped_gemm test suite passes with minor adjustments

* fix: segfault fix by passing correct parameters for d tensors

* style: clang format

* WIP: host code for grouped_gemm_multi_d persistent kernel compiles but segfaults

* feat(grouped_gemm_multi_d): add functionality to run persistent kernel

* feat(grouped_gemm_multi_d): add new example that integrates grouped_gemm and multi_d_gemm feature

* refactor: grouped_gemm_multi_d relies on grouped_gemm_kernel

* tests(grouped_gemm): grouped_gemm test suite passes with minor adjustments

* fix: segfault fix by passing correct parameters for d tensors

* style: clang format

* fix: incorrect validation method and Dtensor layout in test suite

* docs: improved README text based on review comments

* fix: parameterize NumDTensor in GroupedGemmHostArgs and remove lint
2025-09-29 15:03:56 -07:00
Khushbu Agarwal
81458a6681 Weight Preshuffle Block Scale gemm support (#2877)
* initial commit

* remove extra files

* fixing errors

* updated ReadMe file for mapping of diff quants with diff configs

* addressing review comments

* addressing review comments

* Resolved merge conflicts

* [CK TILE GEMM] Replace get_preshuffle_or with is_quantpreshuffle_enabled

The get_preshuffle_or was not working as expected, which led to incorrect behavior
in the quantization preshuffle process. This change replaces it with the more reliable
is_quantpreshuffle_enabled function to properly determine when preshuffle should be applied.

* initial commit

* debugging

* working fp8 for init constant

* fp8 working with all inits

* updated block level code with comments

* changing the loop iter

* debugging

* debugging

* debugging

* code fix

* code clean up

* clang formatted

* Add comment

* code cleanup

* clang formatted

* merge conflicts fixes

* applying the latest int4 changes to the pipeline

* fixing test code for updated traits

* Adding gtest

* review comments addressed

* addressing review comments

* remove c++20 code

* added flush cache changes

---------

Co-authored-by: Cong Ma <congma13@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-2.ctr.dcgpu>
2025-09-29 12:46:37 -07:00
carlushuang
2e9428eb63 hot fix check eid range (#2924)
* hot fix check eid range

* fix clang format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2025-09-29 09:38:38 -07:00
yinglu
0f04f020d9 fix:tf32:fix build fail for all supported targets (#2942)
* fix:tf32:fix build fail for all supported targets

* new fix code
2025-09-29 08:04:11 -07:00