root
6b60a10cc0
for gfx350 kernel debug
2025-08-29 03:39:09 +00:00
Wen.Yang
160788cdf4
update benchmark script
2025-05-29 15:17:41 +08:00
Wen.Yang
efaa64e492
update bwd v3 kernels to fix timing error
2025-05-29 15:11:16 +08:00
slippedJim
e4f0c4a549
[WIP] enable hd128 swa ( #2137 )
...
* enable hd128 swa
2025-05-08 21:00:37 +08:00
slippedJim
d0028193fc
[WIP] update hd64 group hex & explicit hd128 in kernel name ( #2118 )
...
* remove all pssk kernels
* update: update pssk kernels
* update
* update
* fix kernel name
* fix: update initial version hex
* add shell scripts of smoke test and benchmark test for group mode
* fix typo
---------
Co-authored-by: Wen.Yang <Wen.Yang@example.com >
2025-04-24 10:16:13 +08:00
slippedJim
2710b61b78
enable causal/no_causal case ( #2111 )
2025-04-22 09:11:27 +08:00
wen-des
a115ab14ed
group mode attention for generic 64<hd<=128 ( #2092 )
...
* group mode attention for generic 64<hd<=128
* fix loop_idx overridden before kvBase in causal mode
* align format
---------
Co-authored-by: Wen.Yang <Wen.Yang@example.com >
Co-authored-by: danyao12 <danyao12@amd.com >
2025-04-20 11:11:11 +08:00
danyao12
20a250440a
atomic_add exec
2025-04-14 17:01:08 +08:00
danyao12
cf964e9982
fix group seqlen_k >> seqlen_q/causal bug
2025-04-12 12:10:21 +08:00
danyao12
9b7c18bec5
fix seqlen_k >> seqlen_q/causal bug
2025-04-12 10:31:04 +08:00
danyao12
03b75b3f97
hd192 causal temp version
2025-04-11 15:22:32 +08:00
danyao12
4c07813b46
hd192 bf16 rtne&rtz
2025-04-08 15:21:13 +08:00
danyao12
7949a5adf8
hd192 temp version
2025-04-05 14:50:43 +08:00
danyao12
ab5f40cfaf
enable SBHD
2025-03-24 17:58:01 +08:00
danyao12
bbb526b9d2
support for more layouts
2025-03-24 15:10:33 +08:00
danyao12
5e0ebbe8b6
fix hd64 group mode causal api
2025-03-23 12:43:25 +08:00
danyao12
917f7e55f8
Merge branch 'develop' into ck_tile/fa_bwd_v3
2025-03-22 20:38:04 +08:00
danyao12
6758b77802
fix random memory access fault
2025-03-22 20:03:28 +08:00
carlushuang
6c08c5c46d
add mask support in hdim=192/128 ( #1999 )
2025-03-21 18:28:43 +08:00
wen-des
8a25aa2669
support group mode for hd=64 of fa bwd v3 ( #1990 )
...
* support group mode for hd=64 of fa bwd v3
* bugfix for causal mask kernels when using kernel balance
* tiny align
---------
Co-authored-by: Wen.Yang <Wen.Yang@example.com >
Co-authored-by: danyao12 <danyao12@amd.com >
2025-03-21 11:42:58 +08:00
carlushuang
e3c9886cdf
[CK_TILE] return value with macro in ck_tile::kernel_launch API ( #1982 )
...
* return value with macro and revert the return value
* [CK-TILE] no-macro launch api solution (#1992 )
* no-macro solution
* address -Wcomma
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com >
2025-03-20 11:00:29 -07:00
rocking
b819c217e4
Sync the kname with instance name ( #1989 )
...
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
2025-03-20 00:06:45 +08:00
danyao12
e80ff1acbb
tiny fix
2025-03-18 12:08:34 +08:00
carlushuang
3e81279d26
Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 ( #1961 )" … ( #1971 )
...
* Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961 )" (#1969 )
This reverts commit 8cbcd3e0d0 .
* fix codegen problem
* Update config.hpp
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
2025-03-13 11:41:39 +08:00
Illia Silin
8cbcd3e0d0
Revert "[CK_TILE] support hdim=192/128 pair for deepseekv3 ( #1961 )" ( #1969 )
...
This reverts commit 7a93b16ff6 .
2025-03-11 10:40:18 -07:00
carlushuang
7a93b16ff6
[CK_TILE] support hdim=192/128 pair for deepseekv3 ( #1961 )
...
* support hdim=192/128 pair
* remove useless print
* update
2025-03-11 21:07:40 +08:00
wenchenvincent
489602f9a8
Enabled bwd support for hdim_qk != hdim_v for TE integration. ( #1965 )
2025-03-11 11:35:27 +08:00
Max Podkorytov
9e132eb77c
refactor ck-tile kernel launch ( #1925 )
2025-03-07 08:29:40 -08:00
Illia Silin
9b51c08bf7
remove support for gfx940 and gfx941 targets ( #1944 )
...
* remove support for gfx940 and gfx941 targets
* update changelog
2025-03-05 11:07:33 -08:00
Ye Wang
151e999931
with receipt id=5, enable bias for te
2025-03-05 00:24:41 -06:00
danyao12
173ed584e0
Merge branch 'develop' into ck_tile/fa_bwd_v3
2025-03-03 19:17:27 +08:00
danyao12
06ffd0b3ca
add api limit for top-left causal mask
2025-03-03 14:00:34 +08:00
danyao12
60178ac970
hd64~128(x8) fp16/bf16 a32 w/cas_kb arbitrary seqlen_q/seqlen_k/strides
2025-02-28 15:17:27 +08:00
rocking
faa2235dad
explicitly show no feature in kernel name ( #1920 )
2025-02-28 14:23:30 +08:00
slippedJim
a9bcd3c98d
make fmha bwd api template for v2 & v3 ( #1918 )
...
* use template fmha_bwd function
* update
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
2025-02-27 19:26:19 +08:00
rocking
e9ee568681
Apply filter to every kernel in the codegen of FMHA ( #1911 )
...
* add receipt for fwd
* Add receipt for bwd
* Use kernel name to avoid more receipt
* apply filter to every kernel
2025-02-26 20:20:29 +08:00
Dan Yao
32760acdbe
Merge pull request #1897 from ROCm/yewang12/ck_fav3_thread_local
...
make fmha_bwd_v3_kernel thread_local
2025-02-22 12:00:42 +08:00
danyao12
440a9d3bf7
Merge branch 'develop' into ck_tile/fa_bwd_v3
2025-02-20 16:01:26 +08:00
rocking
e4358c01d9
only output the deterministic bwd kernel for aiter ( #1903 )
...
* only output the deterministic kernel
* Add comment
2025-02-20 04:27:01 +08:00
danyao12
f1c0a36b9c
Merge branch 'develop' into ck_tile/fa_bwd_v3
2025-02-19 15:22:29 +08:00
rocking
f0d49d14fc
Add receipt 10~12 for codegen of aiter integration ( #1877 )
...
* Add receipt for aiter integration
* update receipt
* Add hdim 96 instances
* Revert "Add hdim 96 instances"
This reverts commit f339449f54 .
2025-02-19 09:01:08 +08:00
Ye Wang
216a6c2518
make fmha_bwd_v3_kernel thread_local
...
In Jax TE, multiple threads in the same process are spawned to train on each GPU. Therefore hipModuleLoadData and hipGetFunction need to be run for each GPU in its corresponding thread.
2025-02-18 10:45:09 -06:00
Andres Lugo
8086bbe3a7
Add receipt 4 option to codegen ( #1875 )
...
* Add receipt 4 option to codegen
* Remove repeated code
* Review comments
2025-02-11 10:11:46 -08:00
danyao12
cf70a2efbb
rename
2025-02-08 15:44:33 +08:00
danyao12
dc3d35a9f1
fix hd72~120 memory fault
2025-02-08 13:46:39 +08:00
danyao12
f88ba67e11
smoke test update
2025-02-05 17:22:52 +08:00
danyao12
8a8dc7f6c3
add hd64 fp16 kernels
2025-02-05 16:24:21 +08:00
Andriy Roshchenko
35aebe5936
Add OCP FP8 support in CK_TILE ( #1829 )
...
* Add OCP FP8 to CK_TILE
* Validate OCP FP8 in FMHA FWD under VALID=1
2025-01-27 11:59:49 -07:00
danyao12
008c91c988
add layout restrictions
2025-01-27 10:57:14 +08:00
danyao12
92494a8a70
separate hd pad/unpad kernels
2025-01-24 13:55:02 +08:00