Ville Pietilä
0ea3268d5d
Remove debug and other dead code.
2025-09-25 09:41:33 +00:00
Ville Pietilä
cc7433efc6
Add more comments, disable debug code.
2025-09-25 09:37:15 +00:00
Ville Pietilä
97f842f2c6
Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding.
2025-09-25 09:30:50 +00:00
Ville Pietilä
625a78b17b
WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding.
2025-09-24 15:08:01 +00:00
Ville Pietilä
7280df1bc3
Add one more unit test for tensor view.
2025-09-24 12:10:26 +00:00
Ville Pietilä
8048d6ff73
Fix build.
2025-09-23 11:17:08 +00:00
Ville Pietilä
e6f6c4a6a3
Working baseline for depthwise convolution with merged conv groups.
2025-09-23 11:14:10 +00:00
Ville Pietilä
29e3112b9b
Epilogue fixes.
2025-09-22 15:38:02 +00:00
Ville Pietilä
d7da3d5089
Offset fixes.
2025-09-22 15:37:46 +00:00
Ville Pietilä
7dfbac5d0b
WIP: Separate epilogue for merged conv groups.
2025-09-19 13:52:33 +00:00
Ville Pietilä
af6838e5dc
Integration test for CShuffle epilogue.
2025-09-19 12:09:08 +00:00
Ville Pietilä
7f52f84167
Fix tile window size for c block.
2025-09-19 08:08:19 +00:00
Ville Pietilä
6bcdb0947e
LDS to global memory copy.
2025-09-18 14:59:32 +00:00
Ville Pietilä
0e09504057
WIP: merged conv groups GEMM epilogue changes.
2025-09-17 14:25:02 +00:00
Ville Pietilä
27a2ceb4f7
Increase the max number of reported errors.
2025-09-17 12:29:12 +00:00
Ville Pietilä
4ec81cb95c
Add more logging.
2025-09-17 12:27:51 +00:00
Ville Pietilä
6d318ab481
Enable running multiple conv groups per batch.
2025-09-12 14:03:04 +00:00
Ville Pietilä
0d5c1b9638
WIP: Merged conv groups epilogue.
2025-09-11 15:24:36 +00:00
Ville Pietilä
970b40aa6c
WIP: Merged conv groups offset calculation.
2025-09-09 11:33:31 +00:00
Ville Pietilä
d9f0a9cdd0
Fully working conv group merging for TransformConvBwdWeightToGemm.
2025-09-09 09:58:43 +00:00
Ville Pietilä
8845b23254
WIP: Tensor transformations.
2025-09-08 15:41:54 +00:00
Ville Pietilä
61b3c96273
Add number of groups to merge to ck tile grouped gemm example.
2025-09-04 14:24:23 +00:00
Ville Pietilä
2b1908a375
Fix compilation of the grouped conv examples.
2025-09-04 12:01:49 +00:00
arai713
0282d98412
[CK TILE] Stream-K tile partitioner ( #2708 )
...
* initial commit for skeleton code
* replaced skeleton code with old streamk b2c map functions from old CK, still need to clean up the code
* fixed up code to match CK Tile convention: data type changes, naming changes, etc.
* change for num_sk_blocks data type
* formatting fix
* minor fixes
* moved reduction argument to template
* resolved comments from PR review: standardizing naming, pruning unneeded code
* resolve errors from merge of device op PR: moved enum to common file
* switching to uint32_t due to implementation constraints: divmod only takes uint32_t and mixing signed and unsigned types causes problems
* unsigned type fix
* add const qualifier
* added documentation for template parameters
* documentation edit
2025-09-03 13:38:17 -07:00
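The switch to uint32_t noted in the Stream-K entry above can be illustrated with a small sketch. `divmod_u32` and `to_u32` are hypothetical helpers written for this note, not CK Tile functions; they only show why feeding a signed value into an unsigned-only divmod gives surprising results.

```cpp
#include <cstdint>

// Hypothetical stand-in for a divmod helper that accepts only uint32_t.
struct DivMod
{
    std::uint32_t quotient;
    std::uint32_t remainder;
};

constexpr DivMod divmod_u32(std::uint32_t dividend, std::uint32_t divisor)
{
    return DivMod{dividend / divisor, dividend % divisor};
}

// Explicit version of the conversion that happens implicitly when a signed
// value is passed where uint32_t is expected (wraps modulo 2^32).
constexpr std::uint32_t to_u32(std::int32_t v) { return static_cast<std::uint32_t>(v); }

// Unsigned inputs behave as expected.
static_assert(divmod_u32(10u, 4u).quotient == 2u, "10 / 4 == 2");
static_assert(divmod_u32(10u, 4u).remainder == 2u, "10 % 4 == 2");

// A negative signed value becomes a very large unsigned one, so the quotient
// is no longer anywhere near the intended result.
static_assert(to_u32(-1) == 4294967295u, "-1 wraps to 2^32 - 1");
static_assert(divmod_u32(to_u32(-1), 4u).quotient == 1073741823u, "wrapped dividend");
```

Keeping every index in the same unsigned type avoids these silent conversions.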
msaffari-amd
47d020a993
refactor: use snake_case naming in ck_tile/core components ( #2766 )
2025-09-03 09:34:11 +02:00
Cong Ma
e1ab460d2d
[CK TILE GEMM] Fix building issues ( #2772 )
...
- Add `WarpGemmMfma_f32_16x16x128_[fp8|bf8]_[fp8|bf8]_CTransposed`
- Replace `__gfx950__` with `CK_GFX950_SUPPORT`
2025-09-02 22:40:18 -07:00
Po Yen Chen
9f35cde374
[CK_TILE] Fix fmha_fwd_v3() Default2DEpilogue usage ( #2765 )
...
* Fix Default2DEpilogue usage
* Fix Default2DEpilogue usage for batch_prefill
2025-09-02 09:51:56 -07:00
Sami Remes
4419fc34a2
Fix formatting problem ( #2768 )
2025-09-02 14:14:10 +03:00
Michael Mcminn
022f369deb
Adding fix for the gfx908 to the GEMM MFMA implementations of WarpGem… ( #2751 )
...
* Adding fix for the gfx908 to the GEMM MFMA implementations of WarpGemmMfmaBf16Bf16F32M4N64K16 WarpGemmMfmaBf16Bf16F32M64N4K16
* Adding support for offload target gfx9-4-generic
* This duplication here isn't ideal
2025-09-02 10:35:07 +02:00
Haocong WANG
33418b201f
Fix naming issue ( #2762 )
2025-09-02 11:18:53 +08:00
Po Yen Chen
d876e87fe4
[CK_TILE] Add FAv3 fwd pipeline ( #2731 )
...
* Add FAv3 fwd pipeline
* Unpack v_pk_mul to hide v_mov
* Avoid compiler moving l compute across phase
* Sync sched_group_barrier() setting for masking cases
2025-09-01 09:16:45 +08:00
Aviral Goel
fcff0043ae
chore(gemm): clang format to pass CI ( #2758 )
2025-08-29 00:38:46 -07:00
Vijay Krish
4208e28988
ck_tile kernel for gemm with groupwise quantized B tensor. ( #2663 )
...
* This change introduces new pipelines with Intrawave scheduler and block gemm primitives that load the scale tensor to registers to perform dequantization post MFMA on C tensor in registers.
Scale tensor data, BQ, is spliced across threads in registers and not stored in LDS.
Current support is for the following combinations, but it should be fairly straightforward to extend support to more formats.
fp8, fp8 -> f32
bf8, bf8 -> f32
fp8, i4 -> f32
bf8, i4 -> f32
Group size can go down to as low as K length of underlying WarpGemm primitive.
* Solve merge conflict
* [CK TILE] Update CHANGELOG.md
---------
Co-authored-by: Vijay Krishnamoorthy <vjkrish@fb.com >
Co-authored-by: ThomasNing <thomas.ning@amd.com >
Co-authored-by: Cong Ma <congma13@amd.com >
2025-08-28 23:43:02 -07:00
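A host-side reference can make the groupwise-quantized B entry above more concrete. This is a minimal sketch, not the CK Tile kernel: the function name, the row-major layouts, and the scale shape of (K / group_size) x N are assumptions made for illustration, and the quantized values are taken as already widened to float. It applies each group's scale to the partial sum after the multiply-accumulate, in the spirit of dequantizing on the C tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM with a groupwise scale on B (hypothetical helper).
// a:     M x K, row-major
// b:     K x N, row-major, quantized values widened to float
// scale: (K / group_size) x N, one scale per K-group and column (assumed layout)
inline std::vector<float> gemm_groupwise_b_scale(const std::vector<float>& a,
                                                 const std::vector<float>& b,
                                                 const std::vector<float>& scale,
                                                 std::size_t M,
                                                 std::size_t N,
                                                 std::size_t K,
                                                 std::size_t group_size)
{
    assert(group_size > 0 && K % group_size == 0);
    const std::size_t num_groups = K / group_size;
    assert(a.size() == M * K && b.size() == K * N && scale.size() == num_groups * N);

    std::vector<float> c(M * N, 0.0f);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            for(std::size_t g = 0; g < num_groups; ++g)
            {
                // Accumulate the unscaled product over one K-group.
                float partial = 0.0f;
                for(std::size_t k = g * group_size; k < (g + 1) * group_size; ++k)
                {
                    partial += a[m * K + k] * b[k * N + n];
                }
                // Dequantize after the multiply-accumulate.
                acc += scale[g * N + n] * partial;
            }
            c[m * N + n] = acc;
        }
    }
    return c;
}
```

With M = N = 1, K = 4, group_size = 2, a = {1, 2, 3, 4}, b = {1, 1, 1, 1} and scale = {2, 0.5}, the two partial sums are 3 and 7, giving 2 * 3 + 0.5 * 7 = 9.5.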
Cong Ma
428090f749
Support transposed C tile in Aquant ( #2679 )
...
The performance of Aquant has increased after enabling transposed C.
There is no need to exchange AQ elements among lanes after enabling
transposed C, as each thread only holds data from one row.
2025-08-28 13:28:09 -07:00
Mateusz Ozga
0758883fa4
[CK-TILE] Default2DEpilogue, example and adding nullptr_t type for D ( #2752 )
...
* Init commit
* Quick fix, CI fails
* Remove CDElementWise
* Add CDEELementWise
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
2025-08-28 12:45:50 -07:00
asleepzzz
038ea82315
Revert "[CK_TILE] FMHA BWD Enable Tile 16x192 ( #2741 )" ( #2757 )
...
This reverts commit ead4447b20 .
2025-08-28 22:50:42 +08:00
linqunAMD
4a49dac7c6
[Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel ( #2728 )
...
* fix copy basic build error
* fix other ck tile test build error
2025-08-28 20:30:30 +08:00
Yi DING
ead4447b20
[CK_TILE] FMHA BWD Enable Tile 16x192 ( #2741 )
...
* 16x192
* Use buffer_load_lds for lse/d
* Dispatch & cleanup
* Avoid zeroing dq & fix
* fix
2025-08-28 18:54:18 +08:00
Linjun-AMD
bf7b458e6e
use iglp to improve dim256 fmha fwd in qr_ks_vs pipeline ( #2711 )
...
* add k_lds padding and iglp to improve dim256 fmha fwd
* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* update block_fmha_pipeline_qr_ks_vs.hpp
Signed-off-by: JL-underdog <Jun.Lin@amd.com >
* Update block_fmha_pipeline_qx_ks_vs_custom_policy.hpp
* clang format
Signed-off-by: JL-underdog <Jun.Lin@amd.com >
* use same naming style
---------
Signed-off-by: JL-underdog <Jun.Lin@amd.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-08-28 11:39:39 +08:00
Aviral Goel
f5f795c4d6
feat(HostTensor): Extend support for HostTensor class' >> operator to print more data types ( #2691 )
...
* feat(check_err): add a variable to adjust number of incorrect values to print
* feat(host_tensor): add printing capability for fp8 bf8 int8 int4
* fix(gemm_utils): update acceptable data type
* fix(host_tensor): print both 4 bit ints in pk_int4_t
* refactor(HostTensor): define pk_int4_t_to_int8x2_t and fix typo in vector_type.hpp
* feat(host_tensor): add print first n elements functions
2025-08-27 18:17:24 -07:00
Cong Ma
cd53e2e57e
[CK TILE GEMM] Fix a merge conflict ( #2753 )
...
* Fixed a merge conflict in 245467f3
* Format the code
2025-08-27 11:08:09 -07:00
Cong Ma
245467f359
[CK TILE] Fix bugs in AQuant preshuffle ( #2700 )
...
* [CK TILE] Fix bugs in AQuant preshuffle
- Make Aquant work with block Mx64x256. `M` could be 16, 32, 64
- Make Aquant work with warp 16x16x32 and 32x32x16.
* [CK TILE] Rename Preshuffle to PreshuffleQuant
The new name, PreshuffleQuant, explicitly states the function's purpose:
to preshuffle the quantization matrix.
* [CK TILE Block Scale] Use GemmConfig to save tile properties
- Remove specialization of GemmQuantTypeConfig
- Pass GemmConfig around which contains tile properties. Stop using hard
coded tile properties in `gemm_calc_aquant()`
* [CK TILE Block Scale] Rename GemmConfig used in block scale
- Remove unused GemmConfig
- Rename GemmConfig used in block scale
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com >
2025-08-27 00:05:54 -07:00
John Afaganis
508e7912f9
Revert "[CK-TILE] Default epilogue, adding support for D ( #2629 )" ( #2746 )
...
This reverts commit d43228fbca .
2025-08-26 09:48:49 -07:00
Mateusz Ozga
d43228fbca
[CK-TILE] Default epilogue, adding support for D ( #2629 )
...
* Extend 2d-epilogue, D support
* Added tests & update
* Remove unused attribute
* Extend tests
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
2025-08-25 19:29:35 -07:00
Yi DING
de61e55493
[CK_TILE] FMHA avoid unnecessary vmcnt0 ( #2715 )
...
* FMHA avoid unnecessary vmcnt0
Squashed commit of the following:
commit 7bdf6a7eef
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 22 03:15:51 2025 +0000
merge develop and solve conflicts
commit f21e916a8c
Merge: a7dd2a7d1 0db21053e
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 22 03:15:21 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into vmcnt0issue
commit a7dd2a7d13
Author: Ding, Yi <yi.ding@amd.com >
Date: Tue Aug 19 02:17:43 2025 +0000
update bwd
commit 380aa8f311
Author: Kevin Choi <kevin.choi@amd.com >
Date: Mon Aug 18 19:36:38 2025 +0000
add restrict to applicable functions
commit b85daba2a3
Author: Ding, Yi <yi.ding@amd.com >
Date: Mon Aug 18 02:07:03 2025 +0000
bwd filter
commit 75c4b9372f
Author: Kevin Choi <kevin.choi@amd.com >
Date: Sat Aug 16 08:15:23 2025 +0000
remove noinline attr as it causes a lot more s_waitcnt's
commit 598e3fec41
Author: Kevin Choi <kevin.choi@amd.com >
Date: Thu Aug 14 12:11:17 2025 +0000
remove innerloop, move restrict parameters to mainloop and add noinline attribute.
commit 3340408537
Author: Kevin Choi <kevin.choi@amd.com >
Date: Thu Aug 14 07:06:51 2025 +0000
Create inner lambda with restrict parameters, add restrict to some parameters
commit 3bc45ecbc7
Author: aska-0096 <haocwang@amd.com >
Date: Thu Aug 14 03:43:54 2025 +0000
save for debug
commit de4db6c4c5
Merge: 108abf00e 68694cb78
Author: aska-0096 <haocwang@amd.com >
Date: Wed Aug 13 02:15:22 2025 +0000
Merge branch 'wip-async-tr-fa' of https://github.com/ROCm/composable_kernel into wip-async-tr-fa
commit 108abf00e0
Merge: 0810799e2 0f42a92fc
Author: aska-0096 <haocwang@amd.com >
Date: Wed Aug 13 02:14:26 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into wip-async-tr-fa
commit 68694cb781
Merge: 0810799e2 20288caa2
Author: asleepzzz <hanwen.chang@amd.com >
Date: Wed Aug 13 00:34:11 2025 +0800
Merge branch 'develop' into wip-async-tr-fa
commit 0810799e25
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 12 14:25:50 2025 +0000
refactor blockgemm change, isolate to v2;
commit fd1eb323af
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 12 09:26:13 2025 +0000
clang format
commit 75f6f6bac4
Merge: bcc05eee6 8e1eb0c1e
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 12 09:04:41 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into wip-async-tr-fa
commit bcc05eee62
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 12 08:46:06 2025 +0000
Fix the bug
commit 96d24497f5
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 12 04:02:41 2025 +0000
fix conflict. disable all v-col instance for fmha fwd
commit 1716171be4
Merge: 1c9800790 4fde1646e
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 12 03:52:34 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into wip-async-tr-fa
commit 1c98007901
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 12 01:53:31 2025 +0000
clang format
commit f43e903b1d
Merge: 3868ddd70 a7badc6ec
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 12 01:52:52 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into wip-async-tr-fa
commit 3868ddd708
Merge: 498d234ab 191c62967
Author: aska-0096 <haocwang@amd.com >
Date: Mon Aug 11 15:59:40 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into wip-async-tr-fa
commit 498d234ab8
Author: aska-0096 <haocwang@amd.com >
Date: Mon Aug 11 15:37:37 2025 +0000
change the warp setting for hdim32 fmha fwd
commit b86f7786e2
Author: aska-0096 <haocwang@amd.com >
Date: Mon Aug 11 14:21:09 2025 +0000
tempsave, update the blocksync functions
commit 7b8052d7ca
Author: aska-0096 <haocwang@amd.com >
Date: Sun Aug 10 06:00:51 2025 +0000
fix bug in pki4
commit 76cbbb84a2
Author: aska-0096 <haocwang@amd.com >
Date: Sat Aug 9 03:25:12 2025 +0000
fix bugs in gemm
commit 8c101ccb88
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 18:35:53 2025 +0000
fix bug on non-gfx950
commit efb8549279
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 17:53:19 2025 +0000
fix bug
commit 729e8785fb
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 15:42:15 2025 +0000
fix bugs
commit 250dc13c75
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 09:31:01 2025 +0000
fix clangformat with 18.1.3
commit 106edeecd9
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 09:07:40 2025 +0000
remove non-necessary change
commit 78edd7303b
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 09:04:02 2025 +0000
bug fix, clang format;
commit 3b9fb6af38
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 08:08:03 2025 +0000
Remove unnecessary changes
commit 6bb57c2c57
Merge: 1ecee378d ab2602683
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 07:50:12 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into wip-async-tr-fa
commit 1ecee378d5
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 06:19:31 2025 +0000
remove unnecessary files; rename some files
commit b4640a9de6
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 8 05:46:18 2025 +0000
merge fa_decode pipeline into fmha_fwd api
commit fe63a646a4
Author: aska-0096 <haocwang@amd.com >
Date: Wed Aug 6 05:58:43 2025 +0000
add __restrict__ to tr load
commit 414cad667b
Author: aska-0096 <haocwang@amd.com >
Date: Tue Aug 5 07:23:51 2025 +0000
Add XOR fold strategy for hdim<128, but perf dropped; disable it by default; wait further perf debug
commit 0d12fc944f
Author: aska-0096 <haocwang@amd.com >
Date: Mon Aug 4 10:27:42 2025 +0000
Add v_permlaneb32 for block_reduce. Disable it as it will cause un-coexecutable packed math in FA
commit 4f31847de1
Author: aska-0096 <haocwang@amd.com >
Date: Mon Aug 4 10:02:17 2025 +0000
add vmcnt guard before load ktile
commit 746f4ccb99
Author: aska-0096 <haocwang@amd.com >
Date: Mon Aug 4 06:49:01 2025 +0000
Load Q through lds, implement xor;
commit 2d4e73d2b4
Author: aska-0096 <haocwang@amd.com >
Date: Fri Aug 1 10:44:54 2025 +0000
small refactor
commit a28b6e67fe
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 31 10:25:37 2025 +0000
upgrade prefill pipeline; simple iglp; consistent data produce and consume order
commit 75cba48682
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 31 05:13:27 2025 +0000
enable larger tile size; upgrade xor pattern
commit 69890afc98
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jul 30 12:25:33 2025 +0000
remove all lds bankconflict with xor layouts
commit 8dacc35c4c
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jul 30 03:51:06 2025 +0000
enable prefill overload operator().
commit 13bcc913de
Author: aska-0096 <haocwang@amd.com >
Date: Fri Jul 25 07:10:01 2025 +0000
fix the lds alignment caused performance regression
commit af28123cec
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jul 23 09:05:57 2025 +0000
remove unnecessary features
commit 14e0ab70c6
Author: aska-0096 <haocwang@amd.com >
Date: Tue Jul 22 08:04:05 2025 +0000
tempsave. asynccopy+trload sanity checked
commit 1b468bac0b
Author: aska-0096 <haocwang@amd.com >
Date: Mon Jul 21 05:55:55 2025 +0000
tempsave, trload+asyncload done
commit afd96d8180
Author: aska-0096 <haocwang@amd.com >
Date: Fri Jul 18 10:04:34 2025 +0000
compile pass
commit 5616551115
Merge: ae39c84f5 095393276
Author: aska-0096 <haocwang@amd.com >
Date: Fri Jul 18 05:17:27 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into wip-async-tr-fa
commit ae39c84f55
Author: aska-0096 <haocwang@amd.com >
Date: Fri Jul 18 05:16:39 2025 +0000
tempsave
commit 94b6430489
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 17 10:06:09 2025 +0000
temp save
commit 7e330553dc
Merge: 18669925c 804f77dce
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 17 07:24:32 2025 +0000
Merge branch 'test_copy_fix' of https://github.com/ROCm/composable_kernel into fa_decode_pipeline
commit 804f77dce5
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 17 03:10:46 2025 +0000
move test_copy into test
commit 21627d7ca7
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 17 02:41:31 2025 +0000
remove unnecessary output
commit 287792c44a
Merge: a4221db30 21fd7e953
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 17 02:26:13 2025 +0000
Merge branch 'test_copy_fix' of https://github.com/ROCm/composable_kernel into test_copy_fix
commit a4221db304
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 17 02:26:10 2025 +0000
add input validation and bug fix
commit 21fd7e9538
Merge: d6df7bf85 6e76b8205
Author: Max Podkorytov <4273004+tenpercent@users.noreply.github.com >
Date: Wed Jul 16 11:23:57 2025 -0700
Merge branch 'develop' into test_copy_fix
commit d6df7bf851
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jul 16 08:55:50 2025 +0000
fix vmcnt shift
commit 40e039e4e4
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jul 16 08:37:07 2025 +0000
Improve s_waitcnt_imm calculation
commit c30f8b709b
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jul 16 05:39:50 2025 +0000
fix the s_waitcnt_imm calculation
commit ec0a45b29f
Merge: e5cc4af80 6b09f0823
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jul 16 03:57:57 2025 +0000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into test_copy_fix
commit e5cc4af808
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jul 16 03:54:33 2025 +0000
Add block_sync_lds_direct_load utility
commit eea58629cf
Author: aska-0096 <haocwang@amd.com >
Date: Tue Jul 15 09:39:03 2025 +0000
fix async copytest bug
commit 18669925cc
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jul 10 04:29:33 2025 +0000
temp save, change all instance to 1wave
commit 18686cfe5b
Author: aska-0096 <haocwang@amd.com >
Date: Tue Jul 8 08:37:20 2025 +0000
tempsave, fmha_decode
commit 47565f21a5
Author: aska-0096 <haocwang@amd.com >
Date: Sat Jun 21 15:02:57 2025 +0000
temp save, waiting for debug
commit e0a634ef97
Author: aska-0096 <haocwang@amd.com >
Date: Thu Jun 19 05:11:52 2025 +0000
save an example for __bf16 type
commit 4bd5fd4a3c
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jun 18 07:27:24 2025 +0000
fix bwd code
commit 69809d9513
Author: aska-0096 <haocwang@amd.com >
Date: Wed Jun 18 06:37:16 2025 +0000
Fix for fwd/bwd kernel build filter
commit d5ec3d0e5768aafed7f77151b2a835e87b9f95ba
Author: Ding, Yi <yi.ding@amd.com >
Date: Tue Aug 19 08:13:18 2025 +0000
Add restrict to avoid unnecessary vmcnt
---------
Co-authored-by: aska-0096 <haocwang@amd.com >
* Add comments for C-style cast
* Better comments
---------
Co-authored-by: aska-0096 <haocwang@amd.com >
2025-08-25 20:55:12 +08:00
John Shumway
c71d7ddd74
Remove unsupported use of c++20 concept. ( #2719 )
...
Downstream libraries aren't migrated to c++20 yet, so replace a use of c++20 concept with equivalent SFINAE logic. The template checks for both the existence and the truthiness of the static member variable.
2025-08-24 21:29:23 -07:00
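The concept-to-SFINAE replacement described in c71d7ddd74 could look roughly like the sketch below. This is not the code from that commit: the trait name `has_true_flag` and the member name `kEnabled` are hypothetical, and the member is assumed to be a constant expression. It shows one C++17 way to check both that a static member exists and that it is true.

```cpp
#include <type_traits>

// Primary template: selected when T has no static member named kEnabled.
template <typename T, typename = void>
struct has_true_flag : std::false_type
{
};

// Partial specialization: viable only when `T::kEnabled` is a valid
// expression (SFINAE via std::void_t). The result is then the member's value
// converted to bool, so a member that exists but is false yields false.
template <typename T>
struct has_true_flag<T, std::void_t<decltype(T::kEnabled)>>
    : std::bool_constant<static_cast<bool>(T::kEnabled)>
{
};

// Hypothetical types used only to exercise the trait.
struct WithTrueFlag
{
    static constexpr bool kEnabled = true;
};
struct WithFalseFlag
{
    static constexpr bool kEnabled = false;
};
struct WithoutFlag
{
};

static_assert(has_true_flag<WithTrueFlag>::value, "member exists and is true");
static_assert(!has_true_flag<WithFalseFlag>::value, "member exists but is false");
static_assert(!has_true_flag<WithoutFlag>::value, "member does not exist");
```

The equivalent C++20 form would be a `requires` clause on `T::kEnabled`; the trait above gives the same answer while still compiling for downstream users on C++17.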
Po Yen Chen
0db21053e6
[CK_TILE] Allow switching between SGPR/VGPR get_warp_id() return values ( #2669 )
...
* Allow returning VGPR get_warp_id() value
* Avoid using SALU in async_load_raw()
2025-08-22 10:17:05 +08:00
Po Yen Chen
4a7ecce096
[CK_TILE][FMHA] Enable dwordx4 loading in async_load_tile_raw() ( #2549 )
...
* Support async load dwordx4
* Enlarge load size on gfx950
2025-08-22 10:13:47 +08:00
Yi DING
4cfa2c7158
[CK_TILE] FMHA BWD Fix Compilation with Bias ( #2682 )
...
* [CK_TILE] FMHA BWD Fix Compilation with Bias
* Fix appendkv kApplyRoPE
2025-08-22 10:01:10 +08:00
Bartłomiej Kocot
4212bbc170
[CK Tile] Grouped convolution backward data ( #2652 )
...
* base working version for single grouped conv bwd data
* Fix 2d descriptor
* fix groups
* Add 3d support
* fixes
* fixes
* fixes
---------
Co-authored-by: Jakub Piasecki <jakpia21@gmail.com >
2025-08-20 05:29:57 -07:00