mtgu0705
40ed20a30d
updated the codes
2025-06-04 02:04:28 -05:00
mtgu0705
5117e99822
Merge remote-tracking branch 'origin/mx_moe_f4_scaleshuffle_Bnoshuffle' into mxfp4_moe_blockscale_buf2lds
2025-06-04 00:14:18 -05:00
Ding, Yi
dd8284dd63
Fix unknown compiler flag
2025-06-04 03:54:39 +00:00
mtgu0705
620a4d2a0e
added gather offset xor, function passed.
2025-06-03 10:13:03 -05:00
aska-0096
86c8bef5d7
Refactor thread_copy_lds_direct_load; fix gfx942 direct lds load example; fix f16_pki4 example
2025-06-03 14:54:30 +00:00
Ding, Yi
407489d2c0
Fix ThreadwiseTensorSliceTransfer_v4::Run (Fuse scale)
2025-06-03 09:29:50 +00:00
aska-0096
8ecc3812de
doc the kGroup definition
2025-06-03 08:53:04 +00:00
aska-0096
3491918bfb
fix moe pki4 on gfx950
2025-06-03 07:40:47 +00:00
Ding, Yi
0cfb09c6c8
Fix target_compile_options for disabled target on gfx942
2025-06-03 06:52:52 +00:00
Ding, Yi
1c2da4b2bf
Fix warning
2025-06-03 06:48:49 +00:00
Ding, Yi
0cbc5e2bdb
Use packed_size_v for A/BPackedSize
2025-06-03 06:10:07 +00:00
Ding, Yi
331ccb8ca2
Merge remote-tracking branch 'origin/develop' into gfx950-mxfp4
2025-06-03 05:38:56 +00:00
mtgu0705
edffec8fc4
change the code to suit threadwise loading are continued
2025-06-03 00:36:07 -05:00
Ding, Yi
f9bf27548e
Generate random tensor values with multiple threads
2025-06-03 02:40:24 +00:00
Khushbu Agarwal
2e38eb4f1c
Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"
This reverts commit bbdaf79a52.
* fix regression
2025-06-02 10:25:01 -07:00
dependabot[bot]
cffe8fa2a4
Bump rocm-docs-core[api_reference] from 1.19.1 to 1.20.0 in /docs/sphinx (#2272)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.19.1 to 1.20.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.19.1...v1.20.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
dependency-version: 1.20.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-02 06:44:10 -07:00
aska-0096
dd24786f78
Enable splitk for mxfp4; clang format;
2025-06-02 12:23:01 +00:00
valarLip
0fdbf6bcd1
extend buffer load for fp16/bf16x16 (#2270)
* extend buffer load for fp16/bf16x16
* format
2025-06-02 10:29:54 +08:00
Kiefer van Teutem
2215a9edf0
Explicitly set the LINKER_LANGUAGE for the gemm_template_instances target to avoid Ninja build config failure. (#2265)
Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>
2025-05-30 13:32:28 -07:00
Illia Silin
654956bb02
Add a daily CI build on GFX950. (#2261)
* add CI build for gfx950
* make sure gfx950 CI always uses special docker and compiler
* enable codegen tests by default
2025-05-30 12:50:08 -07:00
aska-0096
5696e3c9f5
fix the cmake issue
2025-05-30 15:51:58 +00:00
Mirza Halilčević
fbce6c7bb6
Define CHAR_BIT during hipRTC (#2264)
* Fix failing codegen tests.
* fix clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-05-30 08:23:44 -07:00
dependabot[bot]
61e6c382c6
Bump rocm-docs-core[api_reference] from 1.19.0 to 1.19.1 in /docs/sphinx (#2263)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.19.0 to 1.19.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.19.0...v1.19.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
dependency-version: 1.19.1
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-30 05:56:59 -07:00
aska-0096
bb5bdff61c
remove unnecessary files
2025-05-30 08:39:25 +00:00
Ding, Yi
0cd2e6e782
Fix OOB; add MB96 instances
2025-05-30 07:46:28 +00:00
Ding, Yi
6cba96e510
Use v1 pipeline for example_moe_gemm2_xdl_mx_fp4_bns
2025-05-30 05:46:31 +00:00
mtgu0705
aeb717a132
add pipeline v1 for MOE Gemm2
2025-05-30 05:25:43 +00:00
OscarXu
798345a1cf
Fix moe blockscale gemm1 barrier 0x800 for new compiler
2025-05-30 04:13:42 +00:00
Ding, Yi
50956c6c7b
Merge remote-tracking branch 'origin/wjx/moe_v3_aiter' into gfx950-mxfp4
2025-05-30 03:56:35 +00:00
Ding, Yi
69418725a6
Merge remote-tracking branch 'origin/moe_bs_fp8_no_asm' into gfx950-mxfp4
2025-05-30 03:15:47 +00:00
slippedJim
57f497452a
remove restriction of group mode hd192 no lse (#2252)
Co-authored-by: Jim <jimguo12@amd.com>
2025-05-30 10:14:21 +08:00
Illia Silin
4e561af18c
Revert "add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)" (#2260)
This reverts commit fd6a859b44.
2025-05-29 16:22:16 -07:00
Paul Fultz II
306f4c537e
Export codegen targets (#2259)
2025-05-29 11:03:51 -07:00
joyeamd
fd6a859b44
add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* update cshuffle logic
* update cshuffle_logics
* add some change within review
* update some codes following the code review
* update epilogue logic
* remove from problem
* update codes following review.
* fix some issues
2025-05-29 14:31:14 +02:00
aska-0096
3c24d690a1
remove single rate mfma restriction for f8 blockscale gemm
2025-05-29 10:13:50 +00:00
aska-0096
33085c8458
clang format, remove single rate mfma restriction for f8
2025-05-29 10:10:12 +00:00
aska-0096
d563dac424
fix performance bug of bpreshuffle f8 gemm
2025-05-29 10:02:46 +00:00
Po Yen Chen
28cd0dffc9
[CK_TILE] FMHA forward batch_prefill optimization for low CU utilization (#2251)
* Add constraint on traits/tile/pipeline
* Use kM0=128 if max_seqlen_q == 8192
* Re-format codegen script
* Remove redundant attr name postfix
* Fix import error: default field in dataclass
* Use kK0=64 & kK1=64 to hide latency
* Use CU utilization to decide tile size
2025-05-29 18:36:33 +09:00
valarLip
ccddc5215e
recover example
2025-05-29 09:09:40 +00:00
aska-0096
c3d52993c4
update the flag name for f8blockscale
2025-05-29 08:47:34 +00:00
OscarXu
6be76c53b6
No asm ver. for merging moe blockscale fp8 into mainline
2025-05-29 03:38:56 -05:00
aska-0096
ced28892b6
Merge branch 'gfx950-mxfp4' of https://github.com/ROCm/composable_kernel into gfx950-mxfp4
2025-05-29 08:21:58 +00:00
aska-0096
0db8d71dc1
Remove debug infos; Enable flags for blockscale f8
2025-05-29 08:21:54 +00:00
Ding, Yi
e4a40c7214
Add fp8 profiler instances
2025-05-29 08:19:31 +00:00
OscarXu
52d68c9529
flag and barrier fix for compiler branch MainOpSelV3
2025-05-29 03:13:11 -05:00
Ding, Yi
f9ccd1a378
Fix bf8 config
2025-05-29 02:20:47 +00:00
Ding, Yi
2b4b189a5f
Fix fp8 config
2025-05-29 02:18:02 +00:00
OscarXu
653bc83f8a
Remove rocm6.3 workaround flags and macro
2025-05-28 21:05:21 -05:00
Bartłomiej Kocot
e7906dd644
Change relu to clamp for grouped conv fwd instances (#2249)
2025-05-29 00:51:25 +02:00
Adam Dickin
6df1c56ad6
Changes to allow MIOpen to build CK as part of its build. (#2247)
* tweaks to the miopen specific build. add way to skip clang-tidy checks and a way to skip some custom build targets MIOpen also has.
* move the tidy if statement
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-05-28 13:51:15 -07:00