Illia Silin
7eaa398458
Fix direct lds load for gfx950 and clang20 (#2346)
* fix direct lds load for gfx950 and clang20
* Update include/ck/utility/amd_buffer_addressing_builtins.hpp
* Fix format
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com>
[ROCm/composable_kernel commit: 2d8a804152]
2025-06-15 15:22:34 -07:00
Andriy Roshchenko
72054549e7
Optimized GEMMs for MX FP4/8 (#2294)
Adds V3 GEMM pipeline for MX FP4 and MX FP8
Adds V3 GEMM pipeline for MX FP4 with preshuffling
Adds MXFP4 GEMM tests (#2275)
Adds MXFP4 GEMM examples
Adds MXFP4 GEMMs to ckProfiler
Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>
Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com>
Co-authored-by: aska-0096 <haocwang@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: OscarXu <huaiguxu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: feifei14119 <feiw@amd.com>
Co-authored-by: Lin, Qun <qlin@amd.com>
Co-authored-by: joye <joye@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
[ROCm/composable_kernel commit: 00247e3c29]
2025-06-05 13:54:15 -06:00
Illia Silin
24057b3662
fix the buffer intrinsic names for clang >=20 (#2228)
[ROCm/composable_kernel commit: 8146e471f1]
2025-05-23 14:58:25 -07:00
Illia Silin
9c7b0a65f9
Revert "Update the buffer load/store intrinsic names for clang>=20. (#2192)" (#2227)
This reverts commit 7d92e48278.
[ROCm/composable_kernel commit: 1b846143c6]
2025-05-22 15:41:17 -07:00
Illia Silin
7d92e48278
Update the buffer load/store intrinsic names for clang>=20. (#2192)
* fix the buffer load/store intrinsic names
* fix clang format
[ROCm/composable_kernel commit: 58f9e9ffbc]
2025-05-13 10:18:14 -07:00
Illia Silin
9d24409070
Replace buffer load/store intrinsics with builtins (#1876)
* replace buffer load/store intrinsics with builtins
* fix clang format
* replace buffer load/store intrinsics with built-ins in ck_tile
* fix clang format
* add switch between buffer intrinsics and built-ins
* change the builtins threshold to clang20
* fix clang format
* fix some compilation errors
* revert changes in ck_tile
* revert changes in ck_tile
* delete all root files and folders when CI completes
* try changing the username in CI
* fix groovy syntax
* add user and group id info to ci dockers
* change ownership of all files in CI to jenkins at the end
* update changelog
[ROCm/composable_kernel commit: a88bf76ecc]
2025-03-05 14:33:28 -08:00