composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 18:17:44 +00:00

Author	SHA1	Message	Date
Khushbu Agarwal	e34599e8a9	Merge flatmm Operator with universal gemm (#2434 ) * Initial commit * Adding new tile partitioner to flatmm * intermediate changes * debugging kernels * Updating flatmm example to universal gemm example * updated flatmm kernel to run via gemmKernel * update universal gemm to incorporate flatmm * debug * Fix flatmm call * Fixing other kernels and tests for API changes * clang formatted * fixing gemm tests * added test for flatmm and simplify kernel arguments * adding flatmm test * fix test for flatmm * simplify gemm kernel with flatmm * remove flatmm related files * addressing review comments and code clean up * resolving empty file * resolving empty file * clang formatted * addressing review comments * enable persistent kernel for flatmm * reverted the removed files for flatmm * reverted the removed files for flatmm * changed flatmm to weightPReshuffle; removed the _1 added in teh faltmm example * some more renames * clang formatted [ROCm/composable_kernel commit: `d239b91fd5`]	2025-07-11 08:27:55 -07:00
Qianfeng	fb42be79dc	Add separate mask checking for scope [aligned_physical_seqlen_k_start, physical_seqlen_k_end) (#2487 ) * Add separate mask checking for scope [aligned_physical_seqlen_k_start, physical_seqlen_k_end) in pagedkv pipeline * i_nhead_ conversion type to prevent overflow --------- Co-authored-by: ltqin <letaoqin@amd.com> [ROCm/composable_kernel commit: `45904b8fd7`]	2025-07-11 18:14:47 +08:00
shay-li77	0a1eb8381d	support y-direction step length greater than 1 for SimplifiedGenericAttentionMask (#2338 ) * mask support ratio for y axis * format code * add notes for param y_ratio * fix comments error * support template and mdiv for ratio mask * refactor y-ratio mask constructor * optimize coordinate calculation * add SimplifiedRatioAttentionMask [ROCm/composable_kernel commit: `d814fefe18`]	2025-07-09 23:18:55 +08:00
Yi DING	7a9add1417	[CK_TILE] Avoid compile kernel in host pass (#2475 ) [ROCm/composable_kernel commit: `032ca60015`]	2025-07-09 22:27:54 +08:00
Haocong WANG	4b6049f553	[CK TILE] Fix FA build filter (#2369 ) * Fix for fwd/bwd kernel build filter * fix bwd code * cmake depends & bwd filter order fix * revert unexpected reformat * Avoid change fmha bwd filter order for downstream compatibility * Revert unexpected changes --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> Co-authored-by: Ding, Yi <yi.ding@amd.com> [ROCm/composable_kernel commit: `5557eadce6`]	2025-07-08 10:42:07 +08:00
Illia Silin	305be94f42	fix compilation errors with clang20 (#2464 ) [ROCm/composable_kernel commit: `e033a1b4bf`]	2025-07-07 19:40:30 -07:00
Po Yen Chen	e35e3dee86	Eliminate warning caused by failed to meet occupancy requirement (#2389 ) Co-authored-by: felix <felix.li@amd.com> [ROCm/composable_kernel commit: `b2dea90116`]	2025-07-08 09:17:25 +08:00
Thomas Ning	a01042c3cf	Enable Async Copy for MI355 (#2425 ) * add for async load builtin * add async load api * fix some compiling errors * fix a compiling error * fix some compiling errors * add a pipeline which copies from v4 * add a new pipeline for async load * fix some compiling errors * add async load tests * fix some issues in async load * fix * fix async inline assembly * fix async inline assembly * add ignore header file * comment some not gfx950 codes * comment some not gfx950 codes * fix a error * update async load apis * fix lds descriptor * fix a compiling error * fix some compiling errors * fix a descriptor issue * update lds descriptor * change async pipeline's tile distribution pattern from thread to warp * fix clang format * update async policy * fix a CRTP issue * fix a typo error * change lds layout * fix some sync issues * improve codes * delete the async test * fix a commented format issue * avoid compiling device functions when compile host * make gemm run * add the copy kernel support * finish the feature * Address comment * add the support for buffer_builtin * solved the merging problem * Comment Addressed --------- Co-authored-by: joye <joye@amd.com> Co-authored-by: joyeamd <John.Ye@amd.com> [ROCm/composable_kernel commit: `f240ae3248`]	2025-07-07 10:08:49 -07:00
ltqin	273c031eca	ck tile pagedkv prefill (#2405 ) * add prefetching physical block id for pagedkv * start add pagedkv prefill * rename pipeline * add kernel for pagedkv * add an init version pagedkv prefill * fix redefine issue * add struct BlockFmhaFwdPagedKVPipelineProblem and fmha_fwd_pagedkv_args * generate dispatch code * add body generating code * comipling pass * remove dropout from pagedkv * set lse to false in generating code * start changing qr kernel to pagedkv * init version of kernerl with pagedkv * change names of file that are generated * chang host validation for pagedkv prefill * using iglp to change blockgemm * add kernel files to op head file * show parameters * rewrite print parameter fun * add fwd * remove default parameter of GridSize * format * fix nhead issue and add seqlen_k_ptr to batch mode * format code * remove no-longer used code * format * fix some comments --------- Co-authored-by: ltqin <letaoqin@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `9f4c5d7372`]	2025-07-07 16:16:54 +08:00
carlushuang	1267b56f36	default skip y point to r (#2457 ) Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> [ROCm/composable_kernel commit: `0aecb5ab68`]	2025-07-06 23:54:34 -07:00
carlushuang	6627989a02	[CK_TILE][CORE] enhance slice_tile api (#2430 ) * support slice cross p * fix some bug in y_len * more case * fix a bug when R exist * support -1 to hint end of current length * format * change commit [ROCm/composable_kernel commit: `a8742f7e31`]	2025-07-06 20:13:12 -07:00
Max Podkorytov	43f6087b13	[CK-TILE] File-level documentation for static encoding pattern (#2433 ) * add file-level comment * Finished the write-up --------- Co-authored-by: ThomasNing <thomas.ning@amd.com> [ROCm/composable_kernel commit: `158ddeb8ce`]	2025-07-04 02:26:18 -07:00
Thomas Ning	753232ea70	[CK Tile] Int8 Support on CK Tile GEMM (#2267 ) * updates to support int8 in 03_gemm example * added comments, using aliases, helper functions * test(gemm_universal): add test cases for int8 gemm pipeline * fix(test_gemm): fix for failing test unit test for int8 * test(ck_tile): add int8 unit test for gemm universal * refactor(gemm_universal): GPU reference verification for GEMM code improved * style(gemm_universal): removed extra comments and did clang format * merging recent changes to universal gemm to tile_engine * ck tile engine integration work * feat(tile_engine): add int8 support to tile engine ops/gemm * feat(tile_engine): added 32 32 16 mfma instances to tile engine for int8 * style: Format code with clang-format-12 * refactor(tile_engine): address review comments * style: removed unhelpful comments & unused variables. * build: tile engine uses default config * feat: add int8 support for CK_TILE GEMM * style: added trailing commas to codegen_utils.py * refactor: tile engine * refactor: formatting and code review * refactor: code formatting for python files * fix: suppress build warning * add support for gfx950 * refactor:KWarpTile size in gemms util * Fix the branch and wrap up the k warp tile * Add bf8 integration * refactor: clang format and rebase --------- Co-authored-by: zjli2013 <leezhengjiang@gmail.com> Co-authored-by: AviralGoelAMD <aviral.goel@amd.com> Co-authored-by: Khushbu Agarwal <khuagarw@amd.com> [ROCm/composable_kernel commit: `e03293ebce`]	2025-06-25 08:20:35 -07:00
linqunAMD	511f170dab	[CK_TILE] Refine fp8 support in flatmm (#2239 ) * [CK_TILE] Refine fp8 in flatmm 1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr 2. Add an additional const check to avoid build error in HotLoopScheduler 3. Refine shuffleb to support both tile 32x32 and 16x16 4. Support command option -init 5. Move Gemm warp defintion to a separate struct * fix clang format * fix clang format * keep default bhavior unchanged (warp tile = 16x16) * fix tile engine build error * fix a typo in codegen_utils.py * address review comments * address review comments --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> [ROCm/composable_kernel commit: `37e1a27537`]	2025-06-25 01:07:45 -07:00
Po Yen Chen	b86c92c84e	[CK_TILE] Add missing parameter 'min_seqlen_q' to the FMHA fwd kernel MakeKargs() interface (#2403 ) * Rename batch_prerfill interface * Add min_seqlen_q parameter in MakeKargs() [ROCm/composable_kernel commit: `50fad03524`]	2025-06-25 15:19:21 +08:00
Yi DING	820ba182a0	Fix unmatched K size of WarpGemmMfmaBf16Bf16F32M16N16K32TransposedCDistribution on gfx950 (#2393 ) [ROCm/composable_kernel commit: `c5d9181e1b`]	2025-06-24 16:35:54 -07:00
Anton Gorenko	e156b5aebb	Improve fmha_bwd tests performance (#2376 ) * Avoid passing indices (std::vector) by value to host tensor's operator() Each access requires 2 allocations and copies of the vector. * Remove 1 unneeded vector copy from the slowest part of fmha_bwd's verification * Compute ds_hp_host_ref in parallel This sequntial ForEach is the slowest part of validation and it benefits from parallel computation. * Do not use ForEach for simple copy and conversion of large tensors These tensors all have the same shape {nhead, real_seqlen_q, real_seqlen_k} and can be copied/converted without complex computations of linear indices. [ROCm/composable_kernel commit: `77123600ee`]	2025-06-24 07:45:24 -07:00
Yi DING	9f0d3497c3	[CK_TILE] FMHA Support hdim_v to as a Multiple of 32 (#2114 ) * 160+192 * Add splitkv d160 * cleanup * fix * Add change log * Fix CHANGELOG * Use static_cast * Update ignored instance --------- Co-authored-by: asleepzzz <hanwen.chang@amd.com> [ROCm/composable_kernel commit: `b8212864cf`]	2025-06-24 01:33:31 +08:00
Po Yen Chen	7001322416	[CK_TILE] Fix compilation errors introduced in #2320 , #2219 and #2214 (#2388 ) * Fix compilation errors * Fix more ck_tile example compilation errors [ROCm/composable_kernel commit: `7d669440a6`]	2025-06-23 12:29:15 +08:00
Max Podkorytov	0bb4daa71b	Update for xformers (#2372 ) * update api * update kernel api * clang-format [ROCm/composable_kernel commit: `0366fb2abc`]	2025-06-22 00:28:30 -07:00
Bartłomiej Kocot	29cfe38b42	[CK TILE] Grouped Convolution Forward Kernel (#2188 ) * [CK TILE] Grouped Convolution Forward Kernel * custom vector size * fixes * refactor * rebase fixes * fixes * fixes [ROCm/composable_kernel commit: `cebdee4d9e`]	2025-06-20 15:44:36 -07:00
Thomas Ning	3414888f92	Transpose builtin macro defense (#2374 ) * add the macro defense * add the static assert check [ROCm/composable_kernel commit: `107e3623c7`]	2025-06-20 11:24:54 -07:00
Max Podkorytov	7c10189a27	Reland fix default epilogue (#2367 ) * Revert "Revert "Fix default epilogue (#2358)" (#2364)" This reverts commit `f85c70b31e`. * add operator() with old signature [ROCm/composable_kernel commit: `11eb9f1c77`]	2025-06-19 10:39:30 -07:00
joyeamd	3cb0dd8506	transpose load api development (#2177 ) * add transpose load; no real logic * fix some compile errors * fix some issues * update transpose load logic * add some fixes * fix a distribution issue * update some codes * add some fix * can pass; but no logic * transpose load enable * update tile transpose * miss output tile distribution mapping * hack for transpose 16x16 * update output tensor distribution * delete unused variables * fix transpose related codes * update transpose load example * exchange the iteration order * fix 16x16 related dimension transpose * fix a transpose index issue * fix a transpose index issue * fix clang format check * update load tile transpose related codes * fix compile errors and pass 16x16 tests * fix a typo * update logic * check other data types * add transpose load api * update transpose load api * fix clang format check * change file name * refactor codes * update code name * delete some unused codes * delete the unused oob flag for transpose load * update tensor view api for transpose load * update for testing * fix a typo error * move transpose ops to example directory * update transpose api * update include file * fix for pr review * fix compile errors * add transpose load; no real logic * fix some compile errors * fix some issues * update transpose load logic * add some fixes * fix a distribution issue * update some codes * add some fix * can pass; but no logic * transpose load enable * update tile transpose * miss output tile distribution mapping * hack for transpose 16x16 * update output tensor distribution * delete unused variables * fix transpose related codes * update transpose load example * exchange the iteration order * fix 16x16 related dimension transpose * fix a transpose index issue * fix a transpose index issue * fix clang format check * update load tile transpose related codes * fix compile errors and pass 16x16 tests * fix a typo * update logic * check other data types * add transpose load api * update transpose load api * fix clang format check * change file name * refactor codes * update code name * delete some unused codes * delete the unused oob flag for transpose load * update tensor view api for transpose load * update for testing * fix a typo error * move transpose ops to example directory * update transpose api * update include file * fix for pr review * fix compile errors * change directory name * delete the duplicated directory * update cmakelists file * delete the unused codes * update function names * update transpose policy * update code after remod.py * update codes * add some comment * Polish the instr infrastructure * build up the fixed instr * redesign the transpose api, currently it has numerical error * add the bf16 transpose * fix some issues * add some comments * update document * Finished the refactor of API and pass through the verification * fix the merging issue --------- Co-authored-by: ThomasNing <thomas.ning@amd.com> [ROCm/composable_kernel commit: `a2f01141aa`]	2025-06-18 01:28:34 -07:00
Thomas Ning	f85c70b31e	Revert "Fix default epilogue (#2358 )" (#2364 ) This reverts commit `b29e3830a6`. [ROCm/composable_kernel commit: `64a2fda713`]	2025-06-17 22:43:05 -07:00
carlushuang	f540c6ccb4	[CK_TILE] moe_sorting support "local_tokens" feature for EP case (#2335 ) * support local_token for hipgraph * update README * fix comment * fix fmoe example [ROCm/composable_kernel commit: `a4e1248dba`]	2025-06-18 10:49:43 +08:00
Max Podkorytov	b29e3830a6	Fix default epilogue (#2358 ) * [ck-tile] fix default epilogue in gemm universal * argument validation needs vector size D * operator() needs to specify dram windows * copy/paste from cshuffle epilogue * clang-format * mark unused argument --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> [ROCm/composable_kernel commit: `cd606f72c1`]	2025-06-17 17:30:21 -07:00
linqunAMD	af00674037	[CK_TILE] Support multi-config in tile_example_gemm_universal (#2240 ) * [CK_TILE] Support multi-config in tile_example_gemm_universal Add GemmConfig in run_gemm_example to support multiple tile config. - It is useful when use you need compare gemm perf with different tile/pipeline config - we also can use it simplify the code for wmma support in the furture. * [CK_TILE] Support multi-config in tile_example_gemm_universal Address review comments * rebase code and fix clang format. * fix clang format * support pipeline v5. * fix merge conflict * address review comment * add missing file * address review comment v2 * fix build error [ROCm/composable_kernel commit: `0eb8974502`]	2025-06-17 17:27:46 -07:00
Satyanvesh Dittakavi	bde406245a	Do not use warpSize as compile time constant as it is removed (#2320 ) * Do not use warpSize as compile time constant as it is removed * Update tile_image_to_column_shape.hpp update warpSize usage. * clean-up all use of warpSize, make sure code builds * fix --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: Bartlomiej Kocot <barkocot@amd.com> [ROCm/composable_kernel commit: `4c57157d50`]	2025-06-17 11:54:30 -07:00
Thomas Ning	bc6af0fa49	Fix the CK Tile related operators (#2356 ) * fix the flatmm * Fix the pipeline * address the comment [ROCm/composable_kernel commit: `3c4cdfac4f`]	2025-06-16 17:38:52 -07:00
Illia Silin	0f4d68633b	Revert "fix the flatmm (#2349 )" (#2352 ) This reverts commit `fc65195605`. [ROCm/composable_kernel commit: `5523df4b2d`]	2025-06-16 07:54:55 -07:00
Thomas Ning	fc65195605	fix the flatmm (#2349 ) [ROCm/composable_kernel commit: `d996bc78be`]	2025-06-16 02:17:53 -07:00
ruanjm	1fdac8b8fe	Add support for specifying valid flag when fetching elements for tile_scatter_gather (#2332 ) * Add support for specifying valid flag when fetching elements for tile_scatter_gather Add constexpr for operator[] of TrueGenerator * Use different path when valid is enabled [ROCm/composable_kernel commit: `b34c234f51`]	2025-06-16 17:17:03 +08:00
carlushuang	370dd01230	hot fix block_gemm fail with pipeline_problem by adding NumWaveGroups inside block gemm problem (#2348 ) [ROCm/composable_kernel commit: `fb97f75099`]	2025-06-15 22:49:04 -07:00
Mateusz Ozga	6b3ddd0e23	[CK_TILE] Multiple-D GEMM example (#2219 ) * Multiple d, initial commit * Check Ds Layout * Readme and clang format * Update branch & conflicts * Multiple D - fix clang-formatter * Rename elemetwise_op * Fix CI * Code review part1 * Remove printf * Remove unnecessary comment * Add new tests with Col layout * Review part 2 * Added support for Multiple D GEMM * Update comment * Remove maybe_unused * Clang-format * Review part 3 * Add comment to function * Add comment to function: another * Take number of params for a refrence function * Remove additional d param for 0 tensor * Change name of function * Fix CI fails [ROCm/composable_kernel commit: `bd96ac9742`]	2025-06-13 19:39:11 +02:00
kylasa	afbc0625f4	Code drop for 2 warp ping pong scheduler along K dimension. (#2276 ) * Code drop for 2 warp ping pong scheduler along K dimension. * Addressing code review comments. * Addressing Clang formatting issues. * Addressing build issues. * Addressing build issues of other GEMM pipelines with ping pong scheduler code drop. * Fix for LDS memory size for GEMM pipelines. * Addressing code review feedback comments. * Change log update. * Addressing code review comments and build issues. * Added new policy for pipeline specific logic about LDS needs. * Clang Fix during build. [ROCm/composable_kernel commit: `5f1ad09b61`]	2025-06-12 18:24:02 -07:00
Thomas Ning	8b534e4037	OCP FP8 Macro restructure (#2331 ) * solved the problem [ROCm/composable_kernel commit: `f59b8c7d3d`]	2025-06-12 09:46:33 -07:00
carlushuang	a7eb83a51b	[CK_TILE] moe sorting optimization : refactor subtoken logic to let more kernel pickup mp kernel (#2327 ) * refactor subtoken logic to let more kernel pickup mp kernel * typo [ROCm/composable_kernel commit: `8aff45a8af`]	2025-06-12 11:44:22 +08:00
Thomas Ning	46624a1abd	Epilogue cshuffle Improvement (#2312 ) * add cshuffle's mxdlperwavepershuffle support, not finished * add epilogue functions * add cshuffle's mxdlperwavepershuffle support, not finished * add epilogue functions * update cshuffle logic * update cshuffle_logics * add some change within review * update some codes following the code review * update epilogue logic * remove from problem * update codes following review. * fix some issues * solve the previous PR error, refine the code * Update include/ck_tile/ops/epilogue/cshuffle_epilogue.hpp Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> * Comment addressed * handling tile_engine failing case * handling tile_engine failing case --------- Co-authored-by: joyeamd <John.Ye@amd.com> Co-authored-by: joye <joye@amd.com> Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: khushbu agarwal <khuagarw@amd.com> [ROCm/composable_kernel commit: `06e0b8436c`]	2025-06-10 22:44:50 -07:00
Thomas Ning	a052bfa81a	fix on the typo (#2326 ) [ROCm/composable_kernel commit: `14d229d6c8`]	2025-06-10 16:34:33 -07:00
Khushbu Agarwal	7afee6c536	fix flatmm kernel for bigger size for fp16 datatype (#2302 ) [ROCm/composable_kernel commit: `bd270fe4bc`]	2025-06-10 11:13:40 -07:00
Eisuke Kawashima	87ad7ebf67	chore: unset executable permission (#2303 ) Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com> [ROCm/composable_kernel commit: `4e586ca958`]	2025-06-10 09:13:59 -07:00
John Afaganis	2fdbade459	Remove usage of 'warpSize' variable as it has been deprecated (#2295 ) * SWDEV-535598 - remove usage of 'warpSize' variable as it has been deprecated. Ideally get_warp_size() should not be constexpr but this is just a workaround * SWDEV-535598 - remove comment from get_warp_size as constexpr is required for this repo --------- Co-authored-by: Gerardo Hernandez <gerardo.hernandez@amd.com> [ROCm/composable_kernel commit: `6635d1bb88`]	2025-06-10 07:34:54 -07:00
carlushuang	e2c603e173	hot fix (#2315 ) [ROCm/composable_kernel commit: `2e0536269e`]	2025-06-10 20:35:28 +08:00
MHYangAMD	22250b2784	Fix fmha fwd precision issue on MI3XX series (#2285 ) * Fix fmha fwd precision issue on MI3XX series For fmha fwd fp16 cases, we found that using impl::cast_tile_pk_fp16_fp32 for casting P would lead to precision issues, since it uses __builtin_amdgcn_cvt_pkrtz, which is round to zero. For examaple, fixing K,V to be all 1, and Q is random, which outputs are expected to be all 1. But we found that it would have some incorrect outputs 0.9995, which are smaller than the atol 0.001. (1 - 0.9995 = 0.0005 < 0.001) Thus, ck do not report this error. * Add option to switch rtn/rtz for fmha fwd [ROCm/composable_kernel commit: `9fcf21a4ec`]	2025-06-10 15:03:23 +08:00
carlushuang	0ff21106d3	MUST USE INLINE FOR ANY NON TEMPLATE FUNCTION IN HEADER!!! (#2305 ) [ROCm/composable_kernel commit: `65835c0bbb`]	2025-06-10 10:40:54 +08:00
Aviral Goel	ee294c1736	Code Refactor for check_err.hpp (#2284 ) * refactor & add documentation * removed return datatype from doxygen comments * Update include/ck_tile/host/check_err.hpp Co-authored-by: John Afaganis <john.afaganis@amd.com> * Update include/ck_tile/host/check_err.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck_tile/host/check_err.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck_tile/host/check_err.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck_tile/host/check_err.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> --------- Co-authored-by: John Afaganis <john.afaganis@amd.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> [ROCm/composable_kernel commit: `5a0bd157db`]	2025-06-08 13:41:27 -07:00
Sami Remes	24beb3bc6b	[CK_TILE] Tileloop persistent gemm - resubmit (#2299 ) * Reapply "[CK_TILE] Tile loop persistent gemm kernel (#2191)" (#2293) This reverts commit `0c8aea8cb4`. * Add missing header for kentry --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> [ROCm/composable_kernel commit: `1c6f83df6c`]	2025-06-06 14:18:49 -07:00
valarLip	06f87e3fb9	extend buffer load to support load 32 bf16/fp16 at same time (#2291 ) [ROCm/composable_kernel commit: `8482977a37`]	2025-06-06 17:21:19 +08:00
Andriy Roshchenko	72054549e7	Optimized GEMMs for MX FP4/8 (#2294 ) Adds V3 GEMM pipeline for MX FP4 and MX FP8 Adds V3 GEMM pipeline for MX FP4 with preshuffling Adds MXFP4 GEMM tests (#2275) Adds MXFP4 GEMM examples Adds MXFP4 GEMMs to ckProfiler Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com> Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com> Co-authored-by: aska-0096 <haocwang@amd.com> Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com> Co-authored-by: OscarXu <huaiguxu@amd.com> Co-authored-by: mtgu0705 <mtgu@amd.com> Co-authored-by: Ding, Yi <yi.ding@amd.com> Co-authored-by: feifei14119 <feiw@amd.com> Co-authored-by: Lin, Qun <qlin@amd.com> Co-authored-by: joye <joye@amd.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> [ROCm/composable_kernel commit: `00247e3c29`]	2025-06-05 13:54:15 -06:00


278 Commits