composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-02 04:31:25 +00:00

Author	SHA1	Message	Date
Mateusz Ozga	b507d889c1	[CK_TILE] Introduces a new GEMM API that splits the existing basic GEMM class into multiple specialized classes. (#2520 ) * Init commit new API * apply clang-format * PreShuffle preapring * Apply Preshuffle condition to universal_gemm * Fix: convert size_t to index_t * Review changes * Mode 100755 -> 100644 --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-07-24 20:39:56 +02:00
Cong Ma	e62710e461	ck_tile kernel for gemm with groupwise quantized A tensor (#2473 ) * ck_tile kernel for gemm with groupwise quantized A or B tensor. This change introduces new pipelines with Intrawave scheduler and block gemm primitives that loads the scale tensor to registers to perform dequantization post MFMA on C tensor in registers. Scale tensor data, AQ/BQ is spliced across threads in registers and not stored in LDS. Current support is for the following combinations, but it should be fairly straightforward to extend support to more formats. 1. fp8, fp8 -> f32 2. bf8, bf8 -> f32 3. i4, fp8 -> f32 4. i4, bf8 -> f32 Group size can go down to as low as K length of underlying WarpGemm primitive. For Gemm problems with quantized B tensor, this change also introduces preliminary support for flatmm pipeline which loads B tensor directly into registers. * [Block Scale Gemm] Only run gemm quant examples on __gfx94__ - Only run gemm quant examples on __gfx94__ for usage of `v_cvt_pk_fp8_f32` - Format the code * [Block Scale Gemm] Remove Bquant Gemm BlockScale This cleanup is in preparation for future development of bquant. By isolating Aquant-related code, we can streamline the codebase and make it easier to add and maintain bquant functionality in subsequent updates. * [Block Scale Gemm] Format code with clang-format-12 The latest clang-format (v19) in ROCm 7.0 generate different result than clang-format-12 which is used in CK CI. Format code with clang-format-12 for consistency. * [Block Scale Gemm] Split the k direction loop - Split the k direction loop in block_universal_gemm_as_quant_bs_cr.hpp to make the logic clearer. - Disable C transposition. * [Block Scale Gemm] Move block scale gemm example to 38_block_scale_gemm * [Block Scale Gemm] Update copyright * test * Add TailHandler * Move TileDistributionEncodingPatternAQ * Refactor * refactor * fix bug * fix bug * help solve the PR comment * Format the code * [Block Scale Gemm] Add unit tests * [Block Scale Gemm] Add support to 16x16x32 MFMA - Add support to 16x16x32 MFMA - Fix a bug when exchange data crossing lanes --------- Co-authored-by: Vijay Krishnamoorthy <vjkrish@meta.com> Co-authored-by: Cong MA <congma13@ctr2-alola-ctrl-01.amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-07-23 00:10:16 -07:00
Yi DING	f0a8c18017	[CK_TILE] Fix tile_example_moe_sorting broke in #2436 (#2525 )	2025-07-17 22:50:58 -07:00
Mateusz Ozga	7fc000d7b3	Fix CI clang-format (#2521 )	2025-07-17 14:41:29 +02:00
Haocong WANG	28072adc3a	fix mfma32x32 dispatch (#2490 )	2025-07-17 15:24:12 +08:00
Yi DING	f1d8ad2818	[CK_TILE] Use read_tr in universal gemm (#2436 ) * Use read_tr in universal gemm * Enable all instances back * Revert example37 changes * Resolve comments * resolve comments 2 * Fix assertion msg * fix the gemm basic * change index_t to bool for preshuffle variable * Solve the comment --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>	2025-07-16 23:56:22 -07:00
linqunAMD	6e76b82059	Fix build errors on windows (#2456 ) * Fix build errors on windows * correct clang format --------- Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>	2025-07-16 07:58:23 -07:00
Khushbu Agarwal	d239b91fd5	Merge flatmm Operator with universal gemm (#2434 ) * Initial commit * Adding new tile partitioner to flatmm * intermediate changes * debugging kernels * Updating flatmm example to universal gemm example * updated flatmm kernel to run via gemmKernel * update universal gemm to incorporate flatmm * debug * Fix flatmm call * Fixing other kernels and tests for API changes * clang formatted * fixing gemm tests * added test for flatmm and simplify kernel arguments * adding flatmm test * fix test for flatmm * simplify gemm kernel with flatmm * remove flatmm related files * addressing review comments and code clean up * resolving empty file * resolving empty file * clang formatted * addressing review comments * enable persistent kernel for flatmm * reverted the removed files for flatmm * reverted the removed files for flatmm * changed flatmm to weightPReshuffle; removed the _1 added in teh faltmm example * some more renames * clang formatted	2025-07-11 08:27:55 -07:00
Thomas Ning	f240ae3248	Enable Async Copy for MI355 (#2425 ) * add for async load builtin * add async load api * fix some compiling errors * fix a compiling error * fix some compiling errors * add a pipeline which copies from v4 * add a new pipeline for async load * fix some compiling errors * add async load tests * fix some issues in async load * fix * fix async inline assembly * fix async inline assembly * add ignore header file * comment some not gfx950 codes * comment some not gfx950 codes * fix a error * update async load apis * fix lds descriptor * fix a compiling error * fix some compiling errors * fix a descriptor issue * update lds descriptor * change async pipeline's tile distribution pattern from thread to warp * fix clang format * update async policy * fix a CRTP issue * fix a typo error * change lds layout * fix some sync issues * improve codes * delete the async test * fix a commented format issue * avoid compiling device functions when compile host * make gemm run * add the copy kernel support * finish the feature * Address comment * add the support for buffer_builtin * solved the merging problem * Comment Addressed --------- Co-authored-by: joye <joye@amd.com> Co-authored-by: joyeamd <John.Ye@amd.com>	2025-07-07 10:08:49 -07:00
ltqin	9f4c5d7372	ck tile pagedkv prefill (#2405 ) * add prefetching physical block id for pagedkv * start add pagedkv prefill * rename pipeline * add kernel for pagedkv * add an init version pagedkv prefill * fix redefine issue * add struct BlockFmhaFwdPagedKVPipelineProblem and fmha_fwd_pagedkv_args * generate dispatch code * add body generating code * comipling pass * remove dropout from pagedkv * set lse to false in generating code * start changing qr kernel to pagedkv * init version of kernerl with pagedkv * change names of file that are generated * chang host validation for pagedkv prefill * using iglp to change blockgemm * add kernel files to op head file * show parameters * rewrite print parameter fun * add fwd * remove default parameter of GridSize * format * fix nhead issue and add seqlen_k_ptr to batch mode * format code * remove no-longer used code * format * fix some comments --------- Co-authored-by: ltqin <letaoqin@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2025-07-07 16:16:54 +08:00
Thomas Ning	e03293ebce	[CK Tile] Int8 Support on CK Tile GEMM (#2267 ) * updates to support int8 in 03_gemm example * added comments, using aliases, helper functions * test(gemm_universal): add test cases for int8 gemm pipeline * fix(test_gemm): fix for failing test unit test for int8 * test(ck_tile): add int8 unit test for gemm universal * refactor(gemm_universal): GPU reference verification for GEMM code improved * style(gemm_universal): removed extra comments and did clang format * merging recent changes to universal gemm to tile_engine * ck tile engine integration work * feat(tile_engine): add int8 support to tile engine ops/gemm * feat(tile_engine): added 32 32 16 mfma instances to tile engine for int8 * style: Format code with clang-format-12 * refactor(tile_engine): address review comments * style: removed unhelpful comments & unused variables. * build: tile engine uses default config * feat: add int8 support for CK_TILE GEMM * style: added trailing commas to codegen_utils.py * refactor: tile engine * refactor: formatting and code review * refactor: code formatting for python files * fix: suppress build warning * add support for gfx950 * refactor:KWarpTile size in gemms util * Fix the branch and wrap up the k warp tile * Add bf8 integration * refactor: clang format and rebase --------- Co-authored-by: zjli2013 <leezhengjiang@gmail.com> Co-authored-by: AviralGoelAMD <aviral.goel@amd.com> Co-authored-by: Khushbu Agarwal <khuagarw@amd.com>	2025-06-25 08:20:35 -07:00
linqunAMD	37e1a27537	[CK_TILE] Refine fp8 support in flatmm (#2239 ) * [CK_TILE] Refine fp8 in flatmm 1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr 2. Add an additional const check to avoid build error in HotLoopScheduler 3. Refine shuffleb to support both tile 32x32 and 16x16 4. Support command option -init 5. Move Gemm warp defintion to a separate struct * fix clang format * fix clang format * keep default bhavior unchanged (warp tile = 16x16) * fix tile engine build error * fix a typo in codegen_utils.py * address review comments * address review comments --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-06-25 01:07:45 -07:00
Yi DING	c5d9181e1b	Fix unmatched K size of WarpGemmMfmaBf16Bf16F32M16N16K32TransposedCDistribution on gfx950 (#2393 )	2025-06-24 16:35:54 -07:00
Bartłomiej Kocot	cebdee4d9e	[CK TILE] Grouped Convolution Forward Kernel (#2188 ) * [CK TILE] Grouped Convolution Forward Kernel * custom vector size * fixes * refactor * rebase fixes * fixes * fixes	2025-06-20 15:44:36 -07:00
linqunAMD	0eb8974502	[CK_TILE] Support multi-config in tile_example_gemm_universal (#2240 ) * [CK_TILE] Support multi-config in tile_example_gemm_universal Add GemmConfig in run_gemm_example to support multiple tile config. - It is useful when use you need compare gemm perf with different tile/pipeline config - we also can use it simplify the code for wmma support in the furture. * [CK_TILE] Support multi-config in tile_example_gemm_universal Address review comments * rebase code and fix clang format. * fix clang format * support pipeline v5. * fix merge conflict * address review comment * add missing file * address review comment v2 * fix build error	2025-06-17 17:27:46 -07:00
Thomas Ning	3c4cdfac4f	Fix the CK Tile related operators (#2356 ) * fix the flatmm * Fix the pipeline * address the comment	2025-06-16 17:38:52 -07:00
carlushuang	fb97f75099	hot fix block_gemm fail with pipeline_problem by adding NumWaveGroups inside block gemm problem (#2348 )	2025-06-15 22:49:04 -07:00
Mateusz Ozga	bd96ac9742	[CK_TILE] Multiple-D GEMM example (#2219 ) * Multiple d, initial commit * Check Ds Layout * Readme and clang format * Update branch & conflicts * Multiple D - fix clang-formatter * Rename elemetwise_op * Fix CI * Code review part1 * Remove printf * Remove unnecessary comment * Add new tests with Col layout * Review part 2 * Added support for Multiple D GEMM * Update comment * Remove maybe_unused * Clang-format * Review part 3 * Add comment to function * Add comment to function: another * Take number of params for a refrence function * Remove additional d param for 0 tensor * Change name of function * Fix CI fails	2025-06-13 19:39:11 +02:00
kylasa	5f1ad09b61	Code drop for 2 warp ping pong scheduler along K dimension. (#2276 ) * Code drop for 2 warp ping pong scheduler along K dimension. * Addressing code review comments. * Addressing Clang formatting issues. * Addressing build issues. * Addressing build issues of other GEMM pipelines with ping pong scheduler code drop. * Fix for LDS memory size for GEMM pipelines. * Addressing code review feedback comments. * Change log update. * Addressing code review comments and build issues. * Added new policy for pipeline specific logic about LDS needs. * Clang Fix during build.	2025-06-12 18:24:02 -07:00
Thomas Ning	14d229d6c8	fix on the typo (#2326 )	2025-06-10 16:34:33 -07:00
Sami Remes	1c6f83df6c	[CK_TILE] Tileloop persistent gemm - resubmit (#2299 ) * Reapply "[CK_TILE] Tile loop persistent gemm kernel (#2191)" (#2293) This reverts commit `233e274077`. * Add missing header for kentry --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-06-06 14:18:49 -07:00
Illia Silin	233e274077	Revert "[CK_TILE] Tile loop persistent gemm kernel (#2191 )" (#2293 ) This reverts commit `ffb52783d0`.	2025-06-05 09:24:00 -07:00
Sami Remes	7ea1508b59	[CK_TILE] Move GEMM pipeline tail handling logic to pipelines (#2222 ) * Add TailHandler for V3, V4 and Mem pipelines * Adapt examples and tests to use TailHandler * move tail-handling logic to pipeline in persistent grouped gemm * Fix Mem pipeline dispatching, add CompV4 dispatching * Use a macro for handling the many tails of Mem pipeline * Fix formatting again * Use const-ref RunFunction, remove unnecessary try_run	2025-06-04 11:50:21 +03:00
Sami Remes	ffb52783d0	[CK_TILE] Tile loop persistent gemm kernel (#2191 ) * Implement tile loop persistent gemm kernel * Enable timing * Add tests for persistent gemm * Fix formatting * Fix gemm_basic * Rename True/False to Persistent/NonPersistent * Use only one set of layouts for persistent tests * Fix gemm example persistent template parameter * Fix formatting	2025-06-04 11:46:28 +03:00
Sami Remes	d1e6f0982d	[CK_TILE] Grouped GEMM tile loop (#2146 ) * Add trait to use a persistent kernel and split the entrypoints in grouped gemm * Some helper functions for persistent kernel case * Get max occupancy grid using device properties * Implement tile loop in main entry point to grouped gemm * Enable GridSize() on device * Handle offset tile index using real current block index * Add persistent kernel choice to grouped gemm example * Use a for-loop for iterating over the group * Reduce VGPR spills by early-exit * Enable persistent kernel choice in grouped_gemm example * Add persistent kernel option to grouped_gemm test * Fix formatting with remod.py * Remove GridUpdateBlocks as blocks are now iteratively computed * Add comment about VGPR spilling * Fix formatting * Use CK_TILE_HOST instead of __host__ * Enable all Row/Col combinations in grouped gemm unit test * Add some KBatch=2 cases to grouped gemm tests * Fix SplitK for grouped gemm * Enable pipeline hotloop/tailnumber selection in-kernel for grouped gemm * Add type traits * Split examples to regular and tileloop * Formatting * Use hipExtStreamGetCUMask to get current active CUs for the given stream * Align test and example kernel config, and disable validation for splitk repeats * Remove debug options from CMakeLists.txt * Separate the code paths for persistent/non-persistent in test * Fix formatting * Address review comments --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-05-20 17:18:57 +03:00
Khushbu Agarwal	3d8d6e75e4	Adding validation for tile sizes in Tile Engine (#2189 ) * Adding validation for tile sizes * Add architecture in config, and shuffle lines of code in warp_gemm.hpp * Enable MFMA for gfx950, and invalid tile handling	2025-05-15 10:28:31 -07:00
Khushbu Agarwal	f05e45ba59	Disable SMFMA gfx90a (#2184 ) * sparsity fix for gfx90a * reverting tile_engine changes	2025-05-12 09:56:23 -07:00
Khushbu Agarwal	d8faf1c6a1	Support for swizzle and transpose for MFMA_16x16x32_F16/BF16 (#2172 ) * Changes for updating tile distribution for shuffle and transpose * Fixed swizzle and transpose, removed comments * clang formatted * Adding support for bf16 type * Addressing review comments	2025-05-10 22:40:05 -07:00
Khushbu Agarwal	ef72a4b9bc	Disable SMFMA for gfx90a (#2182 )	2025-05-09 00:18:07 -07:00
Thomas Ning	c757046d49	Revert "Disable the SMFMA instruction for gfx90a. (#2174 )" (#2175 ) This reverts commit `a32d907771`.	2025-05-08 00:07:03 -07:00
Khushbu Agarwal	a32d907771	Disable the SMFMA instruction for gfx90a. (#2174 ) * remove smfma for gfx90a * clang formatted	2025-05-07 23:09:22 -07:00
BingYuan.Zhou	6a3960c1e1	Flatmm merge (#2168 ) * sync with function interface of cshuffleepiloge,fix flatmm build fail * move code from solin/flatmm which add mfma161632fp8 and optimize flatmm --------- Co-authored-by: solin <bingzhou@amd.com>	2025-05-08 12:59:57 +08:00
Aviral Goel	769336b640	[CK_TILE] Add type traits to detect tile window types at compile time (#2158 ) * added WindowType enum to tile_window_structs and static assert checks in computev4 pipeline * added type traits instead of enum to tile_window() and tile_window_linear() with debug comments * removed comments, added documentation and clang format	2025-05-07 00:00:39 -07:00
jakpiase	0bcb804ad0	[CK_TILE] Remove scratch usage from universal gemm (#2001 ) * moves kbatch condition outside of kernel * add reviewer comments * fixes * fix tests * fixes after review --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-05-05 18:46:44 +02:00
Khushbu Agarwal	d58f2b8bd0	mfma_32x32x64_fp8/bf8 (#2148 ) * support for mfma_32x32x64_fp8 * clang-formatted * Fixing sparsity in codegen	2025-05-01 13:36:24 -07:00
Illia Silin	9a9f59ae69	Revert "Add ck tile examples to package (#1880 )" (#2150 )	2025-04-30 10:20:16 -07:00
Aviral Goel	65f182d617	Add Matrix A and Matrix B Swizzle for LDS in Computev4 policy (#2136 ) * fixed computev4 policy bug for lds swizzle * added swizzle for input matrix B * Improved ComputeV4 policy and pipeline by swizzling A and B * consolidated LDS descriptor functions in parent struct	2025-04-28 18:20:47 -07:00
Khushbu Agarwal	d107f3c3a5	Support for MFMA_16x16x128 for fp8/bf8 (#2125 ) * Adding 16x16x128 support for gfx950 * Support for fp8 and bf8 * fix input arguments for MFMA scale instruction * clang-formatted * Fixes for lwpck-3145 (#2138) * Fix lds tile & cmake dep & default epilogue * Fallback BTypeToUse to ADataType in WOQ cases * reverting instance json file * reverting instance json file --------- Co-authored-by: Yi DING <yi.ding@amd.com>	2025-04-28 18:19:50 -07:00
jakpiase	434d19f696	Add ck tile examples to package (#1880 ) * add ck tile examples to package * Update jenkinsfile * fix for jenkinsfile * fix for building ck tile code on non gfx9 * compile ck tile examples only for gfx94 * include ck tile examples in all target * fix for basic gemm UseStructuredSparsity * Update CMakeLists.txt * Update gemm_pipeline_problem.hpp * add targets to rocm install --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-04-28 09:53:19 -07:00
Khushbu Agarwal	a2ed34a112	MFMA_32x32x16 for gfx950 (#2121 ) * Enable MFMA_32x32x16 for fp16/BF16 for gfx950 * clang formatted	2025-04-24 10:20:22 -07:00
carlushuang	5487289fc4	[CK_TILE] support gfx950 matrix core in 01_fmha fwd (#2110 ) * gfx950 01_fmha fwd * fix comment --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-04-23 12:40:18 -07:00
Gino Lu	504f563f78	[CK-Tile] warp-gemm support for using V_MFMA_F32_16x16x32_BF16 (#2073 ) * draft v_mfma_f32_16x16x32_bf16 * fix error config and add debug code. * Solve the CShuffle Problem * draft v_mfma_f32_16x16x32_bf16 * fix error config and add debug code. * Solve the CShuffle Problem * fix error while testing new command * Finished the feature of new mfma 161632 * Addressed the comment --------- Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-04-22 15:52:36 -07:00
Thomas Ning	a738e43445	MFMA 16x16x32fp8 (#2103 ) * add mfma_16x16x32_fp8 * clang format code * Finished the fix for gemm basic * clang foramt * rebuild CI * recover gemm.hpp * add MFMA 161632bf8 --------- Co-authored-by: solin <bingzhou@amd.com>	2025-04-21 10:21:35 -07:00
jakpiase	6c61f4d237	[CK_TILE] Add 2:4 structured sparsity support for fp16 gemm (#1957 ) * add structured sparsity fp16 support for gemm * added reviewer suggestions * update changelog * update changelog * add reviewers suggestions * Minor fix * clang fix * fix doxygen	2025-04-11 12:18:26 +02:00
Illia Silin	572cd820ce	Split env.hpp header from the ck.hpp header. (#2049 ) * split env.hpp out of main headers * fix namespace logic	2025-04-03 15:30:21 -07:00
Adam Osewski	e5ad48a784	Basic docs for universal gemm & ck-tile gemm. (#2014 ) * Basic docs for universal gemm & ck-tile gemm. * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Reviewers suggestions. * Align tparam names in doc with class tparams. * More reviewers fine tuning ;) --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>	2025-04-02 11:03:40 +02:00
MHYang-gh	c027637a8f	Fix A/B lds transform (#2007 )	2025-03-22 23:13:50 -07:00
BingYuan.Zhou	5a0d693b86	fix ck_tile/basic_gemm build error (#1988 )	2025-03-20 22:01:14 -07:00
jakpiase	0e91d32c61	[CK_TILE] Switch to universal gemm for batched and grouped gemms (#1919 ) * switch to universal gemm for batched and grouped gemms * added reviewer comments * fixed grouped gemm tests	2025-03-20 11:17:04 +01:00
kylasa	66c5f5b0b6	Addressing (Post Merge) code review comments for PR 1845 (#1883 ) * Addressing code review comments. * Addressing code review comments. * Reorganized code for better readability. * add ck_tile gemms for new types in CI * fix jenkins syntax * fix script syntax * Add the test cases back * Address the review comments * Address review comments * clang format * Solve the merging issues * Addressed the comments * clang format --------- Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com> Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-03-06 11:40:30 -08:00


93 Commits