composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-01 20:27:42 +00:00

Author	SHA1	Message	Date
Ville Pietilä	99fe3df99a	Fix tensor descriptors.	2025-10-03 14:23:04 +00:00
Ville Pietilä	9510171377	WIP: Put back the generic tensor descriptors for convolutions.	2025-10-02 15:06:30 +00:00
Ville Pietilä	c3f0c1a866	Add additional check for non-supported c > 1 case.	2025-09-30 07:46:24 +00:00
Ville Pietilä	db835e065c	Make MPerGroup and NPerGroup template parameters.	2025-09-30 07:14:28 +00:00
Ville Pietilä	1a6f602c65	Remove debug code.	2025-09-30 05:53:28 +00:00
Ville Pietilä	193907fd85	Fix case k > 1 and c=1.	2025-09-29 16:02:00 +00:00
Ville Pietilä	558054eadb	WIP: Simplify conv to gemm transformations and handle K > 1 and C > 1 cases.	2025-09-26 13:38:24 +00:00
Ville Pietilä	8babf7195a	Fix strides in 1D conv to gemm transformation.	2025-09-26 09:38:11 +00:00
Ville Pietilä	354dd5039c	Add compile check for assumed row-major layout.	2025-09-26 08:39:39 +00:00
Ville Pietilä	1764c77fb2	Enable running multiple GEMM batches of merged conv groups.	2025-09-26 07:51:29 +00:00
Ville Pietilä	b864c077ed	Code clean-up for bwd tensor transformations.	2025-09-25 15:09:08 +00:00
Ville Pietilä	0ea3268d5d	Remove debug and other dead code.	2025-09-25 09:41:33 +00:00
Ville Pietilä	cc7433efc6	Add more comments, disable debug code.	2025-09-25 09:37:15 +00:00
Ville Pietilä	97f842f2c6	Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding.	2025-09-25 09:30:50 +00:00
Ville Pietilä	625a78b17b	WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding.	2025-09-24 15:08:01 +00:00
Ville Pietilä	8048d6ff73	Fix build.	2025-09-23 11:17:08 +00:00
Ville Pietilä	e6f6c4a6a3	Working baseline for depthwise convolution with merged conv groups.	2025-09-23 11:14:10 +00:00
Ville Pietilä	29e3112b9b	Epilogue fixes.	2025-09-22 15:38:02 +00:00
Ville Pietilä	d7da3d5089	Offset fixes.	2025-09-22 15:37:46 +00:00
Ville Pietilä	7dfbac5d0b	WIP: Separate epilogue for merged conv groups.	2025-09-19 13:52:33 +00:00
Ville Pietilä	af6838e5dc	Integration test for CShuffle epilogue.	2025-09-19 12:09:08 +00:00
Ville Pietilä	7f52f84167	Fix tile window size for c block.	2025-09-19 08:08:19 +00:00
Ville Pietilä	6bcdb0947e	LDS to global memory copy.	2025-09-18 14:59:32 +00:00
Ville Pietilä	0e09504057	WIP: merged conv groups GEMM epilogue changes.	2025-09-17 14:25:02 +00:00
Ville Pietilä	4ec81cb95c	Add more logging.	2025-09-17 12:27:51 +00:00
Ville Pietilä	6d318ab481	Enable running multiple conv groups per batch.	2025-09-12 14:03:04 +00:00
Ville Pietilä	0d5c1b9638	WIP: Merged conv groups epilogue.	2025-09-11 15:24:36 +00:00
Ville Pietilä	970b40aa6c	WIP: Merged conv groups offset calculation.	2025-09-09 11:33:31 +00:00
Ville Pietilä	d9f0a9cdd0	Fully working conv group merging for TransformConvBwdWeightToGemm.	2025-09-09 09:58:43 +00:00
Ville Pietilä	8845b23254	WIP: Tensor transformations.	2025-09-08 15:41:54 +00:00
Ville Pietilä	61b3c96273	Add number of groups to merge to ck tile grouped gemm example.	2025-09-04 14:24:23 +00:00
Ville Pietilä	2b1908a375	Fix compilation of the grouped conv examples.	2025-09-04 12:01:49 +00:00
arai713	0282d98412	[CK TILE] Stream-K tile partitioner (#2708 ) * initial commit for skeleton code * replaced skeleton code with old streamk b2c map functions from old CK, still need to clean up the code * fixed up code to match CK Tile convention: data type changes, naming changes, etc. * change for num_sk_blocks data type * formatting fix * minor fixes * moved reduction argument to template * resolved comments from PR review: standardizing naming, pruning unneeded code * resolve errors from merge of device op PR: moved enum to common file * switching to uint32_t due to implementation constraints: divmod only takes uint32_t and mixing signed and unsigned types causes problems * unsigned type fix * add const qualifier * added documentation for template parameters * documentation edit	2025-09-03 13:38:17 -07:00
msaffari-amd	47d020a993	refactor: use snake_case naming in ck_tile/core components (#2766 )	2025-09-03 09:34:11 +02:00
Cong Ma	e1ab460d2d	[CK TILE GEMM] Fix building issues (#2772 ) - Add `WarpGemmMfma_f32_16x16x128_[fp8\|bf8]_[fp8\|bf8]_CTransposed` - Replace `__gfx950__` with `CK_GFX950_SUPPORT`	2025-09-02 22:40:18 -07:00
Po Yen Chen	9f35cde374	[CK_TILE] Fix fmha_fwd_v3() Default2DEpilogue usage (#2765 ) * Fix Default2DEpilogue usage * Fix Default2DEpilogue usage for batch_prefill	2025-09-02 09:51:56 -07:00
Michael Mcminn	022f369deb	Adding fix for the gfx908 to the GEMM MFMA implementations of WarpGem… (#2751 ) * Adding fix for the gfx908 to the GEMM MFMA implementations of WarpGemmMfmaBf16Bf16F32M4N64K16 WarpGemmMfmaBf16Bf16F32M64N4K16 * Adding support for offload target gfx9-4-generic * This duplication here isn't ideal	2025-09-02 10:35:07 +02:00
Haocong WANG	33418b201f	Fix naming issue (#2762 )	2025-09-02 11:18:53 +08:00
Po Yen Chen	d876e87fe4	[CK_TILE] Add FAv3 fwd pipeline (#2731 ) * Add FAv3 fwd pipeline * Unpack v_pk_mul to hide v_mov * Avoid compiler moving l compute across phase * Sync sched_group_barrier() setting for masking cases	2025-09-01 09:16:45 +08:00
Aviral Goel	fcff0043ae	chore(gemm): clang format to pass CI (#2758 )	2025-08-29 00:38:46 -07:00
Vijay Krish	4208e28988	ck_tile kernel for gemm with groupwise quantized B tensor. (#2663 ) * This change introduces new pipelines with Intrawave scheduler and block gemm primitives that loads the scale tensor to registers to perform dequantization post MFMA on C tensor in registers. Scale tensor data, BQ is spliced across threads in registers and not stored in LDS. Current support is for the following combinations, but it should be fairly straightforward to extend support to more formats. fp8, fp8 -> f32 bf8, bf8 -> f32 fp8, i4 -> f32 bf8, i4 -> f32 Group size can go down to as low as K length of underlying WarpGemm primitive. * Solve merge conflict * [CK TILE] Update CHANGELOG.md --------- Co-authored-by: Vijay Krishnamoorthy <vjkrish@fb.com> Co-authored-by: ThomasNing <thomas.ning@amd.com> Co-authored-by: Cong Ma <congma13@amd.com>	2025-08-28 23:43:02 -07:00
Cong Ma	428090f749	Support transposed C tile in Aquant (#2679 ) The performance of Aquant has increased after enabling transposed C. Do not need to exchange AQ elements among lanes after enabling transposed C as one thread only holds data from one row.	2025-08-28 13:28:09 -07:00
Mateusz Ozga	0758883fa4	[CK-TILE] Default2DEpilogue, example and adding nullptr_t type for D (#2752 ) * Init commit * Quick fix, CI fails * Remove CDElementWise * Add CDEELementWise --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-08-28 12:45:50 -07:00
asleepzzz	038ea82315	Revert "[CK_TILE] FMHA BWD Enable Tile 16x192 (#2741 )" (#2757 ) This reverts commit `ead4447b20`.	2025-08-28 22:50:42 +08:00
linqunAMD	4a49dac7c6	[Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel (#2728 ) * fix copy basic build error * fix other ck tile test build error	2025-08-28 20:30:30 +08:00
Yi DING	ead4447b20	[CK_TILE] FMHA BWD Enable Tile 16x192 (#2741 ) * 16x192 * Use buffer_load_lds for lse/d * Dispatch & cleanup * Avoid zeroing dq & fix * fix	2025-08-28 18:54:18 +08:00
Linjun-AMD	bf7b458e6e	use iglp to improve dim256 fmha fwd in qr_ks_vs pipeline (#2711 ) * add k_lds padding and iglp to improve dim256 fmha fwd * Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * update block_fmha_pipeline_qr_ks_vs.hpp Signed-off-by: JL-underdog <Jun.Lin@amd.com> * Update block_fmha_pipeline_qx_ks_vs_custom_policy.hpp * clang format Signed-off-by: JL-underdog <Jun.Lin@amd.com> * use same naming style --------- Signed-off-by: JL-underdog <Jun.Lin@amd.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-28 11:39:39 +08:00
Cong Ma	cd53e2e57e	[CK TILE GEMM] Fix a merge conflict (#2753 ) * Fixed a merge conflict in `245467f3` * Format the code	2025-08-27 11:08:09 -07:00
Cong Ma	245467f359	[CK TILE] Fix bugs in AQuant preshuffle (#2700 ) * [CK TILE] Fix bugs in AQuant preshuffle - Make Aquant works with block Mx64x256. `M` could be 16, 32, 64 - Make Aquant works with warp 16x16x32 and 32x32x16. * [CK TILE] Rename Preshuffle to PreshuffleQuant The new name, PreshuffleQuant, explicitly states the function's purpose: to preshuffle the quantization matrix. * [CK TILE Block Scale] Use GemmConfig to save tile properties - Remove specialization of GemmQuantTypeConfig - Pass GemmConfig around which contains tile properties. Stop using hard coded tile properties in `gemm_calc_aquant()` * [CK TILE Block Scale] Rename GemmConfig used in block scale - Remove unused GemmConfig - Rename GemmConfig used in block scale --------- Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-08-27 00:05:54 -07:00
John Afaganis	508e7912f9	Revert "[CK-TILE] Default epilogue, adding support for D (#2629 )" (#2746 ) This reverts commit `d43228fbca`.	2025-08-26 09:48:49 -07:00


308 Commits