composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-30 19:57:40 +00:00

Author	SHA1	Message	Date
Tianxing Wu	0d2a9badba	fixed example	2025-10-23 11:17:46 +00:00
Juuso Korhonen	3c0e6d37bf	fixing bugs	2025-10-23 09:47:30 +00:00
Juuso Korhonen	e144872308	change to BLOCK_M in shape definitions	2025-10-23 08:11:55 +00:00
Tianxing Wu	f72b994b00	More compilation fixes	2025-10-20 15:53:35 +00:00
Juuso Korhonen	d68a541c19	fixing compile errors...	2025-10-20 15:04:47 +00:00
Juuso Korhonen	97e7527eb1	fixing compile errors...	2025-10-20 14:03:15 +00:00
Tianxing Wu	9fda954253	Compiling fix	2025-10-20 13:16:19 +00:00
Tianxing Wu	995c6701d3	Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention	2025-10-17 09:05:12 +00:00
Tianxing Wu	af9167abad	example	2025-10-17 09:05:10 +00:00
Juuso Korhonen	9940bd07f6	fix order in mask caller	2025-10-16 11:23:46 +00:00
Juuso Korhonen	072de3842f	comment	2025-10-16 09:23:39 +00:00
Juuso Korhonen	aa4908ac14	fix mask	2025-10-16 09:18:38 +00:00
Juuso Korhonen	62932576c4	use correct mask in kernel	2025-10-16 09:02:08 +00:00
Juuso Korhonen	498a97aa1d	merge	2025-10-16 08:57:14 +00:00
Juuso Korhonen	63c17b7236	correct masking by transforming y_idx = y_idx / num_queries_per_kv	2025-10-16 08:54:07 +00:00
Tianxing Wu	853fa21566	Example boostrap	2025-10-15 11:58:44 +00:00
Juuso Korhonen	72fe8b311c	merge	2025-10-14 12:35:33 +00:00
Juuso Korhonen	4d232d59cc	fix seq_len -> cur_batch_query_len	2025-10-14 12:34:33 +00:00
Tianxing Wu	b940a75328	Comments	2025-10-14 12:19:20 +00:00
Tianxing Wu	ec29289bb1	kv paging	2025-10-14 12:04:11 +00:00
Tianxing Wu	c87f2e3ca9	o window change	2025-10-14 09:59:47 +00:00
Tianxing Wu	96b208f6c7	Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention	2025-10-14 09:58:30 +00:00
Tianxing Wu	e1120fffb0	pipeline api	2025-10-14 09:58:27 +00:00
Juuso Korhonen	c3d27abfb8	fix q window	2025-10-14 09:49:54 +00:00
Juuso Korhonen	b37c356090	fix q window origin	2025-10-14 09:36:28 +00:00
Tianxing Wu	6a7fa959b7	kv tensor view and initial window	2025-10-13 12:53:43 +00:00
Tianxing Wu	cd354286c1	Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention	2025-10-13 11:32:30 +00:00
Tianxing Wu	be58d51d36	o ptr and window	2025-10-13 11:32:28 +00:00
Juuso Korhonen	6ba25b7e84	add commenting	2025-10-13 10:34:55 +00:00
Juuso Korhonen	81a02ffb40	Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention	2025-10-13 10:30:22 +00:00
Juuso Korhonen	b721f79f99	fix	2025-10-13 10:30:11 +00:00
Tianxing Wu	16129a794a	stride fix	2025-10-13 10:30:08 +00:00
Tianxing Wu	96fde33ec4	Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention	2025-10-13 10:29:07 +00:00
Tianxing Wu	55fc6d7151	kv tensor view	2025-10-13 10:28:02 +00:00
Juuso Korhonen	af94aaf1cb	refactor the q tensor view transformation	2025-10-13 10:22:52 +00:00
Juuso Korhonen	49ce980c67	Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention	2025-10-13 10:21:27 +00:00
Juuso Korhonen	2d6dab29eb	refactor the q tensor view transformation	2025-10-13 10:18:23 +00:00
Tianxing Wu	36a65b1968	refactor	2025-10-13 10:05:23 +00:00
Tianxing Wu	bc6385f389	Some refactor	2025-10-13 10:01:38 +00:00
Tianxing Wu	1f4648dab5	refactor. and fixed q transformation	2025-10-10 15:27:36 +00:00
Tianxing Wu	df60493219	refactor	2025-10-10 13:25:19 +00:00
Juuso Korhonen	436eb3a4f8	transform q tensor view	2025-10-10 12:08:16 +00:00
Tianxing Wu	191f179038	unified attention rename	2025-10-09 08:47:19 +00:00
Tianxing Wu	e54cb5a713	intial commit	2025-10-06 13:02:38 +00:00
Sami Remes	ef43078788	Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe (#2893) * Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe * also do the same for amd_buffer_addressing_builtins.hpp * merge with develop * fix clang format --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: ThomasNing <thomas.ning@amd.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-09-30 15:12:30 -07:00
joyeamd	b60af5bde9	[CK_TILE]enhance elementwise test (#2683) * enhance elementwise * fix ci issues	2025-09-30 08:29:37 -07:00
Aviral Goel	bebf0e9d15	Extend Grouped GEMM with MultiD (Single & Double Shared Memory) feature to use persistent kernel option (#2933 ) * feat(grouped_gemm_multi_d): add new example that integrates grouped_gemm and multi_d_gemm feature * refactor: grouped_gemm_multi_d relies on grouped_gemm_kernel * tests(grouped_gemm): grouped_gemm test suite passes with minor adjustments * fix: segfault fix by passing correct parameters for d tensors * style: clang format * WIP: host code for grouped_gemm_multi_d persistent kernel compiles but segfaults * feat(grouped_gemm_multi_d): add functionality to run persistant kernel * feat(grouped_gemm_multi_d): add new example that integrates grouped_gemm and multi_d_gemm feature * refactor: grouped_gemm_multi_d relies on grouped_gemm_kernel * tests(grouped_gemm): grouped_gemm test suite passes with minor adjustments * fix: segfault fix by passing correct parameters for d tensors * style: clang format * fix: incorrect validation method and Dtensor layout in test suite * docs: improved README text based on review comments * fix: parameterize NumDTensor in GroupedGemmHostArgs and remove lint	2025-09-29 15:03:56 -07:00
Khushbu Agarwal	81458a6681	Weight Preshuffle Block Scale gemm support (#2877 ) * initial commit * remove extra files * fixing errors * updated ReadMe file for mapping of diff quants with diff configs * addressing review comments * addressing review comments * Resolved merge conflicts * [CK TILE GEMM] Replace get_preshuffle_or with is_quantpreshuffle_enabled The get_preshuffle_or was not working as expected, which led to incorrect behavior in the quantization preshuffle process. This change replaces it with the more reliable is_quantpreshuffle_enabled function to properly determine when preshuffle should be applied. * initial commit * debugging * working fp8 for init constant * fp8 working with all inits * updated block level code with comments * changing the loop iter * debugging * debugging * debugging * code fix * code clean up * clang formatted * Add comment * code cleanup * clang formatted * merge conflicts fixes * applying the latest int4 changes to the piepline * fixing test code for updated traits * Adding gtest * review comments addressed * addressing review comments * remove c++20 code * added flush cache changes --------- Co-authored-by: Cong Ma <congma13@amd.com> Co-authored-by: root <root@banff-cyxtera-s73-2.ctr.dcgpu>	2025-09-29 12:46:37 -07:00
carlushuang	2e9428eb63	hot fix check eid range (#2924) * hot fix check eid range * fix clang format --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-09-29 09:38:38 -07:00
yinglu	0f04f020d9	fix:tf32:fix build fail for all supported targets (#2942) * fix:tf32:fix build fail for all supported targets * new fix code	2025-09-29 08:04:11 -07:00

1143 Commits