composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-06 07:51:52 +00:00

Author	SHA1	Message	Date
PoYen, Chen	3850028bca	Make comment more clear	2024-08-20 14:10:16 +00:00
PoYen, Chen	8745f5f8bb	Fix kvcache seqlen_k generating logic	2024-08-20 14:04:03 +00:00
PoYen, Chen	6230a78c9e	Find executable from folder automatically	2024-08-20 07:54:37 +00:00
PoYen, Chen	e44852bf2a	Re-format script	2024-08-20 06:02:34 +00:00
PoYen, Chen	eb1c8a26fa	Merge branch 'develop' into feature/fmha-fwd-appendkv	2024-08-20 05:49:01 +00:00
PoYen, Chen	60fe251b78	Revert "Only randomize kvcache seqlen_k if 1 < batch" This reverts commit `b9a4ab0d7e`.	2024-08-19 20:10:36 +00:00
PoYen, Chen	b9a4ab0d7e	Only randomize kvcache seqlen_k if 1 < batch	2024-08-19 09:28:52 +00:00
PoYen, Chen	8166aa58aa	Fix wrong uneven split checking logics	2024-08-19 08:14:45 +00:00
PoYen, Chen	3f0dab6a77	Revert "Avoid seqlen_k=0 for kvcache" This reverts commit `21c4df89e4`.	2024-08-19 07:50:50 +00:00
PoYen, Chen	21c4df89e4	Avoid seqlen_k=0 for kvcache	2024-08-19 07:13:57 +00:00
PoYen, Chen	3fb77a0ebb	Remove type argument	2024-08-18 19:29:40 +00:00
PoYen, Chen	f37cd416e3	Fix typo in comment	2024-08-18 19:07:36 +00:00
PoYen, Chen	e8cd975d6a	Fix compilation errors	2024-08-18 19:00:05 +00:00
PoYen, Chen	9d5c33da04	Add more comments	2024-08-18 18:56:37 +00:00
PoYen, Chen	90c2008fe5	Add comment	2024-08-18 18:42:31 +00:00
PoYen, Chen	8a856f57ab	Add seqlen_q & seqlen_k rules	2024-08-18 18:38:08 +00:00
PoYen, Chen	a93c5e820f	Rename parameter	2024-08-18 18:37:25 +00:00
PoYen, Chen	4cd3432361	Avoid using too small rotary_cos & rotary_sin	2024-08-18 18:27:37 +00:00
PoYen, Chen	e5db71cc59	Use randomized seqlen_k for kvcache	2024-08-18 17:42:32 +00:00
PoYen, Chen	996f46b0d1	Randomize seqlen_k if use kvcache	2024-08-18 17:31:51 +00:00
PoYen, Chen	48b7a5bad2	Fix mode overriding logics	2024-08-18 14:44:28 +00:00
PoYen, Chen	05157bf3a3	Force batch mode when invoking appendkv & splitkv apis	2024-08-18 06:05:42 +00:00
PoYen, Chen	cc52587bcc	Remove macro checking	2024-08-18 05:50:51 +00:00
PoYen, Chen	6b361f5a4b	Clarify the case in warning message	2024-08-18 00:42:22 +00:00
PoYen, Chen	c30d7f9d29	Remove 0 < seqlen_knew constraint	2024-08-16 22:14:05 +00:00
PoYen, Chen	352f6d58b0	Merge branch 'feature/fmha-fwd-appendkv' of github.com:ROCm/composable_kernel into feature/fmha-fwd-appendkv	2024-08-16 22:13:02 +00:00
Dan Yao	79a5d9c10c	[CK_TILE] FA bwd kernels optimization (#1397) * tmp save * fix batch deterministic bugs * fix group deterministic bugs * codegen update * reorder files * bias support * hd256 bias support * bwd smoke test update * simplify convert dq * fix hd256 dropout scratch * do{}while() -> while(){} * comments * remove FmhaBwdTilePartitioner * save clear_tile * refactor dropout * code cleanup * code cleanup * comments * fix epilogue problem * fix fwd dropout * group convert_dq opt * fix dq alignment * Do not store storerandval in bwd for flash attention integration * fix hd32 error and boost performance * revert * Remove duplicated WarpGemm definitions in the policy file * dropout patch for mrepeat 1616 code sync up * dq_acc stride * dq_acc stride stuff * codegen update * fwd dropout revert * fix hd128 scratches and boost performance * receipt 3 for simplified smoke test * more strides for fa integration * fix hd64 scratches and boost performance * non-iglp pipeline for headdim padding cases * dpad same as dvpad for flash attention integration * unpadded lse&d for group mode * Support unpad layout for group lse * Support unpad lse layout for splitkv * Fix stride for splitkv kernel * fix unpadded lse issue in fwd splitkv * comment * solve lds read&write conflicts * rename * bias rename * tile index revert --------- Co-authored-by: danyao12 <danyao12> Co-authored-by: rocking <ChunYu.Lai@amd.com> Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>	2024-08-16 13:40:10 -07:00
rocking	34fea2935a	Fix unexisted attribute	2024-08-16 20:30:49 +00:00
PoYen, Chen	d52278a5ef	Add more case for appendkv	2024-08-16 18:23:55 +00:00
PoYen, Chen	d3fd64cd26	Add more appendkv test	2024-08-16 18:03:28 +00:00
PoYen, Chen	51062cae0b	Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv	2024-08-16 16:47:06 +00:00
PoYen, Chen	41fdf9b2bc	Fix compilation error	2024-08-16 16:39:11 +00:00
PoYen, Chen	43b8100b7f	Support cache_batch_idx in example	2024-08-16 16:27:56 +00:00
PoYen, Chen	9c904b0e4c	Pass cache_batch_idx to kernels	2024-08-16 15:32:24 +00:00
PoYen, Chen	e6239e14f7	Re-organize bash functions	2024-08-16 12:46:16 +00:00
PoYen, Chen	2523c8e36c	Fix more format	2024-08-16 10:32:17 +00:00
PoYen, Chen	5728c0be65	Fix formatting	2024-08-16 10:25:46 +00:00
PoYen, Chen	095819a387	Remove options	2024-08-16 10:22:44 +00:00
PoYen, Chen	f2b3620511	Use meaningful options in smoke test	2024-08-16 10:18:14 +00:00
PoYen, Chen	aadd3ec63e	Fix wrong syntax in skcheck expr	2024-08-16 10:09:46 +00:00
PoYen, Chen	a4c6029a3d	Fix skcheck logic	2024-08-16 10:08:01 +00:00
PoYen, Chen	5805f5aa73	Remove group mode from appendkv kernel	2024-08-16 10:04:48 +00:00
Haocong WANG	3049b5467c	[GEMM] gemm_universal related optimization (#1453) * replace buffer_atomic with global_atomic * fixed global_atomic_add * added bf16 atomic_add * format * clang-format-12 * clean * clean * add guards * Update gtest.cmake * enabled splitk_gemm_multi_d * format * add ckProfiler * format * fixed naming * format * clean * clean * add guards * fix clang format * format * add kbatch printout * clean * Add rocm6.2 related gemm optimization * Limit bf16 atomic usage * remove redundant RCR gemm_universal instance * Add RRR fp8 gemm universal instance * Bug fix * Add GPU_TARGET guard to FP8/BF8 target * bug fix * update cmake * remove all fp8/bf8 example if arch not support * Enable fp8 RRR support in ckProfiler * limit greedy-reverse flag to gemm_universal in ckProfiler --------- Co-authored-by: Jing Zhang <jizhan@fb.com> Co-authored-by: Jing Zhang <jizhan@meta.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>	2024-08-14 10:42:30 +08:00
Mateusz Ozga	0606e5498e	Support large: 12d tensor size for reduction kenrel (#1465)	2024-08-13 16:15:47 +02:00
PoYen, Chen	a8a2275aca	Fix wrong arugment count	2024-08-13 08:42:23 +00:00
PoYen, Chen	d96752d0f5	Refine smoke_test_fwd.sh	2024-08-13 08:36:04 +00:00
Illia Silin	cbb6f2ab8c	Disable inapplicable xdl and mha instances for gfx12 (#1464)	2024-08-12 15:11:58 -07:00
PoYen, Chen	e8603dc21a	Add missing comment	2024-08-08 20:40:50 +00:00
PoYen, Chen	822d5dcd8e	Fix wrong seqlen for kvcache	2024-08-08 20:39:36 +00:00
PoYen, Chen	6a399ea47e	Use generic lambda to init all the api traits/args	2024-08-08 19:22:53 +00:00

494 Commits