composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-06 07:51:52 +00:00

Author	SHA1	Message	Date
PoYen, Chen	d3fd64cd26	Add more appendkv test	2024-08-16 18:03:28 +00:00
PoYen, Chen	51062cae0b	Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv	2024-08-16 16:47:06 +00:00
PoYen, Chen	41fdf9b2bc	Fix compilation error	2024-08-16 16:39:11 +00:00
PoYen, Chen	43b8100b7f	Support cache_batch_idx in example	2024-08-16 16:27:56 +00:00
PoYen, Chen	9c904b0e4c	Pass cache_batch_idx to kernels	2024-08-16 15:32:24 +00:00
PoYen, Chen	e6239e14f7	Re-organize bash functions	2024-08-16 12:46:16 +00:00
PoYen, Chen	2523c8e36c	Fix more format	2024-08-16 10:32:17 +00:00
PoYen, Chen	5728c0be65	Fix formatting	2024-08-16 10:25:46 +00:00
PoYen, Chen	095819a387	Remove options	2024-08-16 10:22:44 +00:00
PoYen, Chen	f2b3620511	Use meaningful options in smoke test	2024-08-16 10:18:14 +00:00
PoYen, Chen	aadd3ec63e	Fix wrong syntax in skcheck expr	2024-08-16 10:09:46 +00:00
PoYen, Chen	a4c6029a3d	Fix skcheck logic	2024-08-16 10:08:01 +00:00
PoYen, Chen	5805f5aa73	Remove group mode from appendkv kernel	2024-08-16 10:04:48 +00:00
Haocong WANG	3049b5467c	[GEMM] gemm_universal related optimization (#1453)	2024-08-14 10:42:30 +08:00
    * replace buffer_atomic with global_atomic
    * fixed global_atomic_add
    * added bf16 atomic_add
    * format
    * clang-format-12
    * clean
    * clean
    * add guards
    * Update gtest.cmake
    * enabled splitk_gemm_multi_d
    * format
    * add ckProfiler
    * format
    * fixed naming
    * format
    * clean
    * clean
    * add guards
    * fix clang format
    * format
    * add kbatch printout
    * clean
    * Add rocm6.2 related gemm optimization
    * Limit bf16 atomic usage
    * remove redundant RCR gemm_universal instance
    * Add RRR fp8 gemm universal instance
    * Bug fix
    * Add GPU_TARGET guard to FP8/BF8 target
    * bug fix
    * update cmake
    * remove all fp8/bf8 example if arch not support
    * Enable fp8 RRR support in ckProfiler
    * limit greedy-reverse flag to gemm_universal in ckProfiler
    ---------
    Co-authored-by: Jing Zhang <jizhan@fb.com>
    Co-authored-by: Jing Zhang <jizhan@meta.com>
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
    Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
    Co-authored-by: illsilin <Illia.Silin@amd.com>
Mateusz Ozga	0606e5498e	Support large: 12d tensor size for reduction kenrel (#1465)	2024-08-13 16:15:47 +02:00
PoYen, Chen	a8a2275aca	Fix wrong arugment count	2024-08-13 08:42:23 +00:00
PoYen, Chen	d96752d0f5	Refine smoke_test_fwd.sh	2024-08-13 08:36:04 +00:00
Illia Silin	cbb6f2ab8c	Disable inapplicable xdl and mha instances for gfx12 (#1464)	2024-08-12 15:11:58 -07:00
PoYen, Chen	e8603dc21a	Add missing comment	2024-08-08 20:40:50 +00:00
PoYen, Chen	822d5dcd8e	Fix wrong seqlen for kvcache	2024-08-08 20:39:36 +00:00
PoYen, Chen	6a399ea47e	Use generic lambda to init all the api traits/args	2024-08-08 19:22:53 +00:00
PoYen, Chen	9206808835	Move functors to the begining of validation code	2024-08-08 18:01:10 +00:00
PoYen, Chen	028d89862a	Wrap code by #if directives	2024-08-08 17:58:49 +00:00
PoYen, Chen	9dddf6e437	Rename 'max_num_blocks' to 'max_num_page_blocks'	2024-08-08 17:38:08 +00:00
PoYen, Chen	e3a4bfba88	Show more detailed warning message	2024-08-08 17:35:36 +00:00
PoYen, Chen	d3624a03de	Merge branch 'develop' into feature/fmha-fwd-appendkv	2024-08-08 17:26:53 +00:00
PoYen, Chen	3e2b69e163	Display more info for specific kernels	2024-08-08 17:26:09 +00:00
PoYen, Chen	c8f63d4848	Separate more non-splitkv & splitkv traits/args	2024-08-08 16:54:00 +00:00
PoYen, Chen	677d9b28dd	Use generic lambda to init traits objects	2024-08-08 16:38:17 +00:00
PoYen, Chen	a0d2163045	Remove dropout code in splitkv kernel	2024-08-08 10:21:34 +00:00
PoYen, Chen	9d9c5a6c24	Fix compilation errors	2024-08-08 08:26:55 +00:00
PoYen, Chen	247e135cfc	Remove fmha_fwd_dispatch()	2024-08-08 08:15:04 +00:00
PoYen, Chen	291e9b4bbb	Separate splitkv/non-splitkv args/traits	2024-08-08 08:07:03 +00:00
PoYen, Chen	655b13b059	Rename option s_k_new to s_knew	2024-08-07 15:31:54 +00:00
PoYen, Chen	b6c2f2f01d	Add missing group mode argument	2024-08-07 15:22:57 +00:00
Illia Silin	12c1f68dd9	Run CK_TILE FMHA benchmarks and collect the performance data. (#1447)	2024-08-07 08:18:26 -07:00
    * run ck_tile benchmarks after the smoke tests and store logs
    * change the path of fmha benchmark logs
    * change the way of stashig ck_tile fmha logs
    * prevent the errors in stages where no logs are generated
    * fix the ck_tile fmha log names and headers
    * generate the fmha performance logs in the root folder
    * change jenkins scrip arguments format
    * use exact file names for stashing
    * modify scripts to process FMHA performance results
    * unstash FMHA logs before parsing them
PoYen, Chen	55ce2948a9	Always add fmha_fwd() api	2024-08-07 13:43:14 +00:00
PoYen, Chen	eda78d1a10	Merge branch 'develop' into feature/fmha-fwd-appendkv	2024-08-07 12:17:45 +00:00
PoYen, Chen	838f9955fd	Fix wrong strides for appendkv kernel	2024-08-07 08:06:47 +00:00
PoYen, Chen	443a528adc	Add block_table kernel args for appendkv kernel	2024-08-07 04:27:15 +00:00
PoYen, Chen	15d0034a64	Add paged-kv codegen logic for appendkv kernels	2024-08-07 04:19:45 +00:00
jakpiase	b74d4d4d54	Fix for beta!=0 in reduce (#1440)	2024-08-06 09:10:39 -07:00
    * fix for beta!=0 in reduce
    * add reviewers suggestions
PoYen, Chen	b98985262d	Add missing kernel arguments for group mode	2024-08-06 14:54:07 +00:00
Bartłomiej Kocot	4ec5c52a0c	Add Grouped Conv Fwd Large Tensor kernel (#1432)	2024-08-06 10:06:10 +02:00
    * Support 64 bit indexing
    * Add new grouped conv fwd kernel for large tensors
    * Add instances large tensor
    * Fixes for transform conv to gemm
    * Fixes
    * fixes
    * Remove not needed instances
    * examples fixes
    * Remove not need ds arrays
    * Fix tests
    * Add 2GB check in gridwise dl
    * Fixes
PoYen, Chen	12da00c3be	Use 128 as minimus page_block_size	2024-08-06 03:20:29 +00:00
PoYen, Chen	f9e2bafd10	Make sure we always start reading complete tile	2024-08-06 03:13:57 +00:00
PoYen, Chen	8779716403	Fix uneven split checking logic	2024-08-06 01:17:14 +00:00
PoYen, Chen	3fc7279519	Disable calling fmha_fwd()	2024-08-05 21:36:52 +00:00
PoYen, Chen	8fea4139df	Fix tile window navigation bugs	2024-08-05 21:34:15 +00:00
PoYen, Chen	90d84eaeae	Fix seqlen_k_min for pre-fill case (1 -> 0)	2024-08-04 02:53:40 +00:00

465 Commits