composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-07 00:04:37 +00:00

Author	SHA1	Message	Date
rocking	9fa8b4c170	Fix bug of welford when number of m warp > 1	2024-10-29 10:51:33 +00:00
rocking	8beda9d98d	Move reduce2d into reduce folder	2024-10-29 10:02:09 +00:00
rocking	1654e6cd97	Merge branch 'develop' into ck_tile/rmsnorm	2024-10-29 14:42:48 +08:00
Illia Silin	922e42a039	fix compilation errors for gfx12 with clang20 (#1606)	2024-10-28 19:02:48 -07:00
rocking	356b045fd7	Add README	2024-10-28 19:55:57 +00:00
rocking	6a54faae25	Add save_x to trait	2024-10-28 19:55:44 +00:00
rocking	b683de6b32	Fix bug of x verification	2024-10-28 19:49:08 +00:00
rocking	88d3079065	Add test script	2024-10-28 19:39:37 +00:00
rocking	b83f8d242a	Add instance library	2024-10-28 19:34:51 +00:00
rocking	9a22805e92	Fix bug of kSaveX == false	2024-10-27 11:42:58 +00:00
rocking	0f9969a894	Rename two pass to three pass	2024-10-26 20:29:55 +00:00
rocking	697558d856	Add two pass pipeline	2024-10-26 20:21:18 +00:00
carlushuang	b098b71b05	topk_softmax (#1592) * topk_softmax * remove some file * fix atomix linear_offset * address various comment, and change sfc get_index api to static(tuple)	2024-10-26 23:52:49 +08:00
Bartłomiej Kocot	31bf253aeb	Add dynamic elementwise op (#1426 ) * Add dynamic elementwise op Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com> * CI issues fix * Custom parameter value for dynamic functions - Comments addressed --------- Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com> Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>	2024-10-26 15:22:37 +02:00
Po Yen Chen	54f0e6f4bb	[CK_TILE] More fmha splitkv optimizations (#1588 ) * Use pre-defined constants for readability * Use vector write for o_acc tensor * Remove no-longer used policy method * Deprecate no-longer used policy/pipeline * Specify gemm0/gemm1 block warps separately in codegen * Fix wrong ps_idx creation logic * Add single-warp block gemm * Supoprt single-warp gemm0 * Make MakeCBlockTile() as static method * Use MakeCBlockTile() to get underlying tile distribution * Use kNumGemm1Warps to compute # threads for gemm1 * Put normal case in the if clause * Refine fmha splitkv block mapping * Refine & fix the lse_acc/o_acc layout * Fix wrong LDS size for K tile * Use kK0=64 for hdim=128,256 fmha splitkv kernels * Use kK1=64 for hdim=32,64,128 fmha splitkv kernels * Undo kK0/kK1 changes * Use more reasonable GetAlignmentV() computation * Using store_tile() in fmha splitkv kernel epilogue	2024-10-26 18:35:45 +08:00
rocking	2d4480a123	Refine tile size	2024-10-26 10:23:20 +00:00
rocking	1c1f1e35b5	Fix bug of one pass pipeline	2024-10-26 10:22:50 +00:00
rocking	27d96b4031	host verification	2024-10-26 10:22:09 +00:00
valarLip	37f7afed1e	add int8 gemm multiply multiply a8w8 (#1591 ) * add int8 gemm multiply multiply a8w8 * uncomment * clang-format-12 * Add example_gemm_multiply_multiply_xdl_int8 * Remove shell scripts * update preprocess number for mi308; bring back printout in ckprofiler * format --------- Co-authored-by: chenjun <junchen2@amd.com> Co-authored-by: Haocong WANG <haocwang@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com>	2024-10-26 16:39:34 +08:00
rocking	826ee18a11	Add reduce op	2024-10-25 22:51:15 +00:00
rocking	1e0c9fde51	Add add_rmsnorm2d_rdquant kernel	2024-10-25 20:50:48 +00:00
Max Podkorytov	eda5938386	add parsing grouped conv fwd instances	2024-10-25 08:25:53 -07:00
Rostyslav Geyyer	7d576f1748	Update GPU verification (#1596) * Update inits * Update static_cast to type_convert * Add verification option selection	2024-10-25 08:13:46 -07:00
aledudek	9385caa306	Generic threshold calculation (#1546 ) * Calculate generic relative threshold pool3dfwd * Calculate absolute error threshold pool3d fwd * Generic threshold calculation take max input for relative error pool3dfwd * Remove max possible value for error calculation at runtime * Remove debug print in pool3dfwd * Pool3d fwd adjusted types in generic threshold calculation * Generic threshold calculation take into account number of accumulations and accdatatype * Generic threshold fix final error formula * Generic threshold calculation - num of accs fix * Generic threshold calculation - adjust absolute error * Generic threshold calculation - OutDataType in absolute error	2024-10-25 12:46:24 +02:00
dummycoderfe	9183ce69ca	hot_fix epsilon pos (#1597) Co-authored-by: dummycoderfe <noplydummmycoder@163.com>	2024-10-25 11:17:45 +08:00
rocking	871af334d1	Refine pipeline name	2024-10-24 20:42:40 +00:00
rocking	c89d8ca95f	clang format	2024-10-24 17:05:36 +00:00
rocking	1684d71a3f	Fix cmake	2024-10-24 11:44:55 +00:00
rocking	1e6814a6bd	Refine naming	2024-10-24 11:44:40 +00:00
rocking	d79715ba53	Fix bug of rmsnorm	2024-10-24 11:43:45 +00:00
rocking	e4a169dd47	refine example of rmsnorm	2024-10-24 11:43:15 +00:00
rocking	a50ec83d03	refine naming	2024-10-24 08:48:34 +00:00
rocking	df976ff6a1	Add missing cmake change	2024-10-24 06:13:03 +00:00
rocking	3d2e3be652	Add script to test performance and correctness	2024-10-24 06:12:42 +00:00
rocking	5b3108a62f	Remove static assert to prevent compile fail	2024-10-24 06:09:23 +00:00
Illia Silin	8e22e1ae31	fix the logic of enabling XDL and WMMA instances (#1595)	2024-10-23 15:55:39 -07:00
rocking	a5986c70dc	Add rmsnorm small example	2024-10-23 19:31:05 +00:00
rocking	382a2af212	Add rmsnorm2d	2024-10-23 19:23:51 +00:00
Bartłomiej Kocot	cedccd59c9	[POST MERGE PR] Enable grouped conv bwd wei bf16 NGCHW (#1594)	2024-10-23 12:02:33 +02:00
rocking	dfb4bf9488	Fix bug of std caculation	2024-10-22 20:36:25 +00:00
rocking	26f16dd20b	Prevent user use cross warp reduction	2024-10-22 19:29:46 +00:00
Jatin Chaudhary	4d5248e2d1	Explicit cast values to half (#1593) Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2024-10-22 11:17:32 -07:00
rocking	9e7fcc0b37	Add reduce2d new api	2024-10-22 14:52:10 +00:00
Bartłomiej Kocot	82fc53835a	Enable grouped conv bwd wei bf16 NGCHW (#1589) * Enable grouped conv bwd wei bf16 NGCHW * fixes * fixes * Fixes * fixes * fixes * Fixes	2024-10-22 16:18:28 +02:00
ltqin	0394f8a713	update layernorm (#1570 ) * port layernorm * change warp_welford.hpp * Update warpshuffle * 1. Add save mean and save std back 2. Move construction of tensor_view and tile_window to operator() * refine welford max count calculation * unify layernorm api * Rename file * Remove save mean and inv std * Revert "refine welford max count calculation" This reverts commit `022365802b`. * Fix order of parameter * refine welford max count calculation again * Remove fp32 instances * Fix bug of padding * refactor api * Support bf16 * Extract common function * Refine arg of operator() * Add kMThreadPerBlock to template parameter * clang format * Refine variable name * Refine file name * remove redundant line * refactor layernorm2d pipeline and add block-per-block utility * fix name * rename more * add more block-per-tile instance * remove duplicated define * update instance for 2048, 1024 case * support up to 2048 now * opt loading * add n1536 * Add two pass pipeline * format * Fix incorrect type * parallel compilation * Use smaller N * fix 2p pass * Support Repeat_M in distribution * Refine nameing * Add reduce example --------- Co-authored-by: letaoqin <letaoqin@amd.com> Co-authored-by: aska-0096 <haocwang@amd.com> Co-authored-by: rocking <ChunYu.Lai@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com>	2024-10-22 09:26:18 +08:00
Rostyslav Geyyer	3f710930f6	Update default stride (#1576) * Update default stride value to -1 * Fix format * Revert "Fix format" This reverts commit `ae0c3649ec`. --------- Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>	2024-10-21 08:45:22 -07:00
spolifroni-amd	794f2d64a8	added link to documentation (#1578)	2024-10-21 08:35:57 -07:00
dependabot[bot]	d0565e33d6	Bump rocm-docs-core from 1.8.2 to 1.8.3 in /docs/sphinx (#1587 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.2 to 1.8.3. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.2...v1.8.3) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-10-21 08:34:53 -07:00
Thomas Ning	560917b161	Ck profiler instance support (#1575) * The draft on ckProfiler instance add * support the ck profiler instance with same data types * add a small feature on the M and N variable switch. * Partially solve the incorrect result problem * fix based on ci cd	2024-10-21 22:47:48 +08:00
Po Yen Chen	95e722a3b3	[CK_TILE] Optimize fmha splitkv & splitkv combine kernels (#1577 ) * Use smaller width for lse_accum dist tensor * Update pipeline comment * Fix wrong distribution for lse_accum * Remove duplicate dim in lse_accum dist encoding * Decide fmha splitkv combine kernel kBlockSize by kM0 * Remove assumption of MPerThread=1 * Add log<4> & log<8> specialization * Enlarge occupancy array * Fix vector size for small tile * Add support for kMaxSplits=8 * Re-format gemm.hpp * Use 16x16x16 warp gemm for fwd_splitkv * Centralize policy code changes * Leave fp8/bf8 tile settings unchanged	2024-10-21 10:52:11 +08:00


1516 Commits