composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-02 21:27:45 +00:00

Author	SHA1	Message	Date
aska-0096	26d5174e15	update instance and lds layout strategy	2024-11-26 07:29:38 +00:00
aska-0096	ea90b01fc9	fix bug in enable f8 gemm inside ckProfiler	2024-11-20 09:33:39 +00:00
aska-0096	c99e3d595e	Merge branch 'mem_gemm_opt' of https://github.com/ROCm/composable_kernel into update_cka8w8	2024-11-20 05:41:33 +00:00
aska-0096	ec6b000c77	Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8	2024-11-19 08:50:37 +00:00
Bartłomiej Kocot	754adc70e3	Batched GEMM Multiple D based on Universal GEMM (#1655 ) * Batched GEMM Multiple D based on Universal GEMM Co-authored-by: Jing Zhang <jizhan@fb.com> * CI fixes Co-authored-by: Jing Zhang <jizhan@fb.com> --------- Co-authored-by: Jing Zhang <jizhan@fb.com>	2024-11-18 14:03:45 +01:00
aska-0096	f3bbfe3efe	Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8	2024-11-18 07:32:39 +00:00
aska-0096	2b840f5a85	reduce prefetch stage in blockwisepipev4	2024-11-18 07:32:30 +00:00
Illia Silin	efd9261545	fix clang format (#1662 )	2024-11-13 09:20:18 -08:00
Taylor Ding	73f02a1083	Move checks for compatibility from Argument() to IsSupportedArgument() (#1653 )	2024-11-13 11:20:38 -05:00
darren-amd	d0e3a70a2e	Statically Cast Pointer Offset (#1631 ) * explicit cast ptr offset * formating change	2024-11-05 09:59:08 -08:00
aska-0096	f20e48f1f4	Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8	2024-11-05 07:03:42 +00:00
Bartłomiej Kocot	9a8a52130d	Remove virtual destructors from unary ops (#1610 ) * Remove virtual destructors from unary ops * Fixes * Fixes * clang format fixes	2024-10-30 17:42:50 +01:00
aska-0096	b3e5048f12	tempsave	2024-10-30 07:38:59 +00:00
Illia Silin	922e42a039	fix compilation errors for gfx12 with clang20 (#1606 )	2024-10-28 19:02:48 -07:00
Bartłomiej Kocot	31bf253aeb	Add dynamic elementwise op (#1426 ) * Add dynamic elementwise op Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com> * CI issues fix * Custom parameter value for dynamic functions - Comments addressed --------- Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com> Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>	2024-10-26 15:22:37 +02:00
valarLip	37f7afed1e	add int8 gemm multiply multiply a8w8 (#1591 ) * add int8 gemm multiply multiply a8w8 * uncomment * clang-format-12 * Add example_gemm_multiply_multiply_xdl_int8 * Remove shell scripts * update preprocess number for mi308; bring back printout in ckprofiler * format --------- Co-authored-by: chenjun <junchen2@amd.com> Co-authored-by: Haocong WANG <haocwang@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com>	2024-10-26 16:39:34 +08:00
chenjun	1670bba95f	clang-format-12	2024-10-21 23:16:04 +08:00
chenjun	7fb0b3223c	add int8 gemm multiply multiply a8w8	2024-10-21 21:57:41 +08:00
Bartłomiej Kocot	f21cda2536	Add transpose scale amax example (#1547 ) * Add transpose scale amax example * fixes * Tune reduce instance	2024-10-14 17:39:38 +02:00
Adam Osewski	29d384d0b2	Implement GetWorkSpaceSize from BaseOperator. (#1564 )	2024-10-12 14:05:11 +08:00
Bartłomiej Kocot	6b54d2faf8	Fix grouped gemm check to avoid overflow (#1545 )	2024-10-04 17:32:43 +02:00
macurtis-amd	aeb7c91f48	Fix compilation errors generated by forthcoming Clang changes (#1544 ) Without this change, the following diagnostic is generated: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] See C++17 spec [temp.names] p5.	2024-10-02 13:56:22 -07:00
Illia Silin	42e6dceacc	Fix compilation errors with Clang20.0. (#1533 ) * fix clang20 compilation errors for gfx90a * fix clang20 compilation errors for gfx11 targets	2024-09-25 13:45:38 -07:00
Bartłomiej Kocot	4ba52b35dc	Add support for NGCHW in grouped conv fwd (#1499 ) * Support NGCHW in grouped conv fwd * Remove not needed variable * Fixes	2024-09-20 10:45:46 +02:00
Mateusz Ozga	448c0f56d8	Pool2d max/avg kernel in the BWD version (#1494 ) * Add pool2d instance BWD AVG * Add pool2d instance BWD MAX * Fix: avg review * Fix review: part2 * Fix - enable test when type is compiled * Fix review part3	2024-09-12 11:47:52 +02:00
jakpiase	e8d2887cb2	Rewrite pool2d fwd (#1462 ) * added pool2d fwd * add tests * add reviewers changes * Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new" This reverts commit `6b2ba7ff89`, reversing changes made to `22c82bea0c`. * Revert "add reviewers changes" This reverts commit `22c82bea0c`. * added reviewers comments * revert some old files * add reviewers requests --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2024-09-11 15:21:00 +02:00
jakpiase	2a261afcdf	Added structural sparsity blockwise gemm (#1435 ) * Implemented smfmac xdlops * Added smfmac blockwise xdlops * fixes * add reviewers suggestions --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2024-09-11 15:19:42 +02:00
Haocong WANG	0b3a409d4f	Merge branch 'develop' of https://github.com/ROCm/composable_kernel into mem_gemm_opt	2024-09-06 03:22:06 +00:00
M.Emin Ozturk	8378855361	Moficiation to fix this issue "threadwise_tensor_slice_transfer_v5r1 issue #1279 " (#1492 ) * issue fix, one line changed for tmp * clang --------- Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu> Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>	2024-09-04 21:52:55 -07:00
Haocong WANG	5b10dae6a4	Add gemm universal bf16 instances (#1484 ) * revert ckprofiler change * temp save * Add test and test pass * test pass * Fix bug inside rotating buffer when tensor is not packed * bug fix * clang format --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2024-09-04 20:58:54 -07:00
aska-0096	dbfcb380cd	temp save	2024-09-05 03:04:31 +00:00
aska-0096	cc404d1190	Merge branch 'develop' of https://github.com/ROCm/composable_kernel into mem_gemm_opt	2024-09-04 15:18:52 +00:00
aska-0096	41fcfbc64e	clang format	2024-09-04 15:11:51 +00:00
aska-0096	6df91708a6	temp save	2024-09-04 14:32:02 +00:00
Bartłomiej Kocot	73b67f290f	Add support for NGCHW in grouped conv bwd wei (#1491 ) * Add support for NGCHW in grouped conv bwd wei * Comments fixes * navi fixes * Update function names	2024-09-03 10:52:03 +02:00
aska-0096	4885c38aa4	Merge branch 'transpose_opt' of https://github.com/ROCm/composable_kernel into rowwise_opt	2024-09-03 08:37:45 +00:00
aska-0096	7c8e92face	tempsave	2024-09-03 07:53:04 +00:00
aska-0096	5d9c964ece	temp save	2024-09-02 09:55:07 +00:00
Bartłomiej Kocot	a9b170b541	Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) (#1415 )" (#1455 )" (#1490 ) This reverts commit `5ff8eeebf9`.	2024-09-02 10:39:49 +02:00
aska-0096	cbf14ee192	tempsave, epilogue optimization for universal gemm done. TODO: mulitpleD epilogue optimization	2024-09-01 08:10:43 +00:00
aska-0096	1ca98e75ce	tempsave	2024-08-26 09:48:07 +00:00
aska-0096	9a99c8416b	temp save	2024-08-26 02:10:18 +00:00
aska-0096	4f65f7b387	tempsave	2024-08-22 07:30:46 +00:00
Andriy Roshchenko	c3515f277c	Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473 ) * Enable CMakePresets build * Verify Convolution, Scaling and ReLU algorithms. * Add tensor element-wise scale and type cast operation. * Reduction implemented but does not work. * Exploration of Reduction functionality. * Completed example for Convolution scaled with ReLu activation and AMAX reduction. * WIP: Add required instances for convolution. * WIP: Create client example. Implement convolution stage. * Add elementwise instances. * Add elementwise scale + convert example. * Add reduction instances. * WIP: Client example for AMAX reduction. * WIP: Add instances for multistage reduction. * WIP: Implementation of multistage reduction. * Refactoring. * Clean up. * Add CMakePresets.json * Guard off FP8 instances when the data type is not available. * Add example for Scaled FP8 Convolution with AMAX reduction. * Refactor CombConvScaleRelu instances. * Add CombConvScale instances. * Add client example for Scaled FP8 Convolution with AMAX reduction. * Cleanup.	2024-08-21 15:22:41 -07:00
Haocong WANG	3049b5467c	[GEMM] gemm_universal related optimization (#1453 ) * replace buffer_atomic with global_atomic * fixed global_atomic_add * added bf16 atomic_add * format * clang-format-12 * clean * clean * add guards * Update gtest.cmake * enabled splitk_gemm_multi_d * format * add ckProfiler * format * fixed naming * format * clean * clean * add guards * fix clang format * format * add kbatch printout * clean * Add rocm6.2 related gemm optimization * Limit bf16 atomic usage * remove redundant RCR gemm_universal instance * Add RRR fp8 gemm universal instance * Bug fix * Add GPU_TARGET guard to FP8/BF8 target * bug fix * update cmake * remove all fp8/bf8 example if arch not support * Enable fp8 RRR support in ckProfiler * limit greedy-reverse flag to gemm_universal in ckProfiler --------- Co-authored-by: Jing Zhang <jizhan@fb.com> Co-authored-by: Jing Zhang <jizhan@meta.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>	2024-08-14 10:42:30 +08:00
Mateusz Ozga	0606e5498e	Support large: 12d tensor size for reduction kenrel (#1465 )	2024-08-13 16:15:47 +02:00
Bartłomiej Kocot	4a870942e6	Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457 ) * Fix typo in TransformConvFwdToGemm * Fix bug in n offset calculation	2024-08-10 13:12:05 +02:00
Jun Liu	5ff8eeebf9	Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) (#1415 )" (#1455 ) This reverts commit `33b399cc15`.	2024-08-08 19:09:33 -07:00
jakpiase	b74d4d4d54	Fix for beta!=0 in reduce (#1440 ) * fix for beta!=0 in reduce * add reviewers suggestions	2024-08-06 09:10:39 -07:00
Bartłomiej Kocot	4ec5c52a0c	Add Grouped Conv Fwd Large Tensor kernel (#1432 ) * Support 64 bit indexing * Add new grouped conv fwd kernel for large tensors * Add instances large tensor * Fixes for transform conv to gemm * Fixes * fixes * Remove not needed instances * examples fixes * Remove not need ds arrays * Fix tests * Add 2GB check in gridwise dl * Fixes	2024-08-06 10:06:10 +02:00

445 Commits