composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-12 01:10:17 +00:00

Author	SHA1	Message	Date
dependabot[bot]	ba82beb9bf	Bump rocm-docs-core from 1.2.1 to 1.3.0 in /docs/sphinx (#1324 ) Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.2.1 to 1.3.0. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.1...v1.3.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-06-05 07:36:39 -07:00
Rostyslav Geyyer	cb0645bedc	Add a scale op, related instances and examples (#1242 ) * Add a scale op * Update the element op * Add instances * Add an example * Add a client example * Add a flag check * Revert flag check addition * Fix flag check * Update d strides in example * Update d strides in client example * Apply suggestions from code review Update copyright header Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Move the example * Move the client example * Update element op * Update example with the new element op * Add scalar layout * Update example * Update kernel for scalar Ds * Revert kernel changes * Update element op * Update example to use scales' pointers * Format * Update instances * Update client example * Move element op to unary elements * Update element op to work with values instead of pointers * Update instances to take element op as an argument * Update examples to use random scale values --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>	2024-06-04 19:28:15 -05:00
Dan Yao	2cab8d39e3	CK Tile FA Training kernels (#1286 ) * FA fwd dropout * FA bwd * epilogue reuse * CMakeLists update * [CK_TILE] support alibi (#1269) * add alibi support * fix code * update code based on comment * Support more hdim * fix fp8 bias * support seqlen_k=0 case * remove unused printf * fix format --------- Co-authored-by: rocking <ChunYu.Lai@amd.com> * now fwd/bwd can build * bwd alibi * add bwd validation stream_config * update generated filenames * update bwd kernel launch * CK_TILE_HOST_DEVICE in philox * Transpose -> transpose * format * format * format * Generate the instance for FA required * format * fix error in WarpGemm --------- Co-authored-by: danyao12 <danyao12> Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: rocking <ChunYu.Lai@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> Co-authored-by: Jing Zhang <jizhan@amd.com>	2024-06-04 13:12:45 -05:00
dependabot[bot]	76827d82ca	Bump rocm-docs-core from 1.2.0 to 1.2.1 in /docs/sphinx (#1322 ) Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.2.0 to 1.2.1. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.0...v1.2.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-06-03 22:41:56 -07:00
Illia Silin	3fa7e2a6c4	disable the hipTensor test by default, only run once daily (#1321 )	2024-06-03 14:07:30 -07:00
zjing14	6fb1f4e03f	Post-merge fix of PR 1300 (#1313 ) * add f8 gemm with multiD for both row/col wise * change compute_type to fp8 * changed tuning parameters in the example * add rcr example * post-merge fix * fix * reduce init range	2024-05-31 22:46:41 -07:00
Illia Silin	34f3dfdd61	Build CK library for all supported targets. (#1312 ) * test library build for all supported targets * increase the number of threads to build lib in CI to 64	2024-05-28 12:36:06 -07:00
dependabot[bot]	66de8a02ba	Bump rocm-docs-core from 1.1.3 to 1.2.0 in /docs/sphinx (#1311 ) Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.3 to 1.2.0. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.3...v1.2.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-28 11:36:09 -07:00
zjing14	80db62f08d	add f8 gemm multiD with both row/col wise scale (#1300 ) * add f8 gemm with multiD for both row/col wise * change compute_type to fp8 * changed tuning parameters in the example * add rcr example	2024-05-28 12:04:22 -05:00
carlushuang	5055b3bdcb	[CK_TILE] support group from cmdline (#1295 ) * support cmdline seqlen decode * silent print * update readme * update kernel launch 3d * update tile partitioner * fix spill for bf16 * modify based on comment * modify payload_t * fix bug for alibi mode * fix alibi test err * refactor kernel launch, support select timer * add missing file * remove useless code * add some comments	2024-05-28 11:13:21 +08:00
Joseph Macaranas	02fa2c298b	Enable external CI pipeline triggers (#1310 )	2024-05-23 18:21:34 -04:00
Illia Silin	ec2bae27ff	Split the gemm_multi_abd instances. (#1306 ) * split the gemm_multi_abd instances * update the dates	2024-05-23 09:17:02 -07:00
dependabot[bot]	06a9b72caf	Bump rocm-docs-core from 1.1.2 to 1.1.3 in /docs/sphinx (#1308 ) Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.2 to 1.1.3. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.2...v1.1.3) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-23 07:45:53 -07:00
Max Podkorytov	29e58d5b28	Make the library which generates CK instances for pytorch2 inductor's CK backend usage. Also bundle the CK library and include files with the pip package. The package is pip-installable with `pip install git+https://github.com/tenpercent/composable_kernel@enable-pip` (substitute the repo path and branch if necessary). Testing: `myenv/bin/python3 -m ck4inductor.universal_gemm.gen_instances` (prints a list of instances); `tree myenv/lib/python3.12/site-packages/ck4inductor` (observe the list of sources along the installed package).	2024-05-22 13:44:22 -07:00
Bartłomiej Kocot	fd72380aeb	Optimize grouped conv bwd weight for small M and N (#1303 ) * Optimize grouped conv bwd weight for small M and N * Fixes	2024-05-22 21:01:01 +02:00
Illia Silin	7b027d5643	Select appropriate GPU targets for instances, tests, and examples. (#1304 ) * set individual gpu targets for instances, examples, tests * fix path to hip compiler * fix path to hip compiler once more * aggregate device macros in ck_tile config header * fix the cmake logic for instances * fix clang format * add gfx900 and gfx906 to default set of targets	2024-05-22 11:45:27 -07:00
Rostyslav Geyyer	204da9c522	Move grouped conv fwd client examples (#1299 ) * Move grouped conv fwd client examples * Update existing examples * Format	2024-05-21 09:52:41 -05:00
Illia Silin	06b891c5c2	aggregate device macros in ck_tile config header (#1297 )	2024-05-20 08:34:45 -07:00
Illia Silin	1274861a9d	replace the ENV macro with CK_ENV (#1296 )	2024-05-17 10:42:51 -07:00
dependabot[bot]	6637a810d0	Bump rocm-docs-core from 1.1.1 to 1.1.2 in /docs/sphinx (#1293 ) Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.1 to 1.1.2. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.1...v1.1.2) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-17 07:44:48 -07:00
rocking	aaa8dfdae9	Fix compile error (#1292 ) error: no viable conversion from returned value of type '__half' to function return type 'fp16_hip_t' (aka '_Float16') Co-authored-by: carlushuang <carlus.huang@amd.com>	2024-05-17 17:19:17 +08:00
Illia Silin	c44137838e	remove wrong use of nonexistent class members (#1290 )	2024-05-15 08:08:17 -07:00
carlushuang	dd0dd13d4e	remove operator-deref (#1291 )	2024-05-15 08:06:50 -07:00
jakpiase	3e3471d5d2	Add unit tests for grouped gemm two stage (#1256 ) * add unit tests for grouped gemm two stage * add reviewers suggestions --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2024-05-15 10:03:39 +02:00
Illia Silin	7843a8a7fb	re-enable convnd_fwd_xdl_fp64 testing (#1289 )	2024-05-10 22:48:28 -07:00
Illia Silin	566b6480a2	Code clean-up (#1285 ) * code clean-up * remove the profiling output samples	2024-05-10 09:41:39 -07:00
carlushuang	fcba889ef4	[CK_TILE] fix some rand number init (#1287 ) * add random norm * normalized default to 0/3 * change squant->auto	2024-05-10 09:03:39 -07:00
Bartłomiej Kocot	8346af9c68	Change output gemm type to AccDataType in two stage conv bwd wei (#1283 )	2024-05-10 10:57:42 +02:00
Adam Osewski	a0ae1c6133	Fix MakeArgument (#1284 )	2024-05-09 09:42:41 -07:00
Adam Osewski	3c043cd10b	Add vector instruction coherency bits for gfx94 targets. (#1268 )	2024-05-09 07:30:17 -07:00
Illia Silin	fdbf8ccbd7	fix the output formatting (#1282 )	2024-05-08 16:11:54 -07:00
Bartłomiej Kocot	0b6b5d1785	Add two stage grouped conv bwd weight kernel (#1280 )	2024-05-08 09:53:24 +02:00
Illia Silin	bf42097646	Enable logging in CK with environment variable. (#1278 ) * enable logging using environment variable * update ck.hpp header * fix typo * fix clang format * Update include/ck/utility/env.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>	2024-05-07 16:26:43 -07:00
carlushuang	851c3ed157	[CK_TILE] support alibi (#1269 ) * add alibi support * fix code * update code based on comment * Support more hdim * fix fp8 bias * support seqlen_k=0 case * remove unused printf * fix format --------- Co-authored-by: rocking <ChunYu.Lai@amd.com>	2024-05-07 22:32:54 +08:00
Sam Wu	6d073d31bb	Add ROCm Doc team as codeowners for RTD yaml (#1277 ) Also add component owners as codeowners for header directory	2024-05-06 10:07:39 -06:00
Illia Silin	08d51d9bc4	add missing vector header (#1275 )	2024-05-02 11:27:59 -07:00
Illia Silin	7797f7c7a1	Downgrade minimum required python version to 3.6 (#1274 )	2024-05-01 15:34:56 -07:00
Illia Silin	f0bf1e3125	[CI] Focus CI stages on MI200 nodes for resource optimization (#1273 )	2024-05-01 10:07:14 -07:00
Rostyslav Geyyer	a2d0bdd5a9	Add an ignore (#1270 )	2024-04-30 20:45:22 -07:00
Sam Wu	43579900a9	Update documentation requirements and configurations (#1272 ) * Update documentation requirements Set rocm-docs-core to v1.1.1 * Update RTD config Set Python 3.10 for rocm-docs-core >= v1.0.0	2024-04-30 20:44:59 -07:00
Illia Silin	f6b3f4715d	[CI][Tests] Add a daily cron job to build CK instances for gfx9;gfx10;gfx11. (#1271 ) * add a daily build for instances for gfx9;gfx10;gfx11 * fix jenkins logic for instances only build * fix the path for instance_only build * reduce the number of build threads to 32	2024-04-30 14:44:30 -07:00
Adam Osewski	0f7e8ec485	Fix example CMakeLists.txt (#1267 ) Add proper dependency target.	2024-04-30 08:28:19 -07:00
Rostyslav Geyyer	6ced3c12ff	Mark unneeded instances as "getting deprecated" (#1265 ) * Add a flag * Add flag check and messages --------- Co-authored-by: root <root@aus-g7-rogeyyer.amd.com>	2024-04-29 12:00:55 -07:00
Haocong WANG	764164b488	[GEMM] UniversalGemm update (#1262 ) * Add bf16 instances * Add bf16 gemm universal example * tempsave * Add guard to navi compilation * workaround on a specific mixed gemm instance (bring it back when the compiler fix is uploaded) * fix formatting condition statement issue * solve conflict --------- Co-authored-by: Jun Liu <Liu.Jun@amd.com>	2024-04-26 12:56:07 -05:00
Rostyslav Geyyer	f044ff71fb	Add element op (#1259 )	2024-04-26 12:55:45 -05:00
zjing14	5ae893c0d3	ggemm tile_loop multD bf16 int8 (#1258 ) * Overload output stream operator for LoopScheduler and PiplineVersion * Add Run overload accepting grid descriptors MK. * Add __device__ keyword for CalculateGridSize * Create device op GroupedGemmMultipleD * Add GroupedGemm MultipleD Tile Loop implementation. * Add an example for GroupedGemm MultipleD tile loop. * Device Op GroupedGEMMTileLoop. * Bunch of small changes in example. * CkProfiler * Remove unused tparam. * changed the copy function to v7r2 * adding multi_abd * in-progress * add post-load oob check * Fix include statement. * Fix output stream overloads. * Do not make descriptors and check validity until we find group. * Fix gemm desc initialization. * debugging * adjust instances * add run_lds * add elemntwise_op * replace multi_abd_device with v3 * clean up * clean * clean * Revert device op * Fix compilation for DTYPES=FP16 * Validate tensor transfers parameters. * Added LDSType * profiling * adjust oobcheck * add missing file * Validate on host only NK dims if M is not known. * add * clean * refactor * clean * add examples * add fuse * add fusion and client example * Fix bug. * A convenient debug func for selecting threads. * Fix has main k block loop bug. * Make sure that b2c has up to date tile offset. * Output stream operator for Sequence type. * Cmake file formatting. * clean --------- Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>	2024-04-26 10:37:49 -05:00
zjing14	0d0150db20	bf16A_Int8B with fastgelu/bias (#1264 ) * changed the copy function to v7r2 * adding multi_abd * in-progress * add post-load oob check * debugging * adjust instances * add run_lds * add elemntwise_op * replace multi_abd_device with v3 * clean up * clean * clean * Added LDSType * profiling * adjust oobcheck * add missing file * refactor * clean * add examples	2024-04-26 07:26:30 -05:00
Adam Osewski	b4032629e5	Grouped GEMM Multiple D tile loop. (#1247 ) * Overload output stream operator for LoopScheduler and PiplineVersion * Add Run overload accepting grid descriptors MK. * Add __device__ keyword for CalculateGridSize * Create device op GroupedGemmMultipleD * Add GroupedGemm MultipleD Tile Loop implementation. * Add an example for GroupedGemm MultipleD tile loop. * Device Op GroupedGEMMTileLoop. * Bunch of small changes in example. * CkProfiler * Remove unused tparam. * Fix include statement. * Fix output stream overloads. * Do not make descriptors and check validity until we find group. * Fix gemm desc initialization. * Revert device op * Fix compilation for DTYPES=FP16 * Validate tensor transfers parameters. * Validate on host only NK dims if M is not known. * Fix bug. * A convenient debug func for selecting threads. * Fix has main k block loop bug. * Make sure that b2c has up to date tile offset. * Output stream operator for Sequence type. * Cmake file formatting.	2024-04-25 15:12:53 -05:00
ltqin	f448d179b7	Universal gemm flush cache (#1251 ) * add flush cache to device op * add flush cache parameter to ckProfiler * change calculate size a and b method * change evaluation time method from AVERAGE to MEDIAN * format code * adjust some code * fix core dumped * remove loop call flush icache in kernel * remove loop(outer) call flush icache --------- Co-authored-by: letaoqin <letaoqin@amd.com>	2024-04-25 15:07:14 -05:00
Bartłomiej Kocot	b1f8ae379b	Fix contraction IsSupported checks (#1257 )	2024-04-23 22:59:39 +02:00

1279 Commits