composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-07 00:04:37 +00:00

Author	SHA1	Message	Date
PoYen, Chen	c40c1daff0	Extract common logics	2024-06-26 18:02:28 +00:00
PoYen, Chen	8fb567c286	Fix vnew append errro	2024-06-26 17:00:07 +00:00
PoYen, Chen	4e6c28522c	Fix wrong K values after appending	2024-06-25 10:12:13 +00:00
PoYen, Chen	1ac17dae50	Add knew/vnew tensors to the kernel argument	2024-06-25 07:56:36 +00:00
PoYen, Chen	344902732a	Sync kernel name with the codegen	2024-06-24 14:50:25 +00:00
PoYen, Chen	342c8cf01d	Call HIP_CHECK_ERROR() macro to get real source info	2024-06-24 14:33:09 +00:00
PoYen, Chen	bace0e5df0	Add init codegen logic for fmha fwd appendkv	2024-06-24 12:33:51 +00:00
carlushuang	fa129c1a5d	WA for rocm-6.2+ s constrait for buffer resource (#1346 ) * WA for rocm-6.2+ s constrait for buffer resource * add missing memory clobber	2024-06-21 11:00:13 -05:00
Bartłomiej Kocot	510325a468	Fix cmake warnings (#1342 ) * Cmake add -Wno-nvcc-compt * Remove template without initialization list * dpp remove template without init list * Fixes	2024-06-21 09:47:58 +02:00
Dan Yao	1da802bdf2	Fix FA bwd alibi+causal NaN errors (#1352 ) * fix bwd alibi nan error * fix datatype --------- Co-authored-by: danyao12 <danyao12>	2024-06-20 09:50:53 -05:00
ThruptiRajLakshmanaGowda	0162a5f6ba	Adding Missed Activation Functions for Grouped 2D/3D Convolutions (#1348 ) * Initial Push * First Push * Fixed Clang format * Resolve merge conflict * Addressed review comments * Addressed review comments * Addressed review comments	2024-06-20 09:24:54 -05:00
Qianfeng	e3f44659cf	Fix in dropout lambda to avoid the compiling issue on some docker/compiler envs (#1350 )	2024-06-20 11:36:42 +08:00
Qianfeng	1973903f49	Hacking ck_tile fmha Dropout facility (#1344 ) * Add NullBlockDropout to be used when kHasDropout is false * Change to BlockDropout::Run() for forward to reduce conditional checkings * Re-format files --------- Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>	2024-06-19 10:37:22 +08:00
Bartłomiej Kocot	8faec23cb4	Add read_first_lane function for int64 (#1347 )	2024-06-18 15:05:30 -05:00
jakpiase	e2d139201b	Switch to universal gemm in grouped gemm tile loop (#1335 ) * switch to universal gemm in grouped gemm tile loop * minor fixes * add reviewers comments --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2024-06-18 09:01:49 -05:00
Bartłomiej Kocot	933951ed48	Fix continous dim selection in contraction (#1336 ) * Fix continous dim selection in contraction * Fixes	2024-06-18 10:26:49 +02:00
carlushuang	17ed368f58	[CK_TILE][FA] using pk f16_f32 (#1343 ) * [CK_TILE][FA] using pk f16_f32 * correct a error	2024-06-17 17:16:46 +08:00
zjing14	e02103168a	disabled lds direct load inline asm (#1331 )	2024-06-16 20:33:47 -05:00
Bartłomiej Kocot	dc1e9c5df9	Support large tensors in grouped conv fwd (#1332 ) * Support large tensors in grouped conv fwd * Multi ABD fixes * Fix calculate element space size	2024-06-14 09:53:03 -05:00
Qianfeng	37a347e380	Fix to the using of static_for in amd_buffer_addressing.hpp (#1337 ) * Add insert_dummy_dep_per_dword over-loading for length 64 * Fix insert_dummy_dep_per_dword and remove over-loading for length 64 * Remove blank lines --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2024-06-13 16:12:20 +08:00
Rostyslav Geyyer	ce66277a76	Add a convinvscale op, related instances and examples (#1307 ) * Update the element op * Add an example * Add instances * Add a client example * make sure new instances only build on gfx9 * Update element op and its handling * Format * Update instances to take element op as an argument * Update examples to use random scale values * Format * Update client example with random scales * Format --------- Co-authored-by: illsilin <Illia.Silin@amd.com>	2024-06-10 14:48:49 -05:00
Bartłomiej Kocot	ac58cc5d1d	Integrate universal gemm with conv forward (#1320 ) * Integrate universal gemm with conv fwd * Fix conv fwd wmma test * Fix instances * Remove direct load check	2024-06-05 13:01:29 -05:00
Rostyslav Geyyer	cb0645bedc	Add a scale op, related instances and examples (#1242 ) * Add a scale op * Update the element op * Add instances * Add an example * Add a client example * Add a flag check * Revert flag check addition * Fix flag check * Update d strides in example * Update d strides in client example * Apply suggestions from code review Update copyright header Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Move the example * Move the client example * Update element op * Update example with the new element op * Add scalar layout * Update example * Update kernel for scalar Ds * Revert kernel changes * Update element op * Update example to use scales' pointers * Format * Update instances * Update client example * Move element op to unary elements * Update element op to work with values instead of pointers * Update instances to take element op as an argument * Update examples to use random scale values --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>	2024-06-04 19:28:15 -05:00
Dan Yao	2cab8d39e3	CK Tile FA Training kernels (#1286 ) * FA fwd dropout * FA bwd * epilogue reuse * CMakeLists update * [CK_TILE] support alibi (#1269) * add alibi support * fix code * update code based on comment * Support more hdim * fix fp8 bias * support seqlen_k=0 case * remove unused printf * fix format --------- Co-authored-by: rocking <ChunYu.Lai@amd.com> * now fwd/bwd can build * bwd alibi * add bwd validation stream_config * update generated filenames * update bwd kernel launch * CK_TILE_HOST_DEVICE in philox * Transpose -> transpose * format * format * format * Generate the instance for FA required * format * fix error in WarpGemm --------- Co-authored-by: danyao12 <danyao12> Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: rocking <ChunYu.Lai@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> Co-authored-by: Jing Zhang <jizhan@amd.com>	2024-06-04 13:12:45 -05:00
zjing14	6fb1f4e03f	Post-merge fix of PR 1300 (#1313 ) * add f8 gemm with multiD for both row/col wise * change compute_type to fp8 * changed tuning parameters in the example * add rcr example * post-merge fix * fix * reduce init range	2024-05-31 22:46:41 -07:00
zjing14	80db62f08d	add f8 gemm multiD with both row/col wise scale (#1300 ) * add f8 gemm with multiD for both row/col wise * change compute_type to fp8 * changed tuning parameters in the example * add rcr example	2024-05-28 12:04:22 -05:00
carlushuang	5055b3bdcb	[CK_TILE] support group from cmdline (#1295 ) * support cmdline seqlen decode * silent print * update readme * update kernel launch 3d * update tile partitioner * fix spill for bf16 * modify based on comment * modify payload_t * fix bug for alibi mode * fix alibi test err * refactor kernel launch, support select timer * add missing file * remove useless code * add some comments	2024-05-28 11:13:21 +08:00
Bartłomiej Kocot	fd72380aeb	Optimize grouped conv bwd weight for small M and N (#1303 ) * Optimize grouped conv bwd weight for small M and N * Fixes	2024-05-22 21:01:01 +02:00
Illia Silin	06b891c5c2	aggregate device macros in ck_tile config header (#1297 )	2024-05-20 08:34:45 -07:00
Illia Silin	1274861a9d	replace the ENV macro with CK_ENV (#1296 )	2024-05-17 10:42:51 -07:00
rocking	aaa8dfdae9	Fix compile error (#1292 ) error: no viable conversion from returned value of type '__half' to function return type 'fp16_hip_t' (aka '_Float16') Co-authored-by: carlushuang <carlus.huang@amd.com>	2024-05-17 17:19:17 +08:00
Illia Silin	c44137838e	remove wrong use of nonexistent class members (#1290 )	2024-05-15 08:08:17 -07:00
carlushuang	dd0dd13d4e	remove operator-deref (#1291 )	2024-05-15 08:06:50 -07:00
jakpiase	3e3471d5d2	Add unit tests for grouped gemm two stage (#1256 ) * add unit tests for grouped gemm two stage * add reviewers suggestions --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2024-05-15 10:03:39 +02:00
Illia Silin	566b6480a2	Code clean-up (#1285 ) * code clean-up * remove the profiling output samples	2024-05-10 09:41:39 -07:00
Bartłomiej Kocot	8346af9c68	Change output gemm type to AccDataType in two stage conv bwd wei (#1283 )	2024-05-10 10:57:42 +02:00
Adam Osewski	a0ae1c6133	Fix MakeArgument (#1284 )	2024-05-09 09:42:41 -07:00
Adam Osewski	3c043cd10b	Add vector instruction coherency bits for gfx94 targets. (#1268 )	2024-05-09 07:30:17 -07:00
Illia Silin	fdbf8ccbd7	fix the output formatting (#1282 )	2024-05-08 16:11:54 -07:00
Bartłomiej Kocot	0b6b5d1785	Add two stage grouped conv bwd weight kernel (#1280 )	2024-05-08 09:53:24 +02:00
Illia Silin	bf42097646	Enable logging in CK with environment variable. (#1278 ) * enable logging using environment variable * update ck.hpp header * fix typo * fix clang format * Update include/ck/utility/env.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>	2024-05-07 16:26:43 -07:00
carlushuang	851c3ed157	[CK_TILE] support alibi (#1269 ) * add alibi support * fix code * update code based on comment * Support more hdim * fix fp8 bias * support seqlen_k=0 case * remove unused printf * fix format --------- Co-authored-by: rocking <ChunYu.Lai@amd.com>	2024-05-07 22:32:54 +08:00
Illia Silin	08d51d9bc4	add missing vector header (#1275 )	2024-05-02 11:27:59 -07:00
Rostyslav Geyyer	6ced3c12ff	Mark unneeded instances as "getting deprecated" (#1265 ) * Add a flag * Add flag check and messages --------- Co-authored-by: root <root@aus-g7-rogeyyer.amd.com>	2024-04-29 12:00:55 -07:00
Haocong WANG	764164b488	[GEMM] UniversalGemm update (#1262 ) * Add bf16 instances * Add bf16 gemm universal example * tempsave * Add guard to navi compilation * workground on a specific mixed gemm instance ( bring back it when compiler fix upload) * fix formatting condition statement issue * solve conflict --------- Co-authored-by: Jun Liu <Liu.Jun@amd.com>	2024-04-26 12:56:07 -05:00
Rostyslav Geyyer	f044ff71fb	Add element op (#1259 )	2024-04-26 12:55:45 -05:00
zjing14	0d0150db20	bf16A_Int8B with fastgelu/bias (#1264 ) * changed the copy function to v7r2 * adding multi_abd * in-progress * add post-load oob check * debugging * adjust instances * add run_lds * add elemntwise_op * replace multi_abd_device with v3 * clean up * clean * clean * Added LDSType * profiling * adjust oobcheck * add missing file * refactor * clean * add examples	2024-04-26 07:26:30 -05:00
Adam Osewski	b4032629e5	Grouped GEMM Multiple D tile loop. (#1247 ) * Overload output stream operator for LoopScheduler and PiplineVersion * Add Run overload accepting grid descriptors MK. * Add __device__ keyword for CalculateGridSize * Create device op GroupedGemmMultipleD * Add GroupedGemm MultipleD Tile Loop implementation. * Add an example for GroupedGemm MultipleD tile loop. * Device Op GroupedGEMMTileLoop. * Bunch of small changes in exmaple. * CkProfiler * Remove unused tparam. * Fix include statement. * Fix output stream overloads. * Do not make descriptors and check validity untill we find group. * Fix gemm desc initialization. * Revert device op * Fix compilation for DTYPES=FP16 * Validate tensor transfers paramters. * Validate on host only NK dims if M is not known. * Fix bug. * A convenient debug func for selecting threads. * Fix has main k block loop bug. * Make sure that b2c has up to date tile offset. * Output stream operator for Sequence type. * Cmake file formatting.	2024-04-25 15:12:53 -05:00
ltqin	f448d179b7	Universal gemm flush cache (#1251 ) * add flush cache to device op * add flush cache parameter to ckProfiler * change calculate size a and b method * chang evaluation time method foro AVERAGE to MEDIAN * format code * adjust some code * fix core dumped * remove loop call flush icache in kernel * remove loop(outer) call flush icache --------- Co-authored-by: letaoqin <letaoqin@amd.com>	2024-04-25 15:07:14 -05:00
Bartłomiej Kocot	b1f8ae379b	Fix contraction IsSupported checks (#1257 )	2024-04-23 22:59:39 +02:00

461 Commits