composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-28 18:56:59 +00:00

Author	SHA1	Message	Date
Illia Silin	52b0bffec0	Support fp64 contraction on gfx94x. (#1029 ) * enable contraction fp64 on gfx94* * fix the logic (tags: rocm-6.0.2, rocm-6.0.0)	2023-11-08 15:03:57 -08:00
Po Yen Chen	ebcfdb3b40	Disable the SLP vectorizer to prevent unnecessary wait (#1008 ) * Disable the SLP vectorizer to prevent unnecessary wait * Add comment to the reason of adding flag * Fix wording	2023-11-07 22:04:49 -08:00
Po Yen Chen	dcb013fcf2	Avoid force setting ENABLE_PIPELINE_V2_OPT to OFF (#961 ) * Avoid force setting ENABLE_PIPELINE_V2_OPT to OFF * Remove compilation option variable MAX_ILP_OPTS	2023-11-07 22:04:37 -08:00
Jun Liu	5032041365	Merge branch 'amd-develop' into amd-master	2023-10-11 12:26:02 -07:00
Jun Liu	91b414cdac	Merge commit 'ac9595a9f118a023e248eaffcfa5c324f36fd081' into amd-develop	2023-10-11 12:24:51 -07:00
zjing14	ac9595a9f1	Fixed f8_gemm NaN (#975 ) * workaround nan problem by changing output to fp16 * enable f8/bf8 gemm tests on MI200 * workaround f16 to f8 conversion --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-10 10:30:26 -05:00
Jun Liu	0b70e1cd3c	Merge branch 'amd-develop' into amd-master	2023-10-05 15:46:50 -07:00
Jun Liu	082cf64310	Merge branch 'develop' into amd-develop	2023-10-05 15:46:27 -07:00
Lauren Wrubleski	5913609168	Replace CMake `return` from later CMake (#970 )	2023-10-05 14:58:58 -07:00
Illia Silin	4daedf8ca5	Revert "Add support for mixed precision in contraction scale and bilinear" (#967 ) * Revert "Add support for mixed precision in contraction scale and bilinear (#936)" This reverts commit `f07485060e`. * revert commits #957 and #960	2023-10-05 14:58:23 -07:00
zjing14	570ff3ddbe	remove example 60 (#963 ) Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-05 09:41:01 -07:00
zjing14	04f93aadb8	Grouped conv bwd data with fp16 input and bf8fp8 comp (#962 ) * Add f8 bf8 gemm example * Add element-wise ops * Add intrinsics * Update reference calculation * Add an additional type option for xdlops gemm * Fix build process * Add bf8 to buffer addressing * Update blockwise op, split typeA and typeB * Update for compatibility * Update naming to f8->fp8 * Update naming * Format * Update naming (#937) * Add a client example * Add computetypes to device and gridwise ops * Add instances, update instance factory * Format * Fix a flag * Add ckProfiler mode * Fix typos * Add an example * Add bf8 generator * add bf8 mfma; fixed type_convert for bf8 * move verification ahead of timing * Update reference calculation * Fix reference * Narrow down float init range * Fix bf8 bf8 mfma * Add bf8 @ fp8 mfma * Update example * Update instances * Update profiler api * Update for compatibility * Format * Remove extra example * Clean up * workaround convert * added instance of f16_bf8f8, and client example * fixed mfma selector * format --------- Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-04 18:04:27 -05:00
Rostyslav Geyyer	42facfc6b7	Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945 ) * Add f8 bf8 gemm example * Add element-wise ops * Add intrinsics * Update reference calculation * Add an additional type option for xdlops gemm * Fix build process * Add bf8 to buffer addressing * Update blockwise op, split typeA and typeB * Update for compatibility * Update naming to f8->fp8 * Update naming * Format * Update naming (#937) * Add a client example * Add computetypes to device and gridwise ops * Add instances, update instance factory * Format * Fix a flag * Add ckProfiler mode * Fix typos * Add an example * Add bf8 generator * add bf8 mfma; fixed type_convert for bf8 * move verification ahead of timing * Update reference calculation * Fix reference * Narrow down float init range * Fix bf8 bf8 mfma * Add bf8 @ fp8 mfma * Update example * Update instances * Update profiler api * Update for compatibility * Format * Remove extra example * Clean up * workaround convert --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-04 08:19:08 -05:00
zjing14	e921e1f08d	3d grouped conv fwd with input/output fp16 and comp fp8 (#931 ) * add f8 comp instance * fixed * fixed comments * rename * fixed dtype * format * fixed CI * fixed ci * add missing ComputeType * fixed cit * fixed * Update cmake-ck-dev.sh --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-03 20:04:26 -05:00
zjing14	5311d1b325	changed test for grouped_gemm to be random (#959 ) Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-03 09:32:58 -05:00
zjing14	aa46039f2d	Fixed contraction issues (#960 ) * add missing ComputeType * fixed * Update cmake-ck-dev.sh --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-03 09:32:44 -05:00
zjing14	f477fca436	add generic instances (#947 ) Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-03 09:32:28 -05:00
Jun Liu	7b7a3978b5	Merge branch 'amd-develop' into amd-master	2023-10-02 17:09:58 -07:00
Jun Liu	7e8230daa3	Merge branch 'develop' into amd-develop	2023-10-02 17:08:42 -07:00
Rostyslav Geyyer	bd09b5c538	Add fp8 @ bf8 gemm support and example (#933 ) * Add f8 bf8 gemm example * Add element-wise ops * Add intrinsics * Update reference calculation * Add an additional type option for xdlops gemm * Fix build process * Add bf8 to buffer addressing * Update blockwise op, split typeA and typeB * Update for compatibility * Update naming to f8->fp8 * Update naming * Format	2023-10-02 16:39:03 -05:00
Illia Silin	59dbb01fd1	get rid of gfx900/906, set rocm5.7 as default (#958 )	2023-10-02 12:01:11 -07:00
zjing14	9d58c42103	Contraction multi abd (#957 ) * add gridwise_multi_abd * move element_op into RunRead * merge element_wise op with data read * add multiABD example * allow packed elementwise_op * changed example * clean * clean * add is_detected * fix * minor fix * add scaleAdd_vec4 example * init commit for contraction_multi_ABD * add examples * add examples of multiA and broadcast * update example * fixed comments * Update cmake-ck-dev.sh * Update cmake-ck-dev.sh * Add comments into the example --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-02 09:18:36 -05:00
Illia Silin	6b5f647371	add gfx942 target to the daily ckprofiler package (#955 )	2023-09-29 08:55:25 -07:00
Bartlomiej Wroblewski	f07485060e	Add support for mixed precision in contraction scale and bilinear (#936 ) * Extract common functionality to separate files * Reference contraction: Remove incorrect consts from type_converts * Reference contraction: Add missing type_convert for dst value * Reference contraction: Fix incorrect order of B matrix dimensions * Add support for mixed precision in contraction scale and bilinear * Move using statements from instances to a common file * Move using statements from examples to a common file * Fix the order of B matrix dimensions across examples and profiler * Fix the computation of error threshold * Make ComputeDataType an optional argument * Include possible DataType -> ComputeDataType casting error in the threshold * Remove commented code	2023-09-29 10:54:31 -05:00
Bartłomiej Kocot	cb53874002	Add grouped conv bwd data wmma (#950 ) * Add grouped conv bwd data wmma * Fix copyrights * Add instances with smaller NPerBlock * Update interface test * Minor stylistic fixes * Minor stylistic fixes	2023-09-28 23:10:18 +02:00
Bartłomiej Kocot	271ef645ac	Add grouped convolution changes to changelog (#952 ) * Add grouped convolution changes to changelog * Fix 0.2.0 ck release rocm version * Suggested CHANGELOG.md edits * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md --------- Co-authored-by: Lisa <lisajdelaney@gmail.com>	2023-09-28 18:18:32 +02:00
Jun Liu	b24d93a127	Merge branch 'amd-develop' into amd-master	2023-09-28 07:52:34 -07:00
Jun Liu	56c7203541	Merge branch 'develop' into amd-develop	2023-09-28 07:52:02 -07:00
Illia Silin	bc1108bb3e	Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. (#951 ) * Added error check after kernel launch (#919) Co-authored-by: Xiaodong Wang <xdwang@meta.com> Co-authored-by: Xiaodong Wang <xw285@cornell.edu> * remove M=0 test cases for test_gemm_splitk --------- Co-authored-by: Xiaodong Wang <xdwang@meta.com> Co-authored-by: Xiaodong Wang <xw285@cornell.edu>	2023-09-27 15:19:33 -07:00
Bartlomiej Wroblewski	f4af5aed8b	Handle type conversions to a const datatype (#944 ) * Handle type conversions to a const datatype * Review: Handle X being const data type as well * Review: Remove typo	2023-09-27 15:02:42 -05:00
Bartłomiej Kocot	e2243a4d1e	Add column to image kernel (#930 ) * Add column to image kernel * Minor fixes for dtypes and client examples * Disable tests for disabled dtypes * Disable add instances functions for disabled data types * Minor stylistic fixes * Revert "Disable add instances functions for disabled data types" This reverts commit `728b869563`. * Instances reduction * Add comments in device_column_to_image_impl * Update changelog and Copyrights * Improve changelog	2023-09-27 17:19:06 +02:00
zjing14	11676c7e49	Add multiple A/B support (#906 ) * add gridwise_multi_abd * move element_op into RunRead * merge element_wise op with data read * add multiABD example * allow packed elementwise_op * changed example * clean * clean * add is_detected * fix * minor fix * add scaleAdd_vec4 example --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-09-26 21:16:23 -05:00
Illia Silin	420b5a0382	Use lower case for ckprofiler package. (#948 ) * split ckProfiler gfx9 package into gfx90 and gfx94 * use lower case for package names	2023-09-26 17:43:09 -07:00
zjing14	48ba6e8a69	Fixed Gemmv2r3 kpad (#938 ) * added kpad support into v2r3 * add generic instances * fixed comments * fixed mnk padding * Update device_batched_gemm_xdl.hpp * fixed kpad --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-09-26 18:40:00 -05:00
Rostyslav Geyyer	94bfa50256	Add fp8 gemm instances (#920 ) * Add fp8 gemm instances * Update instance naming	2023-09-26 14:59:33 -05:00
Jun Liu	742dd3aa32	Merge branch 'amd-develop' into amd-master	2023-09-26 12:00:18 -07:00
Jun Liu	1f02eaef56	Merge branch 'develop' into amd-develop	2023-09-26 11:59:54 -07:00
Illia Silin	0b296a2722	split ckProfiler gfx9 package into gfx90 and gfx94 (#946 )	2023-09-26 11:22:31 -07:00
Illia Silin	2ea75bd6d7	Resolve some data type issues and cmake policy. (#940 ) * split the types in gemm_bilinear instances, add condition to cmake policy * fix syntax * split the data types in batchnorm examples * fix the batchnorm_bwd test * fix types in the batchnorm_bwd test	2023-09-26 08:39:11 -07:00
Jun Liu	c9013009a0	Merge branch 'amd-develop' into amd-master	2023-09-25 14:32:03 -07:00
Jun Liu	84dcf5d043	Merge branch 'develop' into amd-develop	2023-09-23 18:10:33 -07:00
Bartłomiej Kocot	c95538325b	Add 3d grouped conv fwd wmma instances (#935 ) * Add 3d grouped conv fwd wmma instances * Refactor fwd conv tests * Split wmma instances for each specialization * Minor stylistic fixes	2023-09-23 18:56:31 +02:00
Rostyslav Geyyer	ede64ae9db	Update naming (#937 )	2023-09-22 10:08:45 -05:00
Illia Silin	bba085d2b5	Refactoring cmake files to build data types separately. (#932 ) * refactor cmake files for the tests * refactor cmake files for examples * fix cmake for gemm example * fix the cmake file for all examples * add splitting by data types in gemm_splitk instance header * rename test to reflect only dl instances are used * clean up CI workspace, update cmake for instances * change the jenkinsfile syntax * build all instances except DL on gfx11 * move workspace cleanup after stages * clean up workspace after every stage * isolate data types in grouped_conv_fwd header * isolate dl instances for grouped_conv2d_fwd * fix syntax * fix cmake and batchnorm instances * fix typo * fix reduction instances * fix grouped_conv headers * fix syntax * replace parsing logic for instances, replace bfp16 with bf16 * fix the client examples build * clean up DTYPES from instances cmake files * update the parsing logic in cmake files * make an exception for reduction kernels * update few remaining cmake files to handle DTYPES * fix syntax * fix cmake conflicts * replace f8 with fp8 test name * resolve conflicts for dpp instances	2023-09-20 22:15:56 -07:00
Illia Silin	58817bf967	fix the building of the amd-stg-open compiler (#927 )	2023-09-19 18:50:58 -07:00
Illia Silin	718065ebd2	update to rocm5.7 by default (#925 ) * update to rocm5.7 by default * fix jenkinsfile syntax	2023-09-19 09:35:45 -07:00
Illia Silin	5a4416c8a7	fix the ckprofiler package build in a loop (#926 )	2023-09-19 09:17:39 -07:00
Bartlomiej Wroblewski	63cd459248	Fix DL GEMM instances with too large vector size (#901 ) * Fix vector lengths of DL GEMM instances with padding * Add checks for correctness of vector lengths in DL GEMM	2023-09-18 14:08:23 +02:00
Rostyslav Geyyer	f17af2e9ed	Add native conversions fp8<->fp32 (#908 ) * Add native conversions * Add bf8 conversions	2023-09-17 20:56:27 -05:00
Bartlomiej Kocot	bc2d0583d3	Stylistic improvements for grouped convolution code * Remove unnecessary ignoring * Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp	2023-09-15 20:03:47 +02:00

1074 Commits