composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-12 09:16:52 +00:00

Author	SHA1	Message	Date
jakpiase	e2d139201b	Switch to universal gemm in grouped gemm tile loop (#1335 ) * switch to universal gemm in grouped gemm tile loop * minor fixes * add reviewers comments --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2024-06-18 09:01:49 -05:00
Haocong WANG	764164b488	[GEMM] UniversalGemm update (#1262 ) * Add bf16 instances * Add bf16 gemm universal example * tempsave * Add guard to navi compilation * workground on a specific mixed gemm instance ( bring back it when compiler fix upload) * fix formatting condition statement issue * solve conflict --------- Co-authored-by: Jun Liu <Liu.Jun@amd.com>	2024-04-26 12:56:07 -05:00
Adam Osewski	b4032629e5	Grouped GEMM Multiple D tile loop. (#1247 ) * Overload output stream operator for LoopScheduler and PiplineVersion * Add Run overload accepting grid descriptors MK. * Add __device__ keyword for CalculateGridSize * Create device op GroupedGemmMultipleD * Add GroupedGemm MultipleD Tile Loop implementation. * Add an example for GroupedGemm MultipleD tile loop. * Device Op GroupedGEMMTileLoop. * Bunch of small changes in exmaple. * CkProfiler * Remove unused tparam. * Fix include statement. * Fix output stream overloads. * Do not make descriptors and check validity untill we find group. * Fix gemm desc initialization. * Revert device op * Fix compilation for DTYPES=FP16 * Validate tensor transfers paramters. * Validate on host only NK dims if M is not known. * Fix bug. * A convenient debug func for selecting threads. * Fix has main k block loop bug. * Make sure that b2c has up to date tile offset. * Output stream operator for Sequence type. * Cmake file formatting.	2024-04-25 15:12:53 -05:00
ltqin	f448d179b7	Universal gemm flush cache (#1251 ) * add flush cache to device op * add flush cache parameter to ckProfiler * change calculate size a and b method * chang evaluation time method foro AVERAGE to MEDIAN * format code * adjust some code * fix core dumped * remove loop call flush icache in kernel * remove loop(outer) call flush icache --------- Co-authored-by: letaoqin <letaoqin@amd.com>	2024-04-25 15:07:14 -05:00
Bartłomiej Kocot	ad1597c499	Refactor elementwise kernels (#1222 ) * Refactor elementwise kernels * Instances fixes * Fix cmake * Fix max pool bwd test * Update two stage gemm split k * Restore elementwise scale for hiptensor backward compatiblity * Fix Acc data type check in conv fwd multiple abd * Disable conv fp64 fwd example * Update grouped conv weight multi d	2024-04-19 13:31:17 +02:00
jakpiase	e0f3f918f1	Add bf16 and bf16@int8 mk_nk_mn instances for grouped gemm two stage (#1228 ) * added bf16 and bf16@int8 mk_nk_mn instances * fix preprocessor guards	2024-04-19 13:16:10 +02:00
Haocong WANG	f83e9701e9	[GEMM] Gemm universal device operation (#1154 ) * Optimize GEMM on MI200/300: 1. Add new blockwise gemm pipeline 2. Add irregular splitk intances * clang format + typo fix * Fix a bug * initial commit * Add more instances to irregular splitk * blkgemm pipeline v1~4 prototype * Sanity Checked. Known issue: 1. Poor performance of splitk 2. Register spill on blkgemmpipeline v3 * Sanity and Performance fix: 1. fix a bug related to sanity in grouped b2c mapping 2. fix a bug related to sanity and performance in splitk offset * Sanity and API update: 1. Remove prefetch stage 2. Fix valid check bug 3, Add first gemm_universal instance into ckProfiler * Add NN instances for gemm universal * 1. Add NT instances for gemm_universal 2. Fix a bug about Kpadding in gemm_universal * Fix a bug regarding padding Odd K number * remove kernel print * Fix KPadding bug... * Update safety check * another try to fix kpadding.. * Sanity checked * new instances.. * clang format+typo fix * remove clang format script's change * Add non-hotloop compile option * 1. Add fp16xfp8 example 2. pull packed convert f8 from pr1150 * Some miscs.. opt and fix * Add pipeline description docs * Split universal gemm instance library to cut profiler compiling time * uncomment cmakefile * Fix a bug caused by blockwise_gemm_pipe_v2 * reduce default splitk to 1 * Add 224x256x64 tile size * update, including: 1. Experiment pipeline 5~7 2. Optimization for pipeline 4 3. Organized instance library * temp save * temp save * Permuted lds layout, sanity and function checked * clang format * Move OOB check from RunRead to RunWrite, for better software pipeline. 
TODO: agpr spill when NN layout * clangformat * A/B splitpipe scheduler for v3 * Fix two bugs * bug fix * fix a bug in oob check * Example for mixed fp16_fp8 gemm * Clean experimental code blocks * Add mixed precision gemm into profiler * tempsave * optimize m/n major lds layout * Add RRR GEMM mixed precision instances * Optimize f8 matrix transpose * Add test_gemm_universal * A/B spilt schedule for blkpip v5 * Take ds_read2 into iglp scheduling scheme * format * fixed cmake * Add llvm-option into CI cmake flag --------- Co-authored-by: Jing Zhang <jizhan@amd.com>	2024-04-13 21:03:18 -05:00
Rostyslav Geyyer	bbefc12a26	Add instances for conv_scale with bf8@fp8->fp8 (#1231 ) * Add instances * Add example * Add profiler mode * Add client example	2024-04-11 10:35:00 -05:00
Bartłomiej Kocot	ced5af16f7	Extend support for contraction 6D (#1207 ) * Extend support for contraction up to 5D * Extend contraction bilinear instances * Fix interface test * Add 6d support, remove 3d,4d,5d * Fixes * Fix readme * Make defualt dim for contraction instances	2024-04-09 23:46:21 +02:00
jakpiase	c701071666	Add Grouped Gemm Multiple D SplitK TwoStage (#1212 ) * Support A/B/C elementwise ops. * First part of GGEMM multiD splitk two stage. * WIP - changes for debuggin. * tmp save * working version * added bf16@int8 version * fixes * add reviewers sugestions * pre-commited missing files * switched to ifs from elseifs --------- Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>	2024-04-04 11:01:33 +02:00
Rostyslav Geyyer	a61e73bc56	Add instances for conv_scale with fp8@bf8->fp8 (#1220 ) * Update device op api to support BComputeType * Add example * Add instances * Add profiler mode * Add client example * Update copyright year * Add BComputeType check * Fix compute types	2024-04-03 09:08:08 -05:00
Illia Silin	ae57e5938e	Split the instances by architecture. (#1223 ) * parse examples inside the add_example_executable function * fix the example 64 cmake file * add xdl flag to the gemm_bias_softmax_gemm_permute example * add filtering of tests based on architecture type * enable test_grouped_gemm for gfx9 only * enable test_transpose only for gfx9 * only linnk test_transpose if it gets built * split the gemm instances by architectures * split gemm_bilinear,grouped_conv_bwd_weight instances by targets * split instances by architecture * split grouped_conv instances by architecture * fix clang format * fix the if-else logic in group_conv headers * small fix for grouped convolution instances * fix the grouped conv bwd weight dl instances * fix client examples * only enable client examples 3 and 4 on gfx9 * set the gfx9 macro * make sure the architecture macros are set by cmake * use separate set of xdl/wmma flags for host code * sinmplify the main cmake file * add conv_fwd_bf8 instance declaration	2024-04-02 09:42:17 -07:00
Bartłomiej Kocot	9c052804a7	Add elementwise with dynamic vector dim (#1198 ) * Add elementwise with dynamic vector dim * Reduce number of instaces * Fixes * Fixes	2024-03-22 10:40:43 +01:00
Rostyslav Geyyer	fd0d093e78	Add instances for conv_scale with bf8 in / fp8 out (#1200 ) * Add bf8 conv fwd instances * Add example * Add profiler mode * Add client example * Fix copyright headers * Format	2024-03-21 13:57:34 -05:00
Rostyslav Geyyer	e626d5202a	Add instances for conv_scale with fp8 in/out (#1193 ) * Add fp8 conv instances and client example * Format * Add example * Update cmakelists * Add profiler mode * Format * Fix copyright headers	2024-03-15 09:50:03 -07:00
jakpiase	32d4be3d09	Add support for mixed precision bf16&int8 grouped gemm (#1166 ) * add support for mixed precision bf16&int8 grouped gemm * fix gfx versions and add bf16 kbatch condition * added reviewers comments	2024-02-21 10:35:35 +01:00
Bartłomiej Kocot	66736edb95	Extend permute scale support up to 6D (#1168 ) * Extend permute scale support up to 6D * Fixes * Fixes * Update profiler/README.md Co-authored-by: Lisa <lisajdelaney@gmail.com> * Update profiler/README.md Co-authored-by: Lisa <lisajdelaney@gmail.com> * Update profiler/README.md Co-authored-by: Lisa <lisajdelaney@gmail.com> * Update profiler/README.md Co-authored-by: Lisa <lisajdelaney@gmail.com> * Update profiler/README.md Co-authored-by: Lisa <lisajdelaney@gmail.com> * Update profiler/README.md Co-authored-by: Lisa <lisajdelaney@gmail.com> * Update profiler/README.md Co-authored-by: Lisa <lisajdelaney@gmail.com> --------- Co-authored-by: Lisa <lisajdelaney@gmail.com>	2024-02-20 09:56:54 -08:00
jakpiase	ba86eadce5	Add support for mixed-precision f16bf16_int8 gemm (#1127 )	2024-02-07 15:54:13 +01:00
rocking	28f68a5a99	layernorm & groupnorm bwd gamma beta (#1133 ) * Add layernorm bwd gamma beta external api * Add groupnorm external api * Add layernorm bwd gamma beta profiler * Add groupnorm bwd gamma beta ckProfiler * Add layernorm & groupnorm bwd gamma beta test * Fix groupnorm bwd gamma beta profiler bug * Layernorm bwd weight client example * Groupnorm bwd weight client example * clang format * Remove useless header * Let inv_std be positive * Rename to num_bytes and move this calculation outside the loop	2024-01-25 19:53:15 +08:00
Illia Silin	180e572076	Fixing most of the cppcheck errors. (#1142 ) * fix cppcheck errors, first pass * fix format * fix returned value in examples * add macro definitions for cppcheck * fix the profile_gemm logic * update the gemm profiler logic * add more difinitions to cppcheck, fix couple more errors * replace runtime error with message in device function * fix a couple of int4 issues * no return for fill function * fix errors in data_types.hpp * fix format * fix few remaining errors * fix errors in data_types.hpp * fix last couple of errors in datat_types.hpp	2024-01-24 13:47:48 -08:00
Illia Silin	886d9eeb99	Add an option to change the number of warm-up cycles and iterations. (#1124 ) * allow setting the number of warmup cycles and iterations for profiler * fix the gemm_splitk and grouped_gemm examples	2024-01-09 09:43:08 -08:00
arai713	aa3e2d7967	Transpose profiler fix (#1114 ) * added working example for 5D input using 1D kernel * example with 5D input tensor and 2d kernel - not working: issues with arguments * added updated version of 3d device op - changed descriptors/dims * added example file to check kernel * fixed descriptor and isSupportedArgument stride problem * added and modified kernel for 3d - updated tids/loop * adding some more 5d example files * fixed some issues * changes made for testing * working version: fixed error in stride for A, still a bit inefficient * cleaned up formatting/comments * updating formatting * more formatting fixes * fixing cmake, adding back gpu targets in cmake script * adding client example * added instances for client example * fixed errors in client example * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp * removed extra files * minor formatting and naming fixes * adding test files and profiler * fixing minor error * minor fix * removed unneccesary comments, renamed files * updated instance list for client example, added different layout example * removing instances * fixed error in instance generation * remove comments * update profiler and client example tensor layouts * fixed errors in test/profiler * updated vector dim access to enable vector load * updated test/profiler files * updated example with 1d kernel * updating profiler * renamed files * disabled device op for MI300 * skip elementwise_permute_2d on gfx94x * Update CMakeLists.txt * fixing CMake - disabling some GPU targets * added transpose profiler to CMake * fixed transpose profiler errors * fixed instances for tests/profiler * cleaned up code in transpose profiler source code * added some comments, updated copyright * made function arguments const where possible --------- Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2024-01-04 10:33:19 -06:00
Artur Wojcik	fb5bd51b42	enable compilation of INSTANCES_ONLY for Windows (#1082 ) * enable compilation of INSTANCES_ONLY for Windows * suppress ROCMChecks warnings on GoogleTests * suppress -Wfloat-equal warning on GoogleTests --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2023-12-20 14:34:53 -08:00
rocking	a69aa2a11a	layernorm and groupnorm backward data (#1083 ) * rename folder * Add type string * Remove typo * Add deviceOp to backward x * Add comment to describe the behavior of backward normalization * Add kernel function, prepare to implement * implement generic kernel * Check vector size * Add sweep once pipeline for small reduce size * Fix bug of KRaw_ error * Fix bug of dx stride * sanity check for mean and rstd * backward x for groupnorm * Add bwd x instance * add layernorm 2d bwd gamma beta instances * Change save mean var type from f32 to f16 in f16 mode * Change the example to f16 * Add groupnorm bwd gamma beta instance * Add groupnorm bwd x instance * Fix naming * Add layernorm bwd x ckprofiler * Add groupnorm bwd x profiler * clang format * Rename bwd x to bwd data * Fix bug of verification in profiler * Add test of layernorm and groupnorm bwd data * Add missing cmake * Add layernorm2d bwd data * rename fwd example * Add groupnorm client example * Fix typo. replace Invarient with Invariant * Add checking before running the best instance	2023-12-19 04:23:11 +08:00
zjing14	33600202c6	remove imcomplete transpose profiler (#1088 ) Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2023-12-07 13:39:40 -06:00
arai713	a2969aa8b6	Disable transpose device op for MI300 (#1050 ) * added working example for 5D input using 1D kernel * example with 5D input tensor and 2d kernel - not working: issues with arguments * added updated version of 3d device op - changed descriptors/dims * added example file to check kernel * fixed descriptor and isSupportedArgument stride problem * added and modified kernel for 3d - updated tids/loop * adding some more 5d example files * fixed some issues * changes made for testing * working version: fixed error in stride for A, still a bit inefficient * cleaned up formatting/comments * updating formatting * more formatting fixes * fixing cmake, adding back gpu targets in cmake script * adding client example * added instances for client example * fixed errors in client example * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp * removed extra files * minor formatting and naming fixes * adding test files and profiler * fixing minor error * minor fix * removed unneccesary comments, renamed files * updated instance list for client example, added different layout example * removing instances * fixed error in instance generation * remove comments * update profiler and client example tensor layouts * fixed errors in test/profiler * updated vector dim access to enable vector load * updated test/profiler files * updated example with 1d kernel * updating profiler * renamed files * disabled device op for MI300 * skip elementwise_permute_2d on gfx94x * Update CMakeLists.txt * fixing CMake - disabling some GPU targets --------- Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-11-29 11:36:40 -06:00
Chao Liu	e1fa00917c	[Hotfix] Remove unsed profile_transpose.cpp (#1046 )	2023-11-16 14:49:46 -08:00
arai713	3af8c81a72	Transpose 3d (#984 ) * added working example for 5D input using 1D kernel * example with 5D input tensor and 2d kernel - not working: issues with arguments * added updated version of 3d device op - changed descriptors/dims * added example file to check kernel * fixed descriptor and isSupportedArgument stride problem * added and modified kernel for 3d - updated tids/loop * adding some more 5d example files * fixed some issues * changes made for testing * working version: fixed error in stride for A, still a bit inefficient * cleaned up formatting/comments * updating formatting * more formatting fixes * fixing cmake, adding back gpu targets in cmake script * adding client example * added instances for client example * fixed errors in client example * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp * removed extra files * minor formatting and naming fixes * adding test files and profiler * fixing minor error * minor fix * removed unneccesary comments, renamed files * updated instance list for client example, added different layout example * removing instances * fixed error in instance generation * remove comments * update profiler and client example tensor layouts * fixed errors in test/profiler * updated vector dim access to enable vector load * updated test/profiler files * updated example with 1d kernel * updating profiler * renamed files --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-11-08 19:45:07 -06:00
rocking	a3d9a2cd42	Layernorm4d (#1022 ) * Rename folder * Add layernorm 4d fwd example * Rename original layernorm example * Add layernorm 4d f16 test * Add layernorm4d_fwd client example * Support layernorm4D in ckProfiler * Rename groupnorm to groupnorm fwd in example * Rename layernorm and group fwd in test * Rename normalization to normalization_fwd (instances) * Add fwd to DeviceNormalization * Rename external api header * Rename folder, because we can also add bwd in this folder * Add fwd in layernorm and groupnorm (profiler * Fix compile error --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2023-11-09 08:34:51 +08:00
zjing14	98fd41f597	Add Gemm instances for performance improvement (#1018 ) * improve kpad * more tuning parameters * f16_f8_fp16 * cut test time * add f16_f8_fp16 * add f16_f8_f16 * testing instances for skinny cases * format * clean * add fp16_f8_fp16 * clang-format * add grouped gemm instalces * fixed profile grouped_gemm * clean * clean * clean * clean * clean * add missing instance func * fixed inferface --------- Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: root <root@sh5-1e707-rc06-38.mkm.dcgpu>	2023-11-07 09:09:58 -06:00
Bartlomiej Wroblewski	4ef704d8a6	Add support for mixed precision in contraction scale and bilinear (#973 ) * Add support for mixed precision in contraction scale and bilinear (#936) * Extract common functionality to separate files * Reference contraction: Remove incorrect consts from type_converts * Reference contraction: Add missing type_convert for dst value * Reference contraction: Fix incorrect order of B matrix dimensions * Add support for mixed precision in contraction scale and bilinear * Move using statements from instances to a common file * Move using statements from examples to a common file * Fix the order of B matrix dimensions across examples and profiler * Fix the computation of error threshold * Make ComputeDataType an optional argument * Include possible DataType -> ComputeDataType casting error in the threshold * Remove commented code * Make the ComputeDataType an optional argument in instance --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2023-11-02 14:26:33 -07:00
Bartłomiej Kocot	2e824c6d46	Add support for groups in Img2Col/Col2Img (#1007 ) * Add support for groups in Img2Col/Col2Img * Fix interface test * Fix interface test G to N * Improve performance * Change gemm layout to 3d * Fixes	2023-10-31 10:46:32 +01:00
rocking	3696fe1c76	Layernorm and groupnorm support to save mean and inverse std in forward (#929 ) * save mean and inverse std in normalization * Save mean and inverse std in splitK * Vector save mean and inv std * Modify instance for save mean and std * simplify the layernorm example * Save mean and std in groupnorm example * Save mean and inv std in ckProfiler and test * Remove compute data type from base class * Save mean and inv std in client example * Add changelog * clang format * Fix compile error * Refine naming * Avoid error in bf16 * revert changelog	2023-10-19 07:36:29 +08:00
zjing14	bf435140dc	Clean DTYPES conditions in CMake (#974 ) * Add a condition to build fp8 instances * simplified buffer_load/store * add bfp8/fp8 * fixed * remove all f8/bf8 condition include folder * fixed cmake conditions * fixed DTYPES=fp16/bfp16 * fix * fixed buffer_load * fixed buffer_store * fix * clean example cmake files * fixed ci * fixed cit --------- Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com> Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-18 11:14:14 -05:00
Bartłomiej Kocot	16d7c4d2f7	Add grouped conv bwd weight wmma (#985 ) * Add grouped conv bwd weight wmma * Update README, changelog, profiler * Minor fixes * Fix grouped conv bwd wei dl kernel * Minor fixes * Minor stylistic fixes	2023-10-17 10:32:26 +02:00
Rostyslav Geyyer	fa753f27ba	Add splitk gemm fp16 @ fp16 with fp8 compute instances (#983 ) * Add ComputeType * Update for compatibility * Add instances * Update profiler api	2023-10-13 16:27:11 -05:00
Illia Silin	4daedf8ca5	Revert "Add support for mixed precision in contraction scale and bilinear" (#967 ) * Revert "Add support for mixed precision in contraction scale and bilinear (#936)" This reverts commit `f07485060e`. * revert commits #957 and #960	2023-10-05 14:58:23 -07:00
Rostyslav Geyyer	42facfc6b7	Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945 ) * Add f8 bf8 gemm example * Add element-wise ops * Add intrinsics * Update reference calculation * Add an additional type option for xdlops gemm * Fix build process * Add bf8 to buffer addressing * Update blockwise op, split typeA and typeB * Update for compatibility * Uppdate naming to f8->fp8 * Update naming * Format * Update naming (#937) * Add a client example * Add computetypes to device and gridwise ops * Add instances, update instance factory * Format * Fix a flag * Add ckProfiler mode * Fix typos * Add an example * Add bf8 generator * add bf8 mfma; fixed type_convert for bf8 * move verfication ahead of timing * Update reference calculation * Fix reference * Narrow down float init range * Fix bf8 bf8 mfma * Add bf8 @ fp8 mfma * Update example * Update instances * Update profiler api * Update for compatibility * Format * Remove extra example * Clean up * workaround convert --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-04 08:19:08 -05:00
Bartlomiej Wroblewski	f07485060e	Add support for mixed precision in contraction scale and bilinear (#936 ) * Extract common functionality to separate files * Reference contraction: Remove incorrect consts from type_converts * Reference contraction: Add missing type_convert for dst value * Reference contraction: Fix incorrect order of B matrix dimensions * Add support for mixed precision in contraction scale and bilinear * Move using statements from instances to a common file * Move using statements from examples to a common file * Fix the order of B matrix dimensions across examples and profiler * Fix the computation of error threshold * Make ComputeDataType an optional argument * Include possible DataType -> ComputeDataType casting error in the threshold * Remove commented code	2023-09-29 10:54:31 -05:00
Bartłomiej Kocot	e2243a4d1e	Add column to image kernel (#930 ) * Add column to image kernel * Minor fixes for dtypes and client examples * Disable tests for disabled dtypes * Disable add instances functions for disabled data types * Minor stylistic fixes * Revert "Disable add instances functions for disabled data types" This reverts commit `728b869563`. * Instances reduction * Add comments in device_column_to_image_impl * Update changelog and Copyrights * Improve changelog	2023-09-27 17:19:06 +02:00
Rostyslav Geyyer	94bfa50256	Add fp8 gemm instances (#920 ) * Add fp8 gemm instances * Update instance naming	2023-09-26 14:59:33 -05:00
Rostyslav Geyyer	62d4af7449	Refactor f8_t, add bf8_t (#792 ) * Refactor f8_t to add bf8_t * Add check_err impl for f8_t * Update fp8 test * Format * Revert the fix * Update vector_type implementation * Add bf8 test * Add bf8, use BitInt types * Add bf8 conversion methods * Update type_convert for fp8/bf8 * Add check_err fp8/bf8 support * Add subnorm fp8 tests * Add subnorm bf8 tests * Fix conversion * Add bf8 cmake bindings * Add macros to enable build with disabled fp8/bf8 * Remove is_native method * Update flag combination for mixed precision instances * Add more flag checks * Add another flag to a client example * Add type traits, decouple f8/bf8 casting * Clean up * Decouple fp8 and bf8 flags * Remove more redundant flags * Remove leftover comments	2023-09-12 17:04:27 -05:00
Haocong WANG	562b4cec48	[Navi3x] Add fp16/int8 wmma conv forward instances (#746 ) * fix wmma gemm int8; add grouped conv int8 example * Add int8 gemm-bilinear instances * compile sanity check unknown * Sanity pass + clang-format * add int8 conv profiler instances * solve merge conflict --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>	2023-09-07 21:59:26 -05:00
Bartłomiej Kocot	0077eeb3be	Add image to column kernel (#867 ) * Add image to column kernel * Add instances, tests, profiler, example * Add client example * Several fixes of image to column * Fix variable name in device_image_to_column_impl * Several fixes of image to column profiler * Fix num_btype calculation * Make new mesaurements for correct bytes calculation	2023-09-05 10:11:40 -05:00
rocking	866377de18	MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861 ) * Add maxpool instances * Rename index pool to max pool. * Add maxpool bwd bf16 instances * Add avg pool bwd instances * Rename avgpool and maxpool to avg_pool3d and max_pool * Add bf16 pool fwd instances * Add max pool bwd to ckProfiler * Add avg pool3d bwd to ckProfiler * Add avg pool bwd test * Fix bug of reference pool fwd (dilation) * Fix bug of max pool bwd (dilation and initZero) * Support bf16 compute data type * Force compute type be f32. Because atomicAdd only support f32 * Add max pool bwd test * Rename folder * Rename pool * Add max pool bwd client example * Add avg pool bwd client example * Add missing workspace * clang format * Rename macro * remove useless header * remove useless layout	2023-08-31 21:01:50 +08:00
zjing14	31ea132aa2	Fp16/fp8 mixed-precision Gemm with multiply+add fusion (#865 ) * add compute_type * add multiply_add ckProfiler * add f8_fp16 support * clean * clean * fixed lds size calc * format --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-08-28 16:27:32 -05:00
Jun Liu	c8a8385fdd	[HotFix] add config and version files to pass on build info (#856 ) * experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file	2023-08-23 11:36:17 -07:00
Rostyslav Geyyer	eac50708d9	Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853 ) * Add ComputeType arg to splitk device and gridwise ops * Update for gridwise op compatibility * Update bf16 and int8 splitk gemm examples with ComputeType * Add instances * Update ckProfiler for mixed precision cases * Add a mixed precision splitK gemm client example --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-08-22 09:34:49 -05:00
rocking	f60f0a5e03	Refactor pool fwd (#815 ) * Do not hardcode stride * devicePool2DFwd Inherit devicePool3DFwd * Move instance declaration out of common * Add dilation * use the pool3d rank, because pool2d inherit pooo3d * calculate Do Ho Wo for the dilation * Fix header name * Modify ckProfiler * Remove pool2d instance * Remove pool2d in profiler * Remove pool2d and add dilation * In to client example, this commit revise following: 1. Add dilation. 2. Use pool3d to implement pool2d * Refine naming and IsSupportedArgument() * Add dilation to maxpool bwd example * clang format * 1. Remove useless header 2. Fix copyright 3. Refine naming * Add layout parameter to pool fwd * clang format * Fix merge error * Fix compile error * Remove layout parameter in derived class * Refine changlog * Fix compile error * Fix compiler error * Add layout to external api and profiler	2023-08-15 02:25:28 +08:00
Illia Silin	08eb176929	Allow building CK for specific data types and split off last remaining DL instances. (#830 ) * properly split conv_nd_bwd_data instances * split conv2d_fwd instance data types * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm * split the tests by data types where possible * filter examples by DTYPES * split few remaining examples by DTYPES * filter most instances by DTYPES * add new lines at end of headers, fix grouped_gemm profiler * fix syntax * split the ckprofiler instances by DTYPES * split the conv2d and quantization DL and XDL instances * fix the splitting of conv2d DL instances * split softmax and pool_fwd tests for fp16 and fp32 types * fix syntax * fix the dl_int8 quantization instances isolation	2023-08-07 14:56:10 -07:00

130 Commits