composable_kernel

Mirror of https://github.com/ROCm/composable_kernel.git, synced 2026-07-17 00:58:44 +00:00

Author	SHA1	Message	Date
Bartłomiej Kocot	dd6d0dd3b9	Support NHWGC conv2d_bwd_weight (#769) * Support NHWGC conv2d_bwd_weight * Fix client example * Fix client example * Fix comments * Redesign grouped_conv_bwd_weight instances * Clang format fix --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `1ee99dcaa6`]	2023-07-12 08:25:02 -05:00
Po Yen Chen	aff6040b5b	Split GEMM instance library & enable pipeline v2 optimization (#783) * Move source file into sub-directories * Add missing include directive * Split DeviceGemmXdl<> fp16 instances * Fix format * Remove unnecessary CMakeLists.txt * Add macros to toggle new features * Remove debug message * Turn off GEMM v2 pipeline optimization by default * Fix format * Extract duplicated string as list * Enlarge indent in CMakeLists.txt [ROCm/composable_kernel commit: `850144a0d3`]	2023-07-06 10:59:35 -05:00
Adam Osewski	da8a7b63ec	Move Device Ops implementations into impl directory. (#777) Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `f4dfc060b7`]	2023-07-06 16:15:51 +02:00
Bartlomiej Kocot	27c7825316	Fix copyrights for DeviceBatchedGemmMultipleD_Dl [ROCm/composable_kernel commit: `2b0b6d9f46`]	2023-07-06 15:50:27 +02:00
Bartłomiej Kocot	8e7e512358	Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757) * Support bf16/f32/f16 and NHWGC conv2d_bwd_data * Add interface test * clang format * Comment fixes * Add more friendly error message [ROCm/composable_kernel commit: `63388e84ab`]	2023-06-21 08:20:31 -05:00
Qianfeng	d6f690d361	Padded Generic Kernel Instance (#730) * Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting * Move the generic kernel instance to be the first of the instance list for elementwise op of normalization * Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax * Add testing of GetGenericInstance() in client_example of Softmax * Revert "Add testing of GetGenericInstance() in client_example of Softmax" This reverts commit `f629cd9a93`. * Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax" This reverts commit `a9f0d000eb`. * Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm * Move generic kernel instance to separate tuple for elementwise op of normalization * Remove un-used files for softmax instance * Store generic kernel instance to separate tuple for softmax * Add IsSupported checking for generic instance to client example of softmax * Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization * clang-format fix * Remove int8 from softmax instances --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `0d9118226b`]	2023-06-16 23:43:11 -05:00
zjing14	973fc655fd	Fixed Weight layout of grouped_conv 3d fwd (#743) * Changed wei layout * changed layout for examples * fixed client example --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `309b1c6461`]	2023-06-15 10:19:33 -05:00
Rostyslav Geyyer	f0c9daa292	Add generic kernel instances for ck::tensor_operation::device::DeviceGemmMultipleD (#741) * Add generic instance gemm_add_add_fastgelu * Add a client example for generic gemm_add_add_fastgelu * Update CMakeLists * Format * Format * Add generic instance gemm_add_fastgelu * Format * Add a gemm_add_fastgelu client example * Format * Add generic instance gemm_fastgelu * Format * Fix argument order * Add gemm_fastgelu client example * Add exceptions if argument is not supported [ROCm/composable_kernel commit: `54b68eb343`]	2023-06-14 16:06:56 -05:00
Bartłomiej Kocot	1405a4906b	Add DeviceBatchedGemmMultipleD_Dl (#732) * Add DeviceBatchedGemmMultipleD_Dl * Fix batched_gemm tests * Fix comments * test_batched_gemm_multi_d fixes * Fix args for isSupported batchedGemmMultipleDDl * Disable tests for gfx90a [ROCm/composable_kernel commit: `fc9f97568f`]	2023-06-12 08:37:15 -05:00
ltqin	8c5f5f1293	Fix flash attn mask bug (#733) * add check input parameter * add instance for vector load = 1 * move gerneral instance to first pos * fix read bias code * regular code for bias load --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `0ede66de54`]	2023-06-12 08:35:31 -05:00
Illia Silin	d40b8d5e2c	update copyright headers (#726) [ROCm/composable_kernel commit: `b94fd0b227`]	2023-05-31 18:46:57 -05:00
Adam Osewski	b145984ea1	Multiple fixes to GroupedGemm+SplitK (#707) * Add license header. * Reduce number of logged output. Add constant initialization. * Add functional tests for grouped_gemm with different kbatch value. * Add debug log informations + remove unused code. * Don't pass kbatch to CalculateKPadded. * Turn on logging in grouped gemm and gemm splitk profiler * Debug: limit number of test cases to run; * Log more information and initialize with constant value. * Turn on DEBUG_LOG * Add more debug log informations. * Limit the number of instances to compile. * Use GridwiseGemmPipeline * Use KBatch to calculate K0 * Multiple DebugLog messages. * Unit tests for multiple KBatch values. * Refactoring * Disable logging * extract out of if statement KBatch update. * Uncomment instances. * Disable DebugLog. * Use Kbatch when calculate KPadded. * Fix CGridDesc padding. * Use available helper functions. * Uncomment code commented for debuggin. * Remove unnecessary debug log messages. * Uncomment previously commented code for debug purposes. * Add KBatch info to profiler output summary log. * Add gtests for gemm splitk using ckProfiler API. * Add more test-cases for different data layout. * Add more test cases for gemm splitk * Remove old test. * Unit tests for MKNK ggemm interface. * Fix and add more unit-tests. * Constepxr everything! * Increase error threshold for fp16 and splitk. Since we're using fp16 atomic add for splitk there's a known precision loss. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `70e4eb567f`]	2023-05-30 07:09:06 -05:00
Bartłomiej Kocot	18002ddb3c	Add instances for fp16/int8 Gemm kernels (Navi21) (#717) * Add instances for fp16/int8 Gemm kernels (Navi21) * Extend instances with smaller tiles * Fix SrcVectorTensor for km_kn_mn int8 [ROCm/composable_kernel commit: `c2d7a29dec`]	2023-05-30 07:07:17 -05:00
rocking	266e37d8fd	Pool3d fwd (#697) * Expand the base class of pool2d, prepare to share base class with pool3d * Add pool3d device op * Add pool3d f16 example * Refactor the base class. implement generic pooling in the future * clang format * get original index in max pooling * Add outputindex to base class * Fix dimension * Add pooling instance * Use indexType instead * Remove useless header * Extract IndexDataType to template * Extract pooling reference code * clang format * clang format * Fix typo * Add tensor stride * Add missing header * Add index stride and output stride * Refine naming * Add type to base class * Rename file * Use proper size * Fix typo * Refine naming * Modify the argument into vector. * Add max pool profiler * Refine naming * Support f32 pool * Fix typo * Add avg pool2d fwd in profiler * clang format * Rename AccDatatype to ComputeDatatype * Fix init * test pool * Extract variable * Add client example * Check the pooling dim * clang format * Connect argv and arg_parser * Add found check * Remove useless header * Refine naming * Adjust the order of device_pool_fwd [ROCm/composable_kernel commit: `76ec0089fb`]	2023-05-24 09:05:04 -05:00
Adam Osewski	db4216c421	Grouped Gemm + SplitK + simplified Kernel Args (#669) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * B2C with 3D grid for KSplit * Remove unused code. * Use default B2C (3D grid) in grid gemm v2r4r2. * Device gemm splitk use B2C map. * Device GroupedGemmXdlSplitKCShuffle * Example for GroupedGemm Xdl SplitK * Introduce Device GroupedGemmSplitK * Fix updating kbatch size. * Add instance mk-nk-mn * Enable set kbatch in profiler. * Add GGemmSplitK mk-kn-mn instances * Add more instances & split into multiple files. * minor fix * tuning * clean * disabled failed instances * use pipe v2 * Ignore arg on not supported arch. * fix warning --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `8bb2bb4a05`]	2023-04-24 15:43:36 -05:00
rocking	17201085bb	Revise layout of group convolution (#675) * [What] Remove pure conv int8 instance [Why] We will never use pure int8 conv in AI, use int8 quantization instead * Change layout * Share the kernel parameter * Support more type of NHWGC for group conv * Revise client example of conv 2d, use NHWGC layout * Add instance to cmake * Revise layout of group conv quantization instance * Revise layout of external api of group conv quantization * Revise layout of group conv quantization client example * Fix clang format * Add comment to describe meaning of each parameter [ROCm/composable_kernel commit: `3eecbfb6ec`]	2023-04-23 23:40:00 -05:00
Illia Silin	64dc32a54b	Put back the split-k gemm code. (#684) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout --------- Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `903cd19ce3`]	2023-04-21 19:37:00 -05:00
rocking5566	44c84a24d3	Add (#677) [ROCm/composable_kernel commit: `fd11a4a12a`]	2023-04-17 10:12:10 -05:00
rocking5566	2598be1afd	Groupnorm + swish external api (#668) * Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Ractor instance generation, split into multiple cpp file * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp [ROCm/composable_kernel commit: `ed3a2e5226`]	2023-04-10 08:02:17 -05:00
Jun Liu	d32add6de2	Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665) This reverts commit 469cce884ed93ab0e59e793df5b3c00d7657bf7a. [ROCm/composable_kernel commit: `3248387bbb`]	2023-04-06 17:14:11 -07:00
zjing14	9774bd66ef	add fp64 instances (#658) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `fde6d2742b`]	2023-03-30 13:30:43 -05:00
carlushuang	0755fc355d	simplify karg in device/grid of split-k op (#644) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout [ROCm/composable_kernel commit: `bb5530af91`]	2023-03-29 19:03:07 -05:00
rocking5566	c8d839b5d9	Conv + quantization + tanh (#645) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `389e84a83b`]	2023-03-29 14:50:23 -05:00
ltqin	71ce33651f	workaround 637 (#640) * add workaround 637 * format * change id --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `6ae12434d2`]	2023-03-20 11:49:31 -05:00
rocking5566	a235ffef27	gemm/Conv xdlops + dlops quantization (#625) * Add conv perlayer quantization * Add gemm_dlops quantization * Support int8 for innerproduct * Refine gemm dlops int8 kernel parameter * Support gfx908(MI100) and gfx90a(MI200) * clang-format * Rename example number * Support different layout for d tensor * Add conv dlops perchannel quantization example * Move to example 40 * Extract the common code for different platform (dlops and xdlops) * Move ot subfolder. Prepare to add other op of quantization * Refine the quantization instance library * Add conv dl instances and client example * Remove unnecessary type * Add gemm quantization instance * Add external api and client example * Refine num_bytes * Separete different layout to different cpp * Add more xdl instances * Revert "Remove unnecessary type" This reverts commit `820869182f`. * Remove CShuffleDataType in dlops Let acc and CShuffleDataType be the same in xdlops --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `16dc18e0f9`]	2023-03-15 15:29:40 -05:00
Adam Osewski	50707cbb13	GroupedGEMM + Gelu client example/instances/profiler (#614) * Grouped gemm + Gelu instances. * Device Instance Factory for GroupedGemm+Gelu * Client example * Rangify fill helper functions. * Fix name clash. * Profiler for grouped_gemm+gelu * No need to use full namespace name. * Add check for MRaw divisible by vector load. * Ugly fix for big errors. * Add grouped_gemm+gelu to profiler CMakelists. * Store in argument additional info. * Information about Mraw, Nraw, Kraw values. * Use FastGelu instead of Gelu. * Change client ex to use FastGelu * Remove relaxed error precision. * Remove duplicate output elementwise-op --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `9096b1c7b2`]	2023-03-07 22:06:56 -06:00
rocking5566	d5062679f1	Improve normalization (#580) * Sync the order of type string with template parameter * Add more instances * Check the vector size and remove redundant var * Extract var to static, prepare to separate sweep once kernel * Separate sweeponce flow and optimize the flow * 1. Rename AccDatatype in normalization to computeData 2. Rename AccElementwiseOperation to YElementwiseOperation in normalization * Remove useless code * Update naive variance kernel * Refine string * Fix typo * Support naive variance for device_normalization * Check the blocksize * Share the VGPR of x and y * Share the VGPR of gamma and beta * Add more instances * Support fp16 sqrt for experiment * Add CHANGELOG * Fix typo * clang-format [ROCm/composable_kernel commit: `6a6163a3d1`]	2023-02-15 11:59:35 -06:00
Adam Osewski	85acf7ac2f	Conv3D FWD BWD WRW fp16 fp32 client examples (#559) * Conv3d bwd weight client example. * Update year in license * Convolution bwd data 3D fp16/fp32 client example. * Client example for convnd fwd fp16 fp32 * clang-format * Review remarks. * Fix compiler err. * Update data layout to standard one. * Add conv 3d fwd NDHWGC instances * clang-format * Conv3d fwd NDHWGC instances. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `e9fd122889`]	2023-02-15 11:16:47 -06:00
Adam Osewski	8d4c822082	GroupedGEMM more bigger tiles. (#577) * Adding more bigger tiles. * Remove failing instance. * Remove instances which that don't improve perf. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `8f42780fd6`]	2023-02-13 10:06:24 -06:00
rocking5566	329678b636	Gemm+layernorm instance, ckProfiler, client example (#568) * Add gemm + layernorm instance * Add ckProfiler * Add test * Add client example * Detect if user forger to set the workrspace * Use literal in the example * [What] use builtin function for sqrt [Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt() * check gemm vaildity in IsSupportedArgument * Add more testcases * Merge duplicated folder in client example * Print more infomation * Use better kernel parameter for MS problem size * clang format * Add constexpr for if condition and remove redundant include * Remove cstdlib and add constexpr [ROCm/composable_kernel commit: `f7d28f3e4b`]	2023-02-09 15:02:55 -06:00
guangzlu	6caec3d429	Add instance for elementwise normlization (#573) * added instances for large N * add instance for elementwise normlization * added supported restrict in device_elementwise_normalization_impl.hpp [ROCm/composable_kernel commit: `76d144fa7c`]	2023-02-09 09:37:29 -08:00
ltqin	32525bff35	Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576) * add instance for gemm bias softmax gemm * add client example * change CGridDesc_G_M_N to CGridDesc_G_M_O * add gridwise * change c grid name * device add d0s data * fix 08 client_example * add example 47_fused_attention * example output correct * add d0 to example * add d0 element op * rechange instance code * change Acc0ElementwiseOperation to C0DEElementwiseOperation * change example name * update instance for cdeelementwiseop * add bhalf_t ScaleAdd * add test * not surport geem1 bias * remove some ignore * fix test bug [ROCm/composable_kernel commit: `332ccc3367`]	2023-02-08 14:34:45 -06:00
Adam Osewski	05befc2690	Add more instances for irregular GEMM sizes. (#560) Co-authored-by: Adam Osewski <aosewski@amd.com> [ROCm/composable_kernel commit: `7494c1c611`]	2023-01-26 13:42:20 -06:00
Qianfeng	b148acfaaa	Batchnorm inference instances, external API, client examples and gtests (#531) * File renaming and class renaming for device element-wise operation * Add batchnorm-infer instances, external API and client example * Add batchnorm-infer profiler module and gtests * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp * Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer * Rename class and file due to conflict from device_elementwise_2d.hpp * Fix namespace in batcnnorm_infer_nhwc client example [ROCm/composable_kernel commit: `a1b2441f8d`]	2023-01-25 17:09:04 -06:00
ltqin	ebdb392f09	Add multiD Gemm client APIs (#534) * start add example * fix config * fix showinfo bug * add an elementop * change to padding * add xdl example * change elementwiseop * add instance * add instance to profiler * change file name * fix deive not support issue * add client example * fix client gemm_add_multiply name * change AddMultiply elementwiseop * fix elementwiseop * fix client example * fix addmultiply op * fix comments and fun name Co-authored-by: letaoqin <letaoqin@amd.com> [ROCm/composable_kernel commit: `d66421fe34`]	2023-01-18 11:53:56 -06:00
ltqin	bbecb0b509	Add client API/examples for 3xGemm+Bias+Add+Permute{0, 2, 3, 1} (#550) * add example * fix example * add instance for gemm permute * add to client example * change configs * change instance file name * formate * change client example file name and remove example [ROCm/composable_kernel commit: `55236709e2`]	2023-01-18 10:52:52 -06:00
Qianfeng	2ca8512f48	Reduction external API and client examples (#493) * Change to the DeviceReduce base class template to include all problem description information * Add external api for reduction * Add client example to test the reduction external api * Spelling correction * Re-implement the host_reduction to follow the DeviceReduce base API format * Change the reduce profiler to call the external API for collecting device instances * Rename reduce client example directory from 08_reduce to 12_reduce * Remove (void) before the functional call * Tiny update in reduce client example * Tiny update in profile_reduce_impl.hpp * Rename the reduce client example directory Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `80e0526741`]	2023-01-16 22:18:06 -06:00
zjing14	ac9c43d666	Add MNK padding, M = 0 support into grouped_gemm (#539) * add mnk padding, support m=0 * clean code * clean code Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> [ROCm/composable_kernel commit: `0345963eef`]	2022-12-15 15:07:24 -06:00
Rostyslav Geyyer	967b54f3ef	Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances to enable arbitrary problem size (#535) * Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances * Add padding device_gemm_add_fastgelu_xdl_c_shuffle instances * Add gemm_add_fastgelu profiler impl * Add padding device_gemm_fastgelu_xdl_c_shuffle instances * Add gemm_fastgelu profiler impl [ROCm/composable_kernel commit: `9a1f2475e3`]	2022-12-14 18:12:09 -06:00
Rostyslav Geyyer	ebf3f8571d	Add padding device_gemm_xdl instances (#529) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `c7a4d36147`]	2022-12-07 17:46:03 -06:00
ltqin	d5bcca1a9f	Add multiple d gridwise gemm on Navi21 for ResNet50 (#517) * start add example * add multiple d fp16 example * device transfer elementwiseop to gridwise * gridwise add multiple d * change example for multiple d * fix spill registers * fix for passthrough element op * fix int8 overflow * change example file name * add instance for dl multiple d * example add DsDataType * remove grouped_convolution_forward_dl.hpp * add head file(was deleted before) * fix not support device issue * format * remove passthrough check Co-authored-by: letaoqin <letaoqin@amd.com> [ROCm/composable_kernel commit: `23ecf0fa9e`]	2022-12-02 11:42:31 -06:00
rocking5566	20798a153b	gemm, conv perchannel quantization (#503) * Use gemm_multiple_D instead * Add gemm bias relu quantization example * Add pure gemm quantization example * Add quantization of perchannel conv + bias + relu example * Refine the code * Rename multiplier to requant_scale * Rename the folder * Remove redundant comment * Rename the file. Prepare to add perchannel * Add conv perchannel instance * Move to quantization folder * Add conv perchannel client example * Apply Rangify constructor of HostTensorDescriptor & Tensor<> * Fix merge error [ROCm/composable_kernel commit: `ad541ad6b9`]	2022-11-30 14:13:04 -06:00
Qianfeng	0b8096b485	BatchNorm backward instance/external API/profiler/tests (#519) * Refine the device batchnorm-backward base API templates and data type assignments * Remove duplicated kernel file * Add batchnorm backward instances and external API * Add batchnorm-backward profiler and tests * Add client example which uses batchnorm backward external API * Merge test/batchnorm_fwd and test/batchnorm_bwd into one directory * Loose the threshold for batchnorm-backward check_err() [ROCm/composable_kernel commit: `63af525c06`]	2022-11-30 13:32:20 -06:00
Qianfeng	b3d1f5f23e	Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test (#516) [ROCm/composable_kernel commit: `5bf0475afd`]	2022-11-28 14:33:00 -06:00
Qianfeng	52d082bade	BatchNorm forward instance/external api/profiler/tests/client example (#511) * Update to device_batchnorm_forward base class to include all template parameters for problem description * Add batchnorm forward instances and external api * Add batchnorm forward profiler module which uses the external api * Add some comments in batchnorm_forward example to explain the dimensions in lengths[] * Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward * Improvement to the batchnorm infer base API * Add batchnorm forward client example which shows using the batchnorm forward external API * Add test for batchnorm forward * Tuning the batchnorm profiler initialized values and error threshold * Add support for bhalf_t in instances/external api/tests * Add support for int8_t in instances/external api/tests * Add support for double in instances/external api/tests * Let ScaleDataType and BiasDataType be same as XDataType and YDataType when creating instances * Checking before running best instance in batchnorm_fwd_nhwc client example * Add checking for YElementwiseOp in batchnorm_forward external API * Add more types in batchnorm forward profiler * Add more test lengths Co-authored-by: rocking5566 <ChunYu.Lai@amd.com> [ROCm/composable_kernel commit: `4e6a5575be`]	2022-11-24 18:02:27 -06:00
Adam Osewski	2426e0f32b	Client examples AddFastGelu and FastGelu + instances. (#509) * FastGelu support for more data types. * AddFastGelu & FastGelu instances. * Client example. * clang-format * Remove unused stride variable. * Add new line at EOF. Co-authored-by: Adam Osewski <aosewski@amd.com> [ROCm/composable_kernel commit: `43a889b72e`]	2022-11-19 22:08:26 -06:00
guangzlu	0015ce6288	Add BF16 tests for batched_gemm_softmax_gemm_permute (#504) * fixed bug in softmax reference & add bf16 examples for batched_gemm_scale_softmax_gemm * added bf16 tests for batched_gemm_softmax_gemm_permute * changed format of device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp * changed format device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp * aligned annotations * modified CMakeLists for examples * add common example code of fp16/bf16 version for batched_gemm_scale_softmax_gemm_xdl * use macro to control the instances * added macro control into instances * clang-format some files * changed error tolerance for bf16 * changed index for 10_elementwise_normalization * fixed xdlops code bug in amd_xdlops.hpp Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `4c4c7328a6`]	2022-11-15 16:30:23 -06:00
ltqin	32b187963d	Add Conv Backward Data on Navi21 for ResNet50 (#499) * start add example * add device dl * change launch kernel * change init data method * change example config * add config valid check * add instance for dl bwd * add instance to ckProfiler * reserver to profiler and cmakelist * add instance to ckProfiler2 * change instance f32 config * fix example return value Co-authored-by: letaoqin <letaoqin@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `db0eb1ea9c`]	2022-11-15 16:22:20 -06:00
Po Yen Chen	93f036f2c3	Add client example of grouped conv2d backward weight (data type: fp16) (#498) * Remove redundant CMake setting * Extract common code from files * Rename folder 'convnd' to 'conv' * Use std::array<> to accept compile-time kwnown # of arguments * Fix compilation error of tuning parameter * In example, use same setting as unit-test * Remove no-longer used include directive * Add interface for grouped conv bwd weight * Add group support for conv bwd weight * Add grouped conv bwd weight example * Use group parameter in example * Rename example folder * Remove non-grouped version example source files * Rename device op template * Add group support to convolution backward weight * Remove debug messages * Use smaller group size in example * Use named variable as loop terminate condition * Prettify example output message * Enlarge used grid size * Allow real grid size exceeds expected grid size * Rename interface file * Add client example for grouped conv2d bwd weight * Fix wrong include directive * Rename client example folder [ROCm/composable_kernel commit: `38470e0497`]	2022-11-09 18:50:03 -06:00
Po Yen Chen	db1f435770	Remove interface 'DeviceGroupedConvBwdData' (#500) * Remove interface 'DeviceGroupedConvBwdData' * Remove no-longer needed include directive * Rename client example folder [ROCm/composable_kernel commit: `67423a2275`]	2022-11-09 18:32:17 -06:00

127 Commits