composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-11 17:00:18 +00:00

Author	SHA1	Message	Date
rocking	866377de18	MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861 ) * Add maxpool instances * Rename index pool to max pool. * Add maxpool bwd bf16 instances * Add avg pool bwd instances * Rename avgpool and maxpool to avg_pool3d and max_pool * Add bf16 pool fwd instances * Add max pool bwd to ckProfiler * Add avg pool3d bwd to ckProfiler * Add avg pool bwd test * Fix bug of reference pool fwd (dilation) * Fix bug of max pool bwd (dilation and initZero) * Support bf16 compute data type * Force compute type be f32. Because atomicAdd only support f32 * Add max pool bwd test * Rename folder * Rename pool * Add max pool bwd client example * Add avg pool bwd client example * Add missing workspace * clang format * Rename macro * remove useless header * remove useless layout	2023-08-31 21:01:50 +08:00
zjing14	31ea132aa2	Fp16/fp8 mixed-precision Gemm with multiply+add fusion (#865 ) * add compute_type * add multiply_add ckProfiler * add f8_fp16 support * clean * clean * fixed lds size calc * format --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-08-28 16:27:32 -05:00
Jun Liu	c8a8385fdd	[HotFix] add config and version files to pass on build info (#856 ) * experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file	2023-08-23 11:36:17 -07:00
zjing14	8ebea3a56e	add generic instances (#858 ) Co-authored-by: Jing Zhang <jizha@amd.com>	2023-08-23 09:18:10 -05:00
zjing14	ca3115e7e8	Ck profiler splitk (#857 ) * updated regular gemm * update ckProfiler * fixed gtests --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-08-22 16:54:34 -07:00
Rostyslav Geyyer	eac50708d9	Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853 ) * Add ComputeType arg to splitk device and gridwise ops * Update for gridwise op compatibility * Update bf16 and int8 splitk gemm examples with ComputeType * Add instances * Update ckProfiler for mixed precision cases * Add a mixed precision splitK gemm client example --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-08-22 09:34:49 -05:00
rocking	f60f0a5e03	Refactor pool fwd (#815 ) * Do not hardcode stride * devicePool2DFwd Inherit devicePool3DFwd * Move instance declaration out of common * Add dilation * use the pool3d rank, because pool2d inherit pooo3d * calculate Do Ho Wo for the dilation * Fix header name * Modify ckProfiler * Remove pool2d instance * Remove pool2d in profiler * Remove pool2d and add dilation * In to client example, this commit revise following: 1. Add dilation. 2. Use pool3d to implement pool2d * Refine naming and IsSupportedArgument() * Add dilation to maxpool bwd example * clang format * 1. Remove useless header 2. Fix copyright 3. Refine naming * Add layout parameter to pool fwd * clang format * Fix merge error * Fix compile error * Remove layout parameter in derived class * Refine changlog * Fix compile error * Fix compiler error * Add layout to external api and profiler	2023-08-15 02:25:28 +08:00
rocking	03b8119e2e	Add Normalization splitk instances (#829 ) * Add normalization splitK to layernorm and groupnorm instances * Fix bug of GetKPerThread() * Refine naming * clang format	2023-08-12 01:31:31 +08:00
Illia Silin	08eb176929	Allow building CK for specific data types and split off last remaining DL instances. (#830 ) * properly split conv_nd_bwd_data instances * split conv2d_fwd instance data types * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm * split the tests by data types where possible * filter examples by DTYPES * split few remaining examples by DTYPES * filter most instances by DTYPES * add new lines at end of headers, fix grouped_gemm profiler * fix syntax * split the ckprofiler instances by DTYPES * split the conv2d and quantization DL and XDL instances * fix the splitting of conv2d DL instances * split softmax and pool_fwd tests for fp16 and fp32 types * fix syntax * fix the dl_int8 quantization instances isolation	2023-08-07 14:56:10 -07:00
Bartłomiej Kocot	22443f7aae	Add wei_strides to grouped conv3d wei to keep consistency (#817 ) * Add wei_strides to grouped conv3d wei to keep consistency * Fix strides in client examples * Unify backward weight api with forward * Fix for example * Fixes for examples --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-08-07 10:23:45 -05:00
Po Yen Chen	f7cc8c3b03	Update tuning parameter & compilation options of DeviceGemmXdl<> instance (layout=TT) (#819 ) * Enable pipeline v2 opt for layout=TT instance * Use better thread mapping for reading A tile * Conditionally enable pipeline v2 opt * Allow enabling only fp16 gemm instances in profiler * Fix formatting error * Fix compilation error if we enable fp32 in profiler	2023-08-02 10:32:22 -05:00
carlushuang	e7dca79d27	initial stream-k implementation with example (#699 ) * initial stream-k implementation with example * fix unexpected change in err * improve a little bit performance by reorganize pipeline. * improve perf a little bit by swizzle block idx * add profiler * update example * fix spelling * shrink karg for streamk * support dynamic buffer using memory coherence glc_slc bit from template * control memory coherence while construct dynamic buffer * update reduction for streamk(not ready yet) * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting * fix build issue * fix several bug * now result is correct, everything works (but has scratch) * remove scratch by manually reset coordinate * update device code * fix a bug in final reduce * fix something in example * update async memset * fix enum as camel case * modify coherence enum name * clean code and use atomic streamk by default * remove unused var * throw exception if have empty pointer * fix format * fix CI warning * fix type in init * modify CI error * filter out on gfx10+ * restore changed example code --------- Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>	2023-07-26 14:18:15 -05:00
Illia Silin	9195435c77	Disable DL kernels by default. (#816 )	2023-07-26 11:06:45 -05:00
Bartłomiej Kocot	10732847e7	Grouped conv bwd wei NDHWGC/NDHWGK (#804 )	2023-07-21 12:00:55 -05:00
Bartłomiej Kocot	49180fd60b	Grouped 3d conv backward data support (#799 ) * Grouped 3d conv backward data support * Fix comments	2023-07-18 11:01:33 -05:00
Illia Silin	189ea3b9aa	Add mechanism to build CK for select data types, add Navi3x CI. (#790 ) * allow building CK for specific data types * add CI build and test stage on Naiv3x without some int8 instances * add missing gemm fp16 instances * add the changes to the missed cmake file * add empty lines at end of source files * Do not build quantization client example on navi3 in CI * disable batched_gemm_multi_d_int8 instances with DTYPES * disable device_conv2d_bwd_data_instance with DTYPES * fix ckprofiler for conv_bwd_data for int8 * properly isolate the conv_bwd_data int8 instances * remove empty line	2023-07-17 18:02:42 -07:00
Bartłomiej Kocot	1ee99dcaa6	Support NHWGC conv2d_bwd_weight (#769 ) * Support NHWGC conv2d_bwd_weight * Fix client example * Fix client example * Fix comments * Redesign grouped_conv_bwd_weight instances * Clang format fix --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-07-12 08:25:02 -05:00
Qianfeng	8f5cafaf04	Batchnorm splitk single kernel (#771 ) * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance] * Add CountDataType as template parameter in blockwise_welford * Add utility/get_shift.hpp * Add BatchNorm multiblock single-kernel implementation * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a * Renaming in device_batchnorm_forward_impl.hpp * Tiny fix in the batchnorm_fwd profiler * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a" This reverts commit `d16d00919c`. * Use the old two-kernel batchnorm multiblock method for gfx1030 * Use the old two-kernel batchnorm multiblock method for gfx908 * use the single-kernel batchnorm multiblock method only for gfx90a * Remove get_wave_id() from utility/get_id.hpp since it is not used * Set true for testing running mean/variance and saving mean/invvariance in the examples * Fix to copy-right words * Remove un-needed including in utility/get_id.hpp * Add comments to workgroup_synchronization.hpp * Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp * Renaming in the kernels * Remove un-used kernel file	2023-07-06 10:58:55 -05:00
Bartłomiej Kocot	63388e84ab	Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757 ) * Support bf16/f32/f16 and NHWGC conv2d_bwd_data * Add interface test * clang format * Comment fixes * Add more friendly error message	2023-06-21 08:20:31 -05:00
Qianfeng	0d9118226b	Padded Generic Kernel Instance (#730 ) * Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting * Move the generic kernel instance to be the first of the instance list for elementwise op of normalization * Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax * Add testing of GetGenericInstance() in client_example of Softmax * Revert "Add testing of GetGenericInstance() in client_example of Softmax" This reverts commit `f629cd9a93`. * Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax" This reverts commit `a9f0d000eb`. * Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm * Move generic kernel instance to separate tuple for elementwise op of normalization * Remove un-used files for softmax instance * Store generic kernel instance to separate tuple for softmax * Add IsSupported checking for generic instance to client example of softmax * Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization * clang-format fix * Remove int8 from softmax instances --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-06-16 23:43:11 -05:00
Bartłomiej Kocot	fc9f97568f	Add DeviceBatchedGemmMultipleD_Dl (#732 ) * Add DeviceBatchedGemmMultipleD_Dl * Fix batched_gemm tests * Fix comments * test_batched_gemm_multi_d fixes * Fix args for isSupported batchedGemmMultipleDDl * Disable tests for gfx90a	2023-06-12 08:37:15 -05:00
Illia Silin	b94fd0b227	update copyright headers (#726 )	2023-05-31 18:46:57 -05:00
Adam Osewski	70e4eb567f	Multiple fixes to GroupedGemm+SplitK (#707 ) * Add license header. * Reduce number of logged output. Add constant initialization. * Add functional tests for grouped_gemm with different kbatch value. * Add debug log informations + remove unused code. * Don't pass kbatch to CalculateKPadded. * Turn on logging in grouped gemm and gemm splitk profiler * Debug: limit number of test cases to run; * Log more information and initialize with constant value. * Turn on DEBUG_LOG * Add more debug log informations. * Limit the number of instances to compile. * Use GridwiseGemmPipeline * Use KBatch to calculate K0 * Multiple DebugLog messages. * Unit tests for multiple KBatch values. * Refactoring * Disable logging * extract out of if statement KBatch update. * Uncomment instances. * Disable DebugLog. * Use Kbatch when calculate KPadded. * Fix CGridDesc padding. * Use available helper functions. * Uncomment code commented for debuggin. * Remove unnecessary debug log messages. * Uncomment previously commented code for debug purposes. * Add KBatch info to profiler output summary log. * Add gtests for gemm splitk using ckProfiler API. * Add more test-cases for different data layout. * Add more test cases for gemm splitk * Remove old test. * Unit tests for MKNK ggemm interface. * Fix and add more unit-tests. * Constepxr everything! * Increase error threshold for fp16 and splitk. Since we're using fp16 atomic add for splitk there's a known precision loss. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-05-30 07:09:06 -05:00
Illia Silin	ac9e01e2cc	Clean-up the headers (#713 ) * fix headers for gpu instances * remove unused headers --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-05-24 08:11:25 -07:00
rocking	76ec0089fb	Pool3d fwd (#697 ) * Expand the base class of pool2d, prepare to share base class with pool3d * Add pool3d device op * Add pool3d f16 example * Refactor the base class. implement generic pooling in the future * clang format * get original index in max pooling * Add outputindex to base class * Fix dimension * Add pooling instance * Use indexType instead * Remove useless header * Extract IndexDataType to template * Extract pooling reference code * clang format * clang format * Fix typo * Add tensor stride * Add missing header * Add index stride and output stride * Refine naming * Add type to base class * Rename file * Use proper size * Fix typo * Refine naming * Modify the argument into vector. * Add max pool profiler * Refine naming * Support f32 pool * Fix typo * Add avg pool2d fwd in profiler * clang format * Rename AccDatatype to ComputeDatatype * Fix init * test pool * Extract variable * Add client example * Check the pooling dim * clang format * Connect argv and arg_parser * Add found check * Remove useless header * Refine naming * Adjust the order of device_pool_fwd	2023-05-24 09:05:04 -05:00
Bartłomiej Kocot	642d5e9155	Add contraction profiler and tests (#701 ) * Add contraction profiler and tests * Build and style fixes * Allow to use any elementwise operator for ref_contraction * Introduce profile_contraction_scale and profile_contraction_bilinear * Make ref_contraction generic and extend interface tests * Stylistic minor fixes * Extend test_contraction_interface	2023-05-15 09:46:52 -05:00
zjing14	f53ede26e5	fixed init range (#691 )	2023-05-02 08:30:23 -07:00
Adam Osewski	8bb2bb4a05	Grouped Gemm + SplitK + simplified Kernel Args (#669 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * B2C with 3D grid for KSplit * Remove unused code. * Use default B2C (3D grid) in grid gemm v2r4r2. * Device gemm splitk use B2C map. * Device GroupedGemmXdlSplitKCShuffle * Example for GroupedGemm Xdl SplitK * Introduce Device GroupedGemmSplitK * Fix updating kbatch size. * Add instance mk-nk-mn * Enable set kbatch in profiler. * Add GGemmSplitK mk-kn-mn instances * Add more instances & split into multiple files. * minor fix * tuning * clean * disabled failed instances * use pipe v2 * Ignore arg on not supported arch. * fix warning --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com>	2023-04-24 15:43:36 -05:00
zjing14	8b9cbba823	reduce initial number for half_t splitk (#685 )	2023-04-24 08:07:39 -05:00
rocking5566	ed3a2e5226	Groupnorm + swish external api (#668 ) * Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Ractor instance generation, split into multiple cpp file * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp	2023-04-10 08:02:17 -05:00
Adam Osewski	9096b1c7b2	GroupedGEMM + Gelu client example/instances/profiler (#614 ) * Grouped gemm + Gelu instances. * Device Instance Factory for GroupedGemm+Gelu * Client example * Rangify fill helper functions. * Fix name clash. * Profiler for grouped_gemm+gelu * No need to use full namespace name. * Add check for MRaw divisible by vector load. * Ugly fix for big errors. * Add grouped_gemm+gelu to profiler CMakelists. * Store in argument additional info. * Information about Mraw, Nraw, Kraw values. * Use FastGelu instead of Gelu. * Change client ex to use FastGelu * Remove relaxed error precision. * Remove duplicate output elementwise-op --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-07 22:06:56 -06:00
rocking5566	6a6163a3d1	Improve normalization (#580 ) * Sync the order of type string with template parameter * Add more instances * Check the vector size and remove redundant var * Extract var to static, prepare to separate sweep once kernel * Separate sweeponce flow and optimize the flow * 1. Rename AccDatatype in normalization to computeData 2. Rename AccElementwiseOperation to YElementwiseOperation in normalization * Remove useless code * Update naive variance kernel * Refine string * Fix typo * Support naive variance for device_normalization * Check the blocksize * Share the VGPR of x and y * Share the VGPR of gamma and beta * Add more instances * Support fp16 sqrt for experiment * Add CHANGELOG * Fix typo * clang-format	2023-02-15 11:59:35 -06:00
rocking5566	f7d28f3e4b	Gemm+layernorm instance, ckProfiler, client example (#568 ) * Add gemm + layernorm instance * Add ckProfiler * Add test * Add client example * Detect if user forger to set the workrspace * Use literal in the example * [What] use builtin function for sqrt [Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt() * check gemm vaildity in IsSupportedArgument * Add more testcases * Merge duplicated folder in client example * Print more infomation * Use better kernel parameter for MS problem size * clang format * Add constexpr for if condition and remove redundant include * Remove cstdlib and add constexpr	2023-02-09 15:02:55 -06:00
ltqin	332ccc3367	Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576 ) * add instance for gemm bias softmax gemm * add client example * change CGridDesc_G_M_N to CGridDesc_G_M_O * add gridwise * change c grid name * device add d0s data * fix 08 client_example * add example 47_fused_attention * example output correct * add d0 to example * add d0 element op * rechange instance code * change Acc0ElementwiseOperation to C0DEElementwiseOperation * change example name * update instance for cdeelementwiseop * add bhalf_t ScaleAdd * add test * not surport geem1 bias * remove some ignore * fix test bug	2023-02-08 14:34:45 -06:00
Qianfeng	a1b2441f8d	Batchnorm inference instances, external API, client examples and gtests (#531 ) * File renaming and class renaming for device element-wise operation * Add batchnorm-infer instances, external API and client example * Add batchnorm-infer profiler module and gtests * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp * Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer * Rename class and file due to conflict from device_elementwise_2d.hpp * Fix namespace in batcnnorm_infer_nhwc client example	2023-01-25 17:09:04 -06:00
Qianfeng	52abc2f371	Use double for all scaling values and float-point constant values at the Device Op API (#557 ) * Use double as alpha/beta values type in reduce device op api * Use double as alpha/beta values type in softmax device op api * Use double as alpha/beta values type in multiple-reduce device op api * Use double as epsilon value type in normalization/elementwise-normalization device op api	2023-01-18 12:02:50 -06:00
ltqin	d66421fe34	Add multiD Gemm client APIs (#534 ) * start add example * fix config * fix showinfo bug * add an elementop * change to padding * add xdl example * change elementwiseop * add instance * add instance to profiler * change file name * fix deive not support issue * add client example * fix client gemm_add_multiply name * change AddMultiply elementwiseop * fix elementwiseop * fix client example * fix addmultiply op * fix comments and fun name Co-authored-by: letaoqin <letaoqin@amd.com>	2023-01-18 11:53:56 -06:00
Qianfeng	80e0526741	Reduction external API and client examples (#493 ) * Change to the DeviceReduce base class template to include all problem description information * Add external api for reduction * Add client example to test the reduction external api * Spelling correction * Re-implement the host_reduction to follow the DeviceReduce base API format * Change the reduce profiler to call the external API for collecting device instances * Rename reduce client example directory from 08_reduce to 12_reduce * Remove (void) before the functional call * Tiny update in reduce client example * Tiny update in profile_reduce_impl.hpp * Rename the reduce client example directory Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2023-01-16 22:18:06 -06:00
Rostyslav Geyyer	9a1f2475e3	Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances to enable arbitrary problem size (#535 ) * Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances * Add padding device_gemm_add_fastgelu_xdl_c_shuffle instances * Add gemm_add_fastgelu profiler impl * Add padding device_gemm_fastgelu_xdl_c_shuffle instances * Add gemm_fastgelu profiler impl	2022-12-14 18:12:09 -06:00
Anthony Chang	d156709432	Fix bug where scaling may not be applied in some code path (#526 ) * fix bug where scaling may not be applied in some code path * more test * revert accidental example code changes	2022-12-02 11:43:34 -06:00
ltqin	23ecf0fa9e	Add multiple d gridwise gemm on Navi21 for ResNet50 (#517 ) * start add example * add multiple d fp16 example * device transfer elementwiseop to gridwise * gridwise add multiple d * change example for multiple d * fix spill registers * fix for passthrough element op * fix int8 overflow * change example file name * add instance for dl multiple d * example add DsDataType * remove grouped_convolution_forward_dl.hpp * add head file(was deleted before) * fix not support device issue * format * remove passthrough check Co-authored-by: letaoqin <letaoqin@amd.com>	2022-12-02 11:42:31 -06:00
Po Yen Chen	8784a72e23	Modularize ckProfiler operations (#514 ) * Re-structure ckProfiler source files * Rename profiler.cpp to main.cpp * Modularize ckProfiler operations * Add description for profiler operations * Use longer name to avoid name collision * Use macro to delay expansion * Use std::move() to avoid object copying * Prohibit users from calling dtor * Use macro to eliminate redundant code * Make friend function hidden * Add missing include directive <iostream> * Fix wrong include directives * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>	2022-12-01 15:15:02 -06:00
Qianfeng	63af525c06	BatchNorm backward instance/external API/profiler/tests (#519 ) * Refine the device batchnorm-backward base API templates and data type assignments * Remove duplicated kernel file * Add batchnorm backward instances and external API * Add batchnorm-backward profiler and tests * Add client example which uses batchnorm backward external API * Merge test/batchnorm_fwd and test/batchnorm_bwd into one directory * Loose the threshold for batchnorm-backward check_err()	2022-11-30 13:32:20 -06:00
Qianfeng	5bf0475afd	Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test (#516 )	2022-11-28 14:33:00 -06:00
Qianfeng	4e6a5575be	BatchNorm forward instance/external api/profiler/tests/client example (#511 ) * Update to device_batchnorm_forward base class to include all template parameters for problem description * Add batchnorm forward instances and external api * Add batchnorm forward profiler module which uses the external api * Add some comments in batchnorm_forward example to explain the dimensions in lengths[] * Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward * Improvement to the batchnorm infer base API * Add batchnorm forward client example which shows using the batchnorm forward external API * Add test for batchnorm forward * Tuning the batchnorm profiler initialized values and error threshold * Add support for bhalf_t in instances/external api/tests * Add support for int8_t in instances/external api/tests * Add support for double in instances/external api/tests * Let ScaleDataType and BiasDataType be same as XDataType and YDataType when creating instances * Checking before running best instance in batchnorm_fwd_nhwc client example * Add checking for YElementwiseOp in batchnorm_forward external API * Add more types in batchnorm forward profiler * Add more test lengths Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>	2022-11-24 18:02:27 -06:00
guangzlu	4c4c7328a6	Add BF16 tests for batched_gemm_softmax_gemm_permute (#504 ) * fixed bug in softmax reference & add bf16 examples for batched_gemm_scale_softmax_gemm * added bf16 tests for batched_gemm_softmax_gemm_permute * changed format of device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp * changed format device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp * aligned annotations * modified CMakeLists for examples * add common example code of fp16/bf16 version for batched_gemm_scale_softmax_gemm_xdl * use macro to control the instances * added macro control into instances * clang-format some files * changed error tolerance for bf16 * changed index for 10_elementwise_normalization * fixed xdlops code bug in amd_xdlops.hpp Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2022-11-15 16:30:23 -06:00
Po Yen Chen	dc663fae29	Rangify STL algorithms (#438 ) * Rangify STL algorithms This commit adapts rangified std::copy(), std::fill() & std::transform() * Re-write more std::copy() calls * Re-write std::copy() calls in profiler	2022-11-14 15:17:28 -06:00
Po Yen Chen	4a2a56c22f	Rangify constructor of HostTensorDescriptor & Tensor<> (#445 ) * Rangify STL algorithms This commit adapts rangified std::copy(), std::fill() & std::transform() * Rangify check_err() By rangifying check_err(), we can not only compare values between std::vector<>s, but also compare any ranges which have same value type. * Allow constructing Tensor<> like a HostTensorDescriptor * Simplify Tensor<> object construction logics * Remove more unnecessary 'HostTensorDescriptor' objects * Re-format example code * Re-write more HostTensorDescriptor ctor call	2022-11-11 11:36:01 -06:00
Lauren Wrubleski	37f2e91832	Add packages for examples and profiler (#502 ) * Add packages for example and profiler * correct TEST_NAME -> EXAMPLE_NAME	2022-11-10 13:19:33 -06:00
Po Yen Chen	f49803101e	Add client example of grouped conv2d forward (data type: fp16) (#488 ) * Rename example folder for GroupedConvFwdMultipleD * Unify example codes * Change target names * Add fp16 example for multiple d instance * Re-format common.hpp * Add interface 'DeviceGroupedConvFwd' * Use simpler interface * Move common conv params out * Rename conv fwd client example folder * Add missing include directive * Update grouped conv instance implementations * Simplify ckProfiler (grouped conv forward) * Use GroupedConvFwd to implement client example * Use greater groupe count in example * Add custom target to group examples * Add extra tag param to instance factory function * Use tag to differentiate factory functions * Add missing tag argument for factory function * Remove inheritance relationship * Remove no-longer used include directive * Add license in front of file	2022-11-09 19:01:58 -06:00

146 Commits