composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-13 10:37:42 +00:00

Author	SHA1	Message	Date
who who who	2b3fd10f2b	remove unused variable (#564 ) * remove unused variable * format code [ROCm/composable_kernel commit: `ba40c2ce9d`]	2023-01-31 10:34:35 +08:00
Adam Osewski	dcc84da1cf	Use defined seed for deterministic test runs. (#562 ) Co-authored-by: Adam Osewski <aosewski@amd.com> [ROCm/composable_kernel commit: `274108d6e6`]	2023-01-30 13:03:59 -06:00
Adam Osewski	a203d3db7b	Add more instances for irregular GEMM sizes. (#560 ) Co-authored-by: Adam Osewski <aosewski@amd.com> [ROCm/composable_kernel commit: `7494c1c611`]	2023-01-26 13:42:20 -06:00
Qianfeng	2c1a324b99	Batchnorm inference instances, external API, client examples and gtests (#531 ) * File renaming and class renaming for device element-wise operation * Add batchnorm-infer instances, external API and client example * Add batchnorm-infer profiler module and gtests * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp * Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer * Rename class and file due to conflict from device_elementwise_2d.hpp * Fix namespace in batcnnorm_infer_nhwc client example [ROCm/composable_kernel commit: `a1b2441f8d`]	2023-01-25 17:09:04 -06:00
Qianfeng	fc8fa0992f	Use double for all scaling values and float-point constant values at the Device Op API (#557 ) * Use double as alpha/beta values type in reduce device op api * Use double as alpha/beta values type in softmax device op api * Use double as alpha/beta values type in multiple-reduce device op api * Use double as epsilon value type in normalization/elementwise-normalization device op api [ROCm/composable_kernel commit: `52abc2f371`]	2023-01-18 12:02:50 -06:00
Raman R jana	21a146fb2f	Wavelet (inter-wave consumer-producer) GEMM (#310 ) * wavelet gemm programming model support for CK * GEMM pipeline update for wavelet progrmmaing model * Updated wavelet programming pipeline * fixes for global-write for math-wave * fixed bug in global writes * Updated comments for better readability * fixed clang format errors * added block_lds without barrier sync * clean * clean * clean * clean * refactor * prototype 4 layouts fix default stride all problem sizes tidy move file update build script restore old file fix build * refactor standalone test to use gemm test harness * simplify gemm test * update build script * remove redundant * early return when cmd arg doesn't match * tidy * report failure when result not validated * tidy * Add comment depicting B2C mapping pattern. * Formatting & comments. * Comparison with custom B2C mapping pattern. * Example for wavelet gemm. * Add wavelet to Gemm standalone test. * Remove debug code. * Remove dangling #endif directive. Co-authored-by: root <Raman Jana> Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Anthony Chang <ac.chang@outlook.com> Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> [ROCm/composable_kernel commit: `1cfa87608a`]	2023-01-18 12:00:02 -06:00
ltqin	c87d5b6832	Add multiD Gemm client APIs (#534 ) * start add example * fix config * fix showinfo bug * add an elementop * change to padding * add xdl example * change elementwiseop * add instance * add instance to profiler * change file name * fix deive not support issue * add client example * fix client gemm_add_multiply name * change AddMultiply elementwiseop * fix elementwiseop * fix client example * fix addmultiply op * fix comments and fun name Co-authored-by: letaoqin <letaoqin@amd.com> [ROCm/composable_kernel commit: `d66421fe34`]	2023-01-18 11:53:56 -06:00
Illia Silin	b0c9e3340b	fix a bug for 6-dim kernels (#555 ) [ROCm/composable_kernel commit: `00ff30af8c`]	2023-01-18 11:44:11 -06:00
who who who	2484caa010	add multi embeddings support (#542 ) * add multi embeddings support * fix format * optimize sqrt * add reduce operation * change to elementwise op * fix name * rename * run ci cd * format example * format code * format code [ROCm/composable_kernel commit: `147b7db561`]	2023-01-18 11:32:12 -06:00
ltqin	26767954fd	Add client API/examples for 3xGemm+Bias+Add+Permute{0, 2, 3, 1} (#550 ) * add example * fix example * add instance for gemm permute * add to client example * change configs * change instance file name * formate * change client example file name and remove example [ROCm/composable_kernel commit: `55236709e2`]	2023-01-18 10:52:52 -06:00
Qianfeng	46a0aceec1	Reduction external API and client examples (#493 ) * Change to the DeviceReduce base class template to include all problem description information * Add external api for reduction * Add client example to test the reduction external api * Spelling correction * Re-implement the host_reduction to follow the DeviceReduce base API format * Change the reduce profiler to call the external API for collecting device instances * Rename reduce client example directory from 08_reduce to 12_reduce * Remove (void) before the functional call * Tiny update in reduce client example * Tiny update in profile_reduce_impl.hpp * Rename the reduce client example directory Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `80e0526741`]	2023-01-16 22:18:06 -06:00
rocking5566	8d6f3a2b81	Gemm layernorm welford (#413 ) * Add device op of gemm layernorm * [What] Rename F to H [Why] F and G prepare for welford tensor * Add gridwise gemm + welford * Extract template parameter * Rename kernel. Prepare to add second half kernel * Extract var * Add second kernel for gemm+layernorm * Move to the gemm_layernorm folder * Rename F and G to mean and var * Do not use snakeCurved, it makes determination of padding for welford difficult * Rewrite the device interface and rename some var * Add welford count * Update interface * Sync code, prepare to test on MI200 * Clean the code * Implement layernorm * Add comment to mension hipFree * Wrtie out the e for debug. This could be remove and use h for instead * 1. Allocate mean, var and count into by SetWorkSpacePointer. 2. Add GetWorkSpaceSize to calculate the space size * Add gemm layernorm host code * use reference layernorm * Fix bug of blockwise welford for first kernel * Fix bug of mean var padding for layernorm * Use sgpr for shuffleM_index * padding for GemmMeanVarCountGridDescriptor_M_NBlock * Add layout parameter * Check argument for gemm * calculate max count for tail block * Share E and H memory in device op * Hard code the vector dim * Refine the MakeDescriptor * 1. Remove E parameter, because E is inside of device op 2. Check vector size * [What] Rename MakeMeanVarDescriptor_M_N [Why] Prepare to add count version of make descriptor * Use 1D global memory for count * Prevent redundant IO * Update parameter * Add pipeline v1/v2 selector * Rename the example name * Add base class for gemm layernorm * Refine naming to distinguish naive and welford * Add comment to explan in detail * We don't need to pad in N dimension in gemm for mean/var/count. Set NPerTile 1 * Rewrite the 2st kernel, use multiple block along N dimension in layernorm kernel * Share the vector size * Refine var name * [What] Force LayernormThreadSliceSize_N = vector size. [Why] Memory coalesce * Add comment * Extract divisor out of the loop in reference layernorm * Pad different size for E and H in layernorm kernel according to different block tile * Refine naming * Refine naming * Prevent implicit cast * [What] use ck::math::sqrt instead of __builtin_amdgcn_sqrtf [Why] __builtin_amdgcn_sqrtf is only support float, double will cause casting * Cast only constant * Change of post shuffle thread descriptor * Add EMeanVarDataType parameter. * Merge the mean and var threadwise copy * Add missing index * Fix Typo * Sync the variable with previous if * 1. Declare e inside the host_gemm_layernorm() 2. Prevent implicit cast in reference code Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `7829d729fb`]	2023-01-16 20:08:25 -06:00
Haocong WANG	9d5e41b586	[Navi3x-LWPCK-545] Block-wise GEMM + Real GEMM_WMMA_FP16 (#541 ) * wmma_op + unit test * add arch limitation to wmma test * change arch limitation * Refactor + Add all type unit test(int4 compile failed) * Add f32_16x16x16_bf16 unit test * tempsave * tempsave * tempsave * runtime bug, cannot find symbol * workaround for incorrect HIP warpSize return value * debugging * tempsave * Correctness OK, waiting for optimization * Tidy up + format * temp save * temp save, reproduce the v_bfi_b32 issue * add inline asm for wmmaop test * tidy up * clean some debug purpose code * discard some codes * clang format * clang format * compiler issue fixed + increase tile size [ROCm/composable_kernel commit: `919aeb1f52`]	2023-01-16 20:06:01 -06:00
Illia Silin	dbc281041a	Add a flag to enable/disable debug output in many kernels. (#549 ) * add DEBUG_LOG macro to enable/disable debug output * fix syntax * fix syntax again * fix syntax one more time * remove balnk spaces * use ifdefs * add the Print argument * move the definition of DEBUG_LOG to ck.hpp * add the missign argument to Print() [ROCm/composable_kernel commit: `715e8dd241`]	2023-01-11 19:55:56 -06:00
Qianfeng	be8d157e6d	Remove including of cmath (#551 ) * Let cmath included when compiling host codes in math_v2.hpp * Remove including of cmath in device_base.hpp and device_permute.hpp [ROCm/composable_kernel commit: `a17b041486`]	2023-01-11 19:52:47 -06:00
zjing14	afa7c8eea1	Add MNK padding, M = 0 support into grouped_gemm (#539 ) * add mnk padding, support m=0 * clean code * clean code Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> [ROCm/composable_kernel commit: `0345963eef`]	2022-12-15 15:07:24 -06:00
Illia Silin	b0806dacbd	disable the attention test that fails on MI100 (#540 ) [ROCm/composable_kernel commit: `1115117503`]	2022-12-15 10:20:21 -06:00
Qianfeng	990e8b78d2	Add interface GetTypeIdName() and GetTypeIdHashCode() for Device Op (#533 ) [ROCm/composable_kernel commit: `10c72aced8`]	2022-12-14 18:34:02 -06:00
Rostyslav Geyyer	9cdd223de5	Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances to enable arbitrary problem size (#535 ) * Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances * Add padding device_gemm_add_fastgelu_xdl_c_shuffle instances * Add gemm_add_fastgelu profiler impl * Add padding device_gemm_fastgelu_xdl_c_shuffle instances * Add gemm_fastgelu profiler impl [ROCm/composable_kernel commit: `9a1f2475e3`]	2022-12-14 18:12:09 -06:00
Rostyslav Geyyer	46be71bf44	Add a docker hub doc file (#538 ) [ROCm/composable_kernel commit: `74744cab3e`]	2022-12-14 12:17:28 -08:00
arai713	2e92e52137	Gridwise elementwise 2d (#466 ) * added 2d gridwise elementwise * added 2d version of device elementwise * added example file with updated device elementwise call * added Cmake file * changed NumDim into 2D * fixed compiler issues * fixed indexing for loop step * fixed NumDim dimension error * changed blockID to 2D * updated Grid Desc * updated kernel call * fixed 2d thread indexing * added dimensions for example file * commented out unused code * changed vector load * removed extra code * temporarily removing vector load on 2nd dim * changed vector load back, still causing errors * altered indexing * changed isSupportedArgument for 2D * changed indexing + do/while * fixed isSupportedArgument * changed dimension for debugging * fixed * added testing printouts * testing change * added variables to distribute threads through both dimensions * testing changes * integrated variable for thread distribution into device elementwise and added as parameter for gridwise elementwise * removed most of the extraneous code, testing with different dimensions * testing * removed debugging print statements * moved 2d elementwise permute into elementwise permute directory * fixed formatting * removed debugging comments from threadwise transfer Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `0e5c264c3e`]	2022-12-12 09:18:10 -06:00
Illia Silin	79c4b5d928	Make sure that GEMM sizes in K dimension are supported. (#527 ) * apply new K-dimension check in gemm_xdl_cshuffle * add K-dim check to gemm_xdl and batched_gemm_xdl * fix syntax * fix syntax * clean-up the debug output [ROCm/composable_kernel commit: `d58b7f5155`]	2022-12-08 11:48:43 -06:00
Po Yen Chen	e25360c38f	Fix Grouped ConvBwdWeight test case failure (#524 ) * Use smaller tensor size in test * Use even more smaller tensor size * Touch only failing test case inputs [ROCm/composable_kernel commit: `614a7b1bb0`]	2022-12-07 17:46:28 -06:00
Rostyslav Geyyer	7355f95afe	Add padding device_gemm_xdl instances (#529 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `c7a4d36147`]	2022-12-07 17:46:03 -06:00
guangzlu	db0a1032b1	modified half function in math_v2.hpp (#528 ) Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `ce87b4f765`]	2022-12-07 17:43:02 -06:00
Illia Silin	cbc6b1c823	Fix CI error. (#530 ) * ignore .git folder when doing clang-format * fix syntax * add backslashes before quotes * add path filter for several extensions [ROCm/composable_kernel commit: `d072790fe2`]	2022-12-06 15:09:51 -06:00
Anthony Chang	96c07fc27d	Fix bug where scaling may not be applied in some code path (#526 ) * fix bug where scaling may not be applied in some code path * more test * revert accidental example code changes [ROCm/composable_kernel commit: `d156709432`]	2022-12-02 11:43:34 -06:00
ltqin	621c12302f	Add multiple d gridwise gemm on Navi21 for ResNet50 (#517 ) * start add example * add multiple d fp16 example * device transfer elementwiseop to gridwise * gridwise add multiple d * change example for multiple d * fix spill registers * fix for passthrough element op * fix int8 overflow * change example file name * add instance for dl multiple d * example add DsDataType * remove grouped_convolution_forward_dl.hpp * add head file(was deleted before) * fix not support device issue * format * remove passthrough check Co-authored-by: letaoqin <letaoqin@amd.com> [ROCm/composable_kernel commit: `23ecf0fa9e`]	2022-12-02 11:42:31 -06:00
Haocong WANG	3baad464d0	[Navi3x-LWPCK-449] wmma_op + unit test (#484 ) * wmma_op + unit test * add arch limitation to wmma test * change arch limitation * Refactor + Add all type unit test(int4 compile failed) * Add f32_16x16x16_bf16 unit test * Remote int4 related * delete deprecated test Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `abf9cc6c5c`]	2022-12-02 11:41:13 -06:00
Po Yen Chen	02db748e74	Modularize ckProfiler operations (#514 ) * Re-structure ckProfiler source files * Rename profiler.cpp to main.cpp * Modularize ckProfiler operations * Add description for profiler operations * Use longer name to avoid name collision * Use macro to delay expansion * Use std::move() to avoid object copying * Prohibit users from calling dtor * Use macro to eliminate redundant code * Make friend function hidden * Add missing include directive <iostream> * Fix wrong include directives * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com> [ROCm/composable_kernel commit: `8784a72e23`]	2022-12-01 15:15:02 -06:00
rocking5566	8e868bf880	gemm, conv perchannel quantization (#503 ) * Use gemm_multiple_D instead * Add gemm bias relu quantization example * Add pure gemm quantization example * Add quantization of perchannel conv + bias + relu example * Refine the code * Rename multiplier to requant_scale * Rename the folder * Remove redundant comment * Rename the file. Prepare to add perchannel * Add conv perchannel instance * Move to quantization folder * Add conv perchannel client example * Apply Rangify constructor of HostTensorDescriptor & Tensor<> * Fix merge error [ROCm/composable_kernel commit: `ad541ad6b9`]	2022-11-30 14:13:04 -06:00
Qianfeng	c3bb3db252	BatchNorm backward instance/external API/profiler/tests (#519 ) * Refine the device batchnorm-backward base API templates and data type assignments * Remove duplicated kernel file * Add batchnorm backward instances and external API * Add batchnorm-backward profiler and tests * Add client example which uses batchnorm backward external API * Merge test/batchnorm_fwd and test/batchnorm_bwd into one directory * Loose the threshold for batchnorm-backward check_err() [ROCm/composable_kernel commit: `63af525c06`]	2022-11-30 13:32:20 -06:00
Anthony Chang	eae37a7b6f	Fix split-k gemm test (#231 ) * properly return error flag; reveals bug in split-k gemm * fix bug in split k * update split-k test case Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `236bd148b9`]	2022-11-29 10:57:26 -06:00
fsx950223	26bbca370b	fix GetTypeString [ROCm/composable_kernel commit: `0e9c88cecf`]	2022-11-29 14:18:10 +08:00
Qianfeng	c036714248	BatchNorm backward implementation (#461 ) * Implemented batchnorm-backward Blockwise and Multiblock kernels * Add batchnorm-backward device op * Add batchnorm-backward host-reference op * Add batchnorm-backward example * Parameters renaming in batchnorm backward kernels and device op * Change in the example to loose the threshold for ScaleDiff checking * Add comments to explain the implementation of batchnorm-backward * Parameters renaming again in batchnorm backward kernels * Improve the expression calculation for performance * Add batchnorm backward to README * Add comments to explain inv-variance in batchnorm forward and backward * Renaming the batchnorm forward training and inferring examples * Add/update the comments for batchnorm-backward kernels * Renaming again * Add block_sync_lds between two consecutive blockwise reductions * Move common expression 1/N out of the static_for loops * Add dy_elementwise_op * Renaming in backward example again * Add checking for reduceDims in reference_batchnorm_backward * Update to comments and codes format * Rename in the comments * Remove common expression out of the loop in reference_batchnorm_backward_nhwc_c * Add block_sync_lds() between blockwise reduction again * Fix comments again * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test [ROCm/composable_kernel commit: `44789d992a`]	2022-11-28 20:51:10 -06:00
Qianfeng	1a6febf03a	Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test (#516 ) [ROCm/composable_kernel commit: `5bf0475afd`]	2022-11-28 14:33:00 -06:00
Qianfeng	144efbf9b6	BatchNorm forward instance/external api/profiler/tests/client example (#511 ) * Update to device_batchnorm_forward base class to include all template parameters for problem description * Add batchnorm forward instances and external api * Add batchnorm forward profiler module which uses the external api * Add some comments in batchnorm_forward example to explain the dimensions in lengths[] * Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward * Improvement to the batchnorm infer base API * Add batchnorm forward client example which shows using the batchnorm forward external API * Add test for batchnorm forward * Tuning the batchnorm profiler initialized values and error threshold * Add support for bhalf_t in instances/external api/tests * Add support for int8_t in instances/external api/tests * Add support for double in instances/external api/tests * Let ScaleDataType and BiasDataType be same as XDataType and YDataType when creating instances * Checking before running best instance in batchnorm_fwd_nhwc client example * Add checking for YElementwiseOp in batchnorm_forward external API * Add more types in batchnorm forward profiler * Add more test lengths Co-authored-by: rocking5566 <ChunYu.Lai@amd.com> [ROCm/composable_kernel commit: `4e6a5575be`]	2022-11-24 18:02:27 -06:00
Adam Osewski	0f3d9639a8	Client examples AddFastGelu and FastGelu + instances. (#509 ) * FastGelu support for more data types. * AddFastGelu & FastGelu instances. * Client example. * clang-format * Remove unused stride variable. * Add new line at EOF. Co-authored-by: Adam Osewski <aosewski@amd.com> [ROCm/composable_kernel commit: `43a889b72e`]	2022-11-19 22:08:26 -06:00
Anthony Chang	b607362c9a	Work around develop validation failure (#513 ) * workaround bf16 atten fwd issue on gfx908 * typo [ROCm/composable_kernel commit: `892a8d769d`]	2022-11-17 08:38:13 -08:00
guangzlu	3941d1e815	Add BF16 tests for batched_gemm_softmax_gemm_permute (#504 ) * fixed bug in softmax reference & add bf16 examples for batched_gemm_scale_softmax_gemm * added bf16 tests for batched_gemm_softmax_gemm_permute * changed format of device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp * changed format device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp * aligned annotations * modified CMakeLists for examples * add common example code of fp16/bf16 version for batched_gemm_scale_softmax_gemm_xdl * use macro to control the instances * added macro control into instances * clang-format some files * changed error tolerance for bf16 * changed index for 10_elementwise_normalization * fixed xdlops code bug in amd_xdlops.hpp Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `4c4c7328a6`]	2022-11-15 16:30:23 -06:00
ltqin	3073d82b47	Add Conv Backward Data on Navi21 for ResNet50 (#499 ) * start add example * add device dl * change launch kernel * change init data method * change example config * add config valid check * add instance for dl bwd * add instance to ckProfiler * reserver to profiler and cmakelist * add instance to ckProfiler2 * change instance f32 config * fix example return value Co-authored-by: letaoqin <letaoqin@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `db0eb1ea9c`]	2022-11-15 16:22:20 -06:00
Po Yen Chen	2837e81e40	Avoid reporting unused member function error (#507 ) [ROCm/composable_kernel commit: `7038723a46`]	2022-11-14 19:54:37 -06:00
Po Yen Chen	e418b29268	Introduce ck::accumulate_n() (#439 ) We can use this template to eliminate duplicated iterator computing logics. By providing return type to ck::accumulate_n(), we can avoid type conversion operations. [ROCm/composable_kernel commit: `730204eed0`]	2022-11-14 19:53:39 -06:00
Po Yen Chen	a8a4bdb756	Rangify STL algorithms (#438 ) * Rangify STL algorithms This commit adapts rangified std::copy(), std::fill() & std::transform() * Re-write more std::copy() calls * Re-write std::copy() calls in profiler [ROCm/composable_kernel commit: `dc663fae29`]	2022-11-14 15:17:28 -06:00
Po Yen Chen	6b0cb67348	Rangify check_err() (#444 ) * Rangify check_err() By rangifying check_err(), we can not only compare values between std::vector<>s, but also compare any ranges which have same value type. * Re-format example code [ROCm/composable_kernel commit: `b79bbbc22f`]	2022-11-11 11:39:39 -06:00
Po Yen Chen	9d8396c05c	Fix build errors on CI server (#506 ) * Add missing ignore expression * Add missing include directive [ROCm/composable_kernel commit: `4382b41469`]	2022-11-11 11:36:55 -06:00
Po Yen Chen	a4776782a5	Rangify constructor of HostTensorDescriptor & Tensor<> (#445 ) * Rangify STL algorithms This commit adapts rangified std::copy(), std::fill() & std::transform() * Rangify check_err() By rangifying check_err(), we can not only compare values between std::vector<>s, but also compare any ranges which have same value type. * Allow constructing Tensor<> like a HostTensorDescriptor * Simplify Tensor<> object construction logics * Remove more unnecessary 'HostTensorDescriptor' objects * Re-format example code * Re-write more HostTensorDescriptor ctor call [ROCm/composable_kernel commit: `4a2a56c22f`]	2022-11-11 11:36:01 -06:00
Lauren Wrubleski	87ab07799d	Add packages for examples and profiler (#502 ) * Add packages for example and profiler * correct TEST_NAME -> EXAMPLE_NAME [ROCm/composable_kernel commit: `37f2e91832`]	2022-11-10 13:19:33 -06:00
Po Yen Chen	acbe363156	Rangify FillUniformDistributionIntegerValue<> (#443 ) Allow passing forward range to its call operator [ROCm/composable_kernel commit: `6f0564f013`]	2022-11-10 13:03:01 -06:00
guangzlu	4ff0f25c68	add client example for elementwise_normalization (#501 ) * add client example for elementwise_normalization * clang format elementwise_layernorm2d.cpp * changed some naming to make it more understandable * changed naming of input into ab_input * fixed bug for threadwise_x_store * add elementwise operation to reference [ROCm/composable_kernel commit: `7045632885`]	2022-11-10 12:30:36 -06:00

806 Commits