composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-12 01:10:17 +00:00

Author	SHA1	Message	Date
Po Yen Chen	a670a5a092	Move literal ""_uz & ""_zu into namespace 'ck::literals' (#354) * Move literal ""_uz & ""_zu into namespace 'literals' * Move namespace 'literals' as 'ck::literals'	2022-08-12 17:48:35 -05:00
Rostyslav Geyyer	0c6ef7c14e	Add example of conv_fwd_bias_relu_add for int4, int8, bf16, fp16, and fp32 (#343) * [LWPCK-359] Initial commit * Working version for fp16, add results to readme * Update according to PR #341 * Update results in readme * Add fp32 example * Add bf16 example * Update fp16 and fp32 examples * Add int8 example * Add separate lengths and strides tensors for D tensors Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2022-08-12 15:30:27 -05:00
zjing14	35e49f2de6	add g; fixed strides (#355)	2022-08-12 15:22:39 -05:00
Illia Silin	de60d290b6	Build docker only once in CI, fix conv_bwd logfile names. (#353) * build docker in separate stage * build docker with only one prefix * add parallel statement * add docker repo url * fix the name of perf_conv_bwd_data log file	2022-08-12 12:30:37 -05:00
Po Yen Chen	68b61504a3	Add examples for GEMM + AddAddFastGelu (data type: int8, bf16, fp32) (#340) * Add always_false<> util to delay symbol resolution * Use always_false<> to prevent trying to instantiate unwanted method * Add new specializations of AddAddFastGelu::operator() method * Add GEMM + AddAddFastGelu examples for data types: int8, bf16, fp32 * Use floating point literal to simplify code * Remove unnecessary capture in lambda expressions * Extract fast GeLU calculation as standalone method * Mark methods as 'constexpr' * Add constraint for HostTensorDescriptor templated ctors * Simplify HostTensorDescriptor ctor calls * Add C++23 std::size_t literal suffix * Use _uz suffix to shorten example code * Remove unnecessary conversion to std::array<> * Re-order include directives * Remove C-style casting by literal suffix * Remove unnecessary statements in main() * Remove unused type parameter of always_false<> * Remove unused include directive * Exit main() by returning meaningful value * Use 'if constexpr' to switch example flow * Use std::is_same_v<> to shorten example code * Add 'inline' specifier to literal functions * Unify output methods in example * Move common code into .inc file * Add type check in type_convert<>() * Add type_convert<float>() before computation * Merge AddAddFastGelu method specializations * Remove always_false<> * Add constraint to AddAddFastGelu::operator() parameter types	2022-08-11 17:31:28 -05:00
rocking5566	fdfd7eb597	ckProfiler for layernorm (#330) * Refine parameter * Add base class for layernorm * Add layernorm instance * Add layernorm to ckProfiler * Remove redundant * Add verification * Fix compile error due to merge	2022-08-11 17:03:54 -05:00
zjing14	e08d68d25d	Add batched/grouped_gemm contraction deviceOps (#349) * convnd_fwd fp16 example * update example * update example * update instance * updating reference conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * update contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test * change gemm_c_permute with contraction * add grouped_contraction * add contraction in group_gemm * add example of grouped_gemm with contraction * add example of grouped_contraction_bias_e_permute * clean * fixed ds * add m3n2 m2n3 examples into gemm_bias_e_permute Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-08-10 12:20:29 -05:00
Illia Silin	aba7fefce7	Fix QA, allow switching compiler versions, fix google test compilation error. (#348) * allow selecting compiler version * fix typo * add Wno-deprecated flag for google tests * change git repo, fix qa log files names * change the git clone syntax * use Omkar's git credentials * try to use jenkins as git user * try using illsilin username for gerrit repo with ssh key * try new gerrit authorization * change ssh key syntax * try another way of passing ssh key to docker * add mount ssh in dockerfile * create .ssh folder * move ssh-keyscan to later * get rid of npm call * build first docker image on master * check the contents of the .ssh folder * try replacing omkars creds with gerrit creds * use open repo, clean up changes * get rid of ssh default argument	2022-08-08 13:49:14 -05:00
Chao Liu	146972f447	fix bug in gemm profiler (#344)	2022-08-07 12:23:32 -05:00
Chao Liu	75ab874e02	Update Group convolution (#341) * add conv oddC * update example * update example * fix bug in example * fix bug in group conv example	2022-08-03 12:28:33 -05:00
Adam Osewski	fb0dc35861	CGEMM examples bf16, fp32, int8 (#332) * Add int8 specialization for elementwise Add and Subtract. * CGEMM examples bf16, fp32, int8 * Convert reference output to CDataType. * Skip BF16 data type during testing. * Lower K value to get rid of accumulation error. * Fix merge artifact. * Fix changed function name: GetElementSpaceSize() * Fix merge artifact. Co-authored-by: Adam Osewski <aosewski@amd.com>	2022-08-02 14:52:27 -05:00
Illia Silin	984b3722bf	Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339) * turn on full qa only on gfx90a, use int initialization * change script syntax * update script parsing clinfo, throw exception if 0 devices * fix syntax * try using toBoolean for the QA conditions * run regular CI on MI100 only, use MI200 only for daily QA * evaluate when conditions before agent * launch QA on develop branch and update profile_reduce script * update test script * update script * remove false dependency from dockerfile * try removing rbuild completely Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: Chao Liu <lc.roy86@gmail.com>	2022-08-02 09:17:11 -05:00
Chao Liu	500fa99512	Clean up conv example, Instances, profiler and test (#324) * convnd_fwd fp16 example * update example * update example * update instance * updating reference conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * update contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test	2022-07-29 18:19:25 -05:00
Illia Silin	85978e0201	comment out cron trigger (#334)	2022-07-22 13:52:10 -05:00
zjing14	d7d7829096	Batched Gemm with multiD (#329) * add batched_gemm_multiD * add ds * rename file * add batched_gemm_bias example * add batch_strides into bmm_c_permute * clean * rename example_28 to example_29 Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-07-22 09:33:50 -05:00
Illia Silin	d8415a96b3	Add full QA with verification option, few other changes. (#331) * add verify flag and update scripts * replace old check_error function with the new check_err * fix syntax * remove blank spaces * remove empty line * add check_err for tensors * fix syntax * replace tensors with vectors in check_err calls * fix syntax * remove blank spaces * fix syntax * add new line at end of file * disable conv2d_bwd_weight test, add gpu check * set check_gpu using export * check GPU using runShell * add definition of runShell * fix script syntax * reduce the number of threads, add full qa option * run processing scripts in bash * fix the branch and host names in performance scripts, add chronos * replace parameterizedCron with cron * archive the perf log files * try to fix git call * pass branch and host names as arguments into scripts * fix script arguments * fix script arguments * process results on master * fix pipeline * add definition of gpu_arch * run processing scripts in docker * fix the brackets * add agent master for the processing stage * get rid of show_node_info call on master * try using mici label instead of master, disable MI100 tests for now * fix syntax * simplify container for results processing * remove node(master) from the process_results stage * put all stages in original order * change the agent label from master to mici for gfx908	2022-07-21 15:25:46 -05:00
zjing14	7959dad566	Grouped Gemm device with multiD grid (#319) * replace gridwise_v2r3 with multiD * adjust parameters * add instances * fixed test_grouped_gemm * fix standalone softmax race condition around blockwise reduction * fixed ci * fixed comment: remove redundant workspace * use instanceFactory * add test layout * add empty Ds * add bias example * use array * separate examples Co-authored-by: Anthony Chang <ac.chang@outlook.com>	2022-07-21 10:07:01 -05:00
Anthony Chang	a11680cce6	fix standalone softmax race condition around blockwise reduction (#323)	2022-07-14 22:52:45 -05:00
rocking5566	7f21662089	Standalone layernorm (#315) * Implement layernorm kernel and deviceOp * verify gpu kernel with host code * 1. Separate gamma and beta from affine 2. Check if argument is valid * clean * Sync the naming * Support sweep once mode if we can put k dimension data inside one block * [What] Get length from upper length. [Why] if we get length directly, we may get length after padding. * We only use one block in K dimension. Hence, we can simplify the indexing of global R/W. * Use 1d descriptor for gamma and beta * Add accElementwiseOp * Extract layernorm host code * Support different YVectorDim in GridwiseLayernorm * Rename XSrcVectorDim to XYSrcVectorDim. Because we use same parameter in deviceOp * Gamma and beta can share the VGPR. * Add test for fp32 and fp16 * Fix bug of concurrency and add test case which may fail originally * Propagate NaN for layernorm Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-07-13 11:16:14 -05:00
Daming Feng	c5620ed0ca	minor fix in gemm client example (#328)	2022-07-13 10:54:38 -05:00
Illia Silin	39acaea36d	Add switch between compilers, make 9110 compiler default, add full QA scripts. (#322) * adding scripts for full perf test suite * uncomment the sql queries * fix typo and chmod a+x for scripts * dos2unix for all new scripts * disable verification in full performance test * fix reduction scripts, add grouped_gemm hotfix * fix the grouped_gemm hotfix and only run reduction for fp16 * change compiler flag syntax * fix syntax * add predefinition of dockerArgs * avoid redefinitions of dockerArgs * add blank space at the end of dockerArgs * try to build with release compiler * adding spaces inside if condition * limit the number of threads for building 9110 compiler * change the way HIP_CLANG_PATH is set * remove the export command * change the conditional ENV syntax * set HIP_CLANG_PATH at docker run time * update scripts for full qa * enable the sql write query * fix typo * remove a comment from a script	2022-07-13 09:27:43 -05:00
Po Yen Chen	639147432b	GEMM pipeline v2 (#317) * format * improving pipeline * fix typo * format * adding thread group * adding thread group * adding thread group * adding gemm pipeline * tweak * refactor * refactor * add missing type convert * refactor * refactor * refactor * clean * fix build * refactor * format * clean up * use remove_cvref_t * clean * use pipeline_v2 for gemm kernel * Remove inconsistent indent * Fix compilation errors due to incomplete merge process * Add missing include directives * Fix compilation errors in currently unused files * Add license in newly added files * Re-format touched files by clang-format-10 * Fix wrong template argument count of DeviceGemm<> * Use language construct to choose between types * Use language construct to choose GEMM example instance * Fix compilation error due to interface change * Re-use type alias to avoid duplication * Unify type alias usage in source file * Only use v2 pipeline in one gridwise GEMM type * Remove no-longer used include directives * Add static_assert() to check pipeline type requirements * Revert "Add static_assert() to check pipeline type requirements" This reverts commit `f0985f0a13`. * clean * clean * clean * clean Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: shaojiewang <wsjmessi@163.com>	2022-07-08 15:55:14 -05:00
Shaojie WANG	763ca61581	add conv1d/3d bwd weight instances (#318) * add conv1d/3d bwd weight instances * add profiler code	2022-07-08 15:42:20 -05:00
Chao Liu	4fe9c393b8	N-D Tensor Contraction example, instance, and client example (#270) * adding contraction * add contraction example * update example * update example * format * update readme * clean header * clean header * contraction with multiple D * rename * fix naming issue; add instances for contraction+bilinear * change assumed virtual layout of contraction; add client example * update example * update * contraction+scale * use type_convert * rename	2022-07-07 14:31:11 -05:00
zjing14	334361cbde	Batched Gemm with C Permute (#305) * init commit * add c_permute * add mnk padding * fixed comments * Fixed comments Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-07-06 10:38:29 -05:00
Chao Liu	9e4429f9c3	Gemm+Bilinear (#316) * refactor * update example * update example * gemm bilinear * clean * update	2022-07-02 09:15:38 -05:00
guangzlu	8e374781d5	modified grouped gemm addressing method (#307) * modified grouped gemm addressing method * modified addressing method in device_grouped_gemm_xdl.hpp Co-authored-by: root <root@dc-smc-13.amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-07-01 01:38:21 -05:00
Anthony Chang	63fd5da637	Single-kernel GEMM + layernorm (#263) * dump lds content in appropriate precision type * add squared add reduction op; allows sq sum * initial stub from regular gemm impl * layernorm example code & host verification * initial layernorm implementation * tidy up * make C0 precision type consistent with C * clang-tidy and additional comments * tighten up example code * account for extra flops/bytes from normalization * clang-format * c0 bias/beta/gamma now have their own precision type * AccElemOp for gemm outputs prior to feeding to layernorm * update workgroup mapping * rename kernel template param to reflect its dual use * use LDS mem pool for reduction workspace * change cshuffle precision type to f16; clean up * clang-format * correct naming * explicit cast * fully implemented gemm + bias + activation + add + norm * activation in correct order * reflect reduction API's recent change * amend * clean up; add comment * keep up with recent changes in reduction API * format * resolve merge conflicts Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-07-01 01:38:00 -05:00
zjing14	1c8126a4c2	add batch_stride into batched gemm (#314) * add batch_stride * fixed test Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-07-01 01:35:37 -05:00
Chao Liu	0dcb3496cf	Improve external interface for GEMM and GEMM+add+add+fastgelu (#311) * interface for GEMM and GEMM+add+add+fastgelu * rename namespace * instance factory * fix build * fix build; add GEMM client example * clean	2022-06-30 22:11:00 -05:00
zjing14	fa9a0a5cfb	Gemm + bias + c_permute (#312) * init commit * add desc * finished c permute * fixed vector lens	2022-06-30 19:55:09 -05:00
zjing14	ab6c82c984	Grouped Gemm ckProfiler hotfix (#313) * add setWorkspace in profiler * fix	2022-06-30 16:37:37 -05:00
Anthony Chang	93c99f3d87	Standalone sweep once softmax kernel w/ ckProfiler (#295) * use 'sweep once' softmax kernel where applicable * threadwise copy's dst buffer can specify invalid element value * add int8 in/out float compute softmax support give a bit of leeway for int absolute tolerance as there's a single data point of all test cases showing off-by-1 error * format * softmax inherits DeviceNormalization * softmax profiler stub * tighten up reference softmax interface * example prints tensor dimension * add fp32 to softmax profiler * rename header * hook with ckProfiler * format * resolve merge conflict * resolve merge conflicts * update normalization profiler help string * resolve conflict * typo * remove residual * softmax profiler: address feedback * test for mixed precision input/output * fully qualify ck::math::isnan * add comment for device normalization interface * revise wording * constness for alpha/beta scaler pointer	2022-06-30 12:08:50 -05:00
Liam Wrubleski	eccf8773a6	Remove incorrect old packaging statement (#308)	2022-06-30 09:40:03 -05:00
rocking5566	12235112a1	external api for gemm + layernorm (#285) * Extract base class for elementwise * Refactor interface of DeviceGemmReduce. Do not use tuple in interface * [What] Rename d into reduce in gemm + reduction related code [Why] Prepare to add d term for add * Unify base class of gemm + reduce and gemm + bias + add + reduce * 1. Rename gemm_bias_add_reduce for external api 2. Refine cmake * Add normalize device operation * [What] Reorder the argument [Why] Because d0 is also the input of c. * Add type string * Add example of gemm_bias_add_layernorm via external api * Refactor example code * clang-format * Fix compile error * clang-format * Add external api for gemm_add_add_layernorm and normalize * Add client example * clang-format	2022-06-27 14:25:10 -05:00
Chao Liu	aebd211c36	External Interface (#304) * add client example * clean * clean * reorg * clean up profiler * reorg * clean * fix profiler * function for getinstances * update client example * update client example * update client example * update * update example * update Jenkins file * update cmake * update Jenkins	2022-06-26 19:39:02 -05:00
Liam Wrubleski	b653c5eb2e	Switch to standard ROCm packaging (#301) * Switch to standard ROCm packaging * Revert .gitignore changes * install new rocm-cmake version * update readme Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-06-25 09:35:16 -05:00
Chao Liu	d3051d7517	add license in file (#303)	2022-06-24 23:32:43 -05:00
Chao Liu	d1db6a0c3e	Absolute include path (#281) * add gelu and fast_gelu * added GeLU and fast GeLU * clean up * add gemm+fastgelu example * add gemm+gelu instances * update profiler * clean up * clean up * adding gemm+bias+activation * clean * adding bias * clean * adding gemm multiple d * debugging * add gemm bias add fastgelu * rename, clean * refactoring; add readme * refactor * refactor * refactor * refactor * refactor * refactor * fix * fix * update example * update example * rename * update example * add ckProfiler * clean * clean * clean * clean * add client app example * update readme * delete obsolete files * remove old client app * delete old file * cleaning * clean * remove half * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path for all examples * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * revert client app example * clean build * fix build * temporary disable client test on Jenkins * clean * clean * clean	2022-06-24 20:51:04 -05:00
Chao Liu	a49115b95e	update license (#297) * update license * update license * update license * update license	2022-06-23 01:27:30 -05:00
Adam Osewski	a2edd7d802	Testing all fwd convolution specializations. (#259) * UniformFill with integer values. * Log tested instance type string. * Add UT for all convolution specializations. * debugging conv * Fix dangling reference bug. * Small refinements. * Fix call to error checking function. * Small refinements to tests. * Configure error tolerance * Change problem size. * Remove OddC case from types that do not support it. * Add helper traits for AccumulatorDataType. * Print first 5 errs in check_err for integral types. * Rename FillUniform to FillUniformDistribution * Refactor * Do not use typed tests. * Instead use plain fixture class with templatized member functions. * Initialize tensors with integer values. * Refine test instances. * Properly set accumulator data type. * Add another "big" instance. * Refactor convolution tests. * Revert "debugging conv" This reverts commit `b109516455`. * Add pragma once + format + small refinement. * Fix some unwanted changes. * Clang-format * Fix profile_convnd to use renamed tensor initializer. * Add instances for ConvFWDND kernel case 2D * Helpers to get ConvNDFwd 2D instances. * Refactoring. * Remove "small block" instance as it was generating compiler errors. * Remove default template parameters values. * Refine and fix test. * Fix problem with default template parameter types. * Adjust error thresholds for floating point values test. * Use integer values initialization for instances test. * Add tests for ConvNDFwd 2D case. * Remove AccumulatorDataType type trait. * Update unit-tests. * Remove operator<< overload. * Unlock conv1d/3d nd fwd instances. * Enable skipping calculating reference using flag. * Fix number of channels for first ResNet50 layer. * Clang-format. Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-06-22 22:05:04 -05:00
Shaojie WANG	4634b12043	fix Issue 291 (#294) * rename for typeconvert functor * refine code	2022-06-21 17:10:56 -05:00
Anthony Chang	15c89e81f0	Standalone softmax kernel (#284) * initial stub for standalone softmax * start device_softmax_mk_to_mk as a wrapper to device_reduce_mk_to_m * host softmax validates * compiles; to implement beta scaling * use NaN trick to efficiently ignore OOB values during sum of exponentials * freeload device_reduce's utility functions * clean up interface * adding prior value (beta scaling) * remove restriction related to perf considerations * apply clang-format * clean; disable diagnostics * resolve conflicts * add exp wrapper * honor HostTensorDesc interface; allow implicit cast from different vector<T> type * test softmax for fp16/fp32 * update readme * amend commit NaN trick * remove redundant param added during development * format * replace ScalarDataType with AccDataType * separate out test programs by precision type * move softmax sample code to its own folder * format * keep up with recent changes in reduction API * remove extra header	2022-06-21 14:59:19 -05:00
Chao Liu	be60d60d7a	Create MIT LICENSE (#229) * Create LICENSE * add contributors, add license into config.hpp * update	2022-06-21 14:55:56 -05:00
Anthony Chang	1ae241092f	bring up to date with the usage of __builtin_amdgcn_sched_barrier (#293)	2022-06-21 10:15:31 -05:00
Chao Liu	ccbd8d907b	update readme and script (#290)	2022-06-20 23:34:32 -05:00
Chao Liu	56adf7e9cc	GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfiler (#241) * add gelu and fast_gelu * added GeLU and fast GeLU * clean up * add gemm+fastgelu example * add gemm+gelu instances * update profiler * clean up * clean up * adding gemm+bias+activation * clean * adding bias * clean * adding gemm multiple d * debugging * add gemm bias add fastgelu * rename, clean * refactoring; add readme * refactor * refactor * refactor * refactor * refactor * refactor * fix * fix * update example * update example * rename * update example * add ckProfiler * clean * clean * clean * clean * add comment * use type_convert * clean * clean element wise op	2022-06-19 03:07:28 -05:00
Illia Silin	e4584d91ac	Don't look up the /sys/module/amdgpu/version file. (#287) * use pre-built docker instead of building a new one * try docker.image.pull * change syntax in docker.image() * add 30 min timeout * increase timeout to 3 hours * move performance tests to first stage for testing * set image variable to the new container name * update image name * check available images * check available images in both places * try different image name * use image ID to refer to image * run performance on gfx90a * fix the gpu_arch labeling, add parameter * move env vars out of stages * add stand-alone performance script, MI200 tests, CU numbers * dos2unix for run_perf_tests.sh * try the new git credentials * use env var for git credentials * don't look up /sys/module/amdgpu/version Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-06-17 15:11:21 -05:00
Qianfeng	1f543bfa79	Regulate reduction accumulator operations and Element-wise operations (#274) * Remove template from Reduction operation classes and add template to their operator() and GetIdentityValue() interfaces * Change to unary elementwise operators and the reduce_unary_operator (class for mapping) and dependent variations in all host layers * Remove the data type template parameter from reduce_binary_operator (class for mapping) and dependent variations in host layers * Add InMemoryDataOperatonSupportedOnDataType to check the matching between data type and InMemoryDataOperation * Use struct-scope operator template instantiation for binary and unary element-wise operations * Change a few more elementwise operations to use template for operator() * Tiny correction in Normalize operator * Add static_assert to check the data type applicability for some reduction accumulator and element-wise operations * Correction in some examples with regard to using ReduceAccDataType * Use static_assert for UnaryDivide * Update to merged codes to use Element-wise operations and Reduction Accumulator operations correctly * Tiny fix with regard to SetWorkSpacePointer()	2022-06-17 15:10:25 -05:00
Shaojie WANG	63cdd92398	use universal workspace pointer in bwd-weight (#286)	2022-06-17 14:11:20 -05:00

657 Commits