mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 02:02:46 +00:00
* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel
* Fix with regard to implementing GetZeroVal() in both kernel and host
* Avoid convert to compType from dstDataType before writting the output value
* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator
* Add CONSTANT decorator for descriptor read buffer
* Use get_thread_local_1d_id() for thread local Id
* Rename GetZeroVal() to GetReductionZeroVal() in the kernels
* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp
* Occasional tiny simplification and update in the kernel files
* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers
* Update to remove OpenCL tidy checking failures
* Update for better readability
* Remove unused codes and not-needed template parameters in the kernel wrappers
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: b2dc55f82c]