mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 18:17:44 +00:00
* Simplify the macros for declaring and defining the add_device_reduce_instance_xxxx() instances
* Change the types of lengths and strides from std::vector to std::array for the reduction device interfaces
* Remove DeviceSoftmaxImpl's depending on DeviceReduceMultiblock
* Split the cpp and hpp files for reduction instances to enable more parallel compiling
* Remove the using of macros for declaring reduction instances and instance references
* Update to add_device_reduce_instance_xxxx templated functions
* Use ReduceOperation+InElementwiseOp+AccElementwiseOp to repace the ReduceOpId in defining add_reduce_instance_xxxx() templates
* Change return format
[ROCm/composable_kernel commit: dda3a0a10b]