composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-17 09:08:35 +00:00

Files

Qianfeng 7fa892e63e Batchnorm-forward implemented using Welford method to calculate variance (#403)

* Update to the batchnorm-forward API and base class

* Fix leaked header include in gridwise_set_buffer_value.hpp

* Add kernels and device file for batchnorm-forward Welford supporting both blockwise and multi-block reduction

* Update to the batchnorm-forward example to use the new batchnorm-forward device interface

* Change the batchnorm-forward reference to use the sequential Welford method

* Change to split the workspace into four buffers in the host layer

* Use GetReduceCountPerThread functor to replace the initial count for blockwise and multi-block Welford

* Tiny correction and removal of an unused file under example/34_batchnorm

* Renaming in the kernel arguments

* Explicitly use ck::math::sqrt in batchnorm-forward kernels

* Add comments to some kernels

* Tiny fix

* Generalize the data types in reference_batchnorm_forward_nhwc_c

* Use ck::ignore to mark unused parameters

* Move GetReduceCountPerThread functor codes from kernel to device

* Remove some unused code in device_batchnorm_forward_impl.hpp

* Tiny fix in batchnorm_forward example

* Move GetReduceCountPerThread() to welford_helper.hpp

* Use separate data types for Scale and Bias

* Renaming in the device op

* Tiny fix in forward example

* Update to batchnorm-infer (type splitting, renaming)

* Add time and bandwidth measurement to the batchnorm-forward example

* Add support of elementwise operation for batchnorm forward output

* Reduce object copying by passing objects by reference

* Tiny change for performance

* Further updates for performance

* Some renaming

* Add GetActualVariance template parameter for ThreadwiseWelfordMerge

* Tiny update in reference batchnorm forward nhwc/c

* Move batchnorm multiblock kernel files to grid/batchnorm_multiblock sub-directory

* Fuse mean and bias in the normalization calculation

Co-authored-by: root <root@dc-smc-18.amd.com>
Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>

2022-10-27 18:52:54 -06:00
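The commit above computes batchnorm variance with Welford's method and merges partial results for blockwise and multi-block reduction. As a host-side illustration only, the sketch below shows a sequential Welford update, the standard pairwise merge of two partial states (Chan et al.), and a per-channel normalization in which mean and bias are folded into one affine transform. `WelfordState`, `welford_update`, `welford_merge`, `welford_variance`, and `batchnorm_channel` are hypothetical names, not composable_kernel's API; the fused form `y = x * a + b` is an assumed reading of "fuse mean and bias", and the actual kernels may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical running Welford state (not CK's type):
// sample count, running mean, and M2 = sum of squared deviations from the mean.
struct WelfordState {
    std::size_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;
};

// Sequential Welford update with one new sample.
void welford_update(WelfordState& s, double x) {
    s.count += 1;
    const double delta = x - s.mean;
    s.mean += delta / static_cast<double>(s.count);
    s.m2 += delta * (x - s.mean);  // uses the already-updated mean
}

// Merge two partial states (Chan et al. pairwise combination), the kind of step
// needed when each thread or block reduces only a subset of the data.
WelfordState welford_merge(const WelfordState& a, const WelfordState& b) {
    if (a.count == 0) return b;
    if (b.count == 0) return a;
    WelfordState r;
    r.count = a.count + b.count;
    const double n = static_cast<double>(r.count);
    const double na = static_cast<double>(a.count);
    const double nb = static_cast<double>(b.count);
    const double delta = b.mean - a.mean;
    r.mean = a.mean + delta * nb / n;
    r.m2 = a.m2 + b.m2 + delta * delta * na * nb / n;
    return r;
}

// Population (biased) variance, as used to normalize in batchnorm forward.
double welford_variance(const WelfordState& s) {
    return s.count == 0 ? 0.0 : s.m2 / static_cast<double>(s.count);
}

// Hypothetical per-channel batchnorm forward. Statistics come from two partial
// states merged afterwards; normalization is folded into y = x * a + b with
// a = scale / sqrt(var + eps) and b = bias - mean * a (assumed fusion).
void batchnorm_channel(const std::vector<double>& x, double scale, double bias,
                       double eps, std::vector<double>& y) {
    WelfordState lo, hi;
    const std::size_t half = x.size() / 2;
    for (std::size_t i = 0; i < half; ++i) welford_update(lo, x[i]);
    for (std::size_t i = half; i < x.size(); ++i) welford_update(hi, x[i]);
    const WelfordState s = welford_merge(lo, hi);

    const double a = scale / std::sqrt(welford_variance(s) + eps);
    const double b = bias - s.mean * a;
    y.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * a + b;
}
```

For `x = {1, 2, 3, 4}`, both the sequential path and the merge of `{1, 2}` with `{3, 4}` give mean 2.5 and variance 1.25. The merge formula is what makes the split reduction exact rather than approximate, which is the property a multi-block reduction relies on.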

reduction_functions_threadwise.hpp

Single-kernel GEMM + layernorm (#263)

2022-07-01 01:38:00 -05:00

threadwise_contraction_dl.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_gemm_dlops_v3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_set.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v3r1.hpp

Add 'Permute' device op & example (#408)

2022-09-19 21:30:25 -05:00

threadwise_tensor_slice_transfer_v3r3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v4r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v5r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r2.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v7.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer.hpp

Skip LDS of B matrix (#326)

2022-08-13 01:35:49 -05:00

threadwise_welford.hpp

Batchnorm-forward implemented using Welford method to calculate variance (#403)

2022-10-27 18:52:54 -06:00