mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-13 02:27:33 +00:00

Files

Qianfeng 52d082bade BatchNorm forward instance/external api/profiler/tests/client example (#511)

* Update to device_batchnorm_forward base class to include all template parameters for problem description

* Add batchnorm forward instances and external api

* Add batchnorm forward profiler module which uses the external api

* Add some comments in batchnorm_forward example to explain the dimensions in lengths[]

* Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward

* Improvement to the batchnorm infer base API

* Add batchnorm forward client example which shows using the batchnorm forward external API

* Add test for batchnorm forward

* Tuning the batchnorm profiler initialized values and error threshold

* Add support for bhalf_t in instances/external api/tests

* Add support for int8_t in instances/external api/tests

* Add support for double in instances/external api/tests

* Let ScaleDataType and BiasDataType be same as XDataType and YDataType when creating instances

* Checking before running best instance in batchnorm_fwd_nhwc client example

* Add checking for YElementwiseOp in batchnorm_forward external API

* Add more types in batchnorm forward profiler

* Add more test lengths

Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>

[ROCm/composable_kernel commit: 4e6a5575be]

2022-11-24 18:02:27 -06:00

| File | Last commit | Date |
| --- | --- | --- |
| batchnorm_common.hpp | Batchnorm-forward implemented using welford method to calculate variance (#403) | 2022-10-27 18:52:54 -06:00 |
| batchnorm_forward_nhwc.cpp | BatchNorm forward instance/external api/profiler/tests/client example (#511) | 2022-11-24 18:02:27 -06:00 |
| batchnorm_infer_impl.hpp | Batchnorm-forward implemented using welford method to calculate variance (#403) | 2022-10-27 18:52:54 -06:00 |
| batchnorm_infer_nhwc.cpp | BatchNorm forward instance/external api/profiler/tests/client example (#511) | 2022-11-24 18:02:27 -06:00 |
| CMakeLists.txt | Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320) | 2022-08-15 10:11:02 -05:00 |
| README.md | Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320) | 2022-08-15 10:11:02 -05:00 |

README.md

Instructions for `batchnorm nhwc` Example

Run `batchnorm forward nhwc`

# -D <xxx> : input 4-d tensor lengths
# -v <x>   : verification (0=no, 1=yes)
# arg1: data type (0: fp16, 1: fp32, 3: int8, 5: bf16, 6: fp64)
# arg2: whether to update the moving average and variance (0=no, 1=yes)
# arg3: whether to save the result mean/invVariance (0=no, 1=yes)
# arg4: initialization (0=no init, 1=single integer value, 2=integer values in a range, 3=decimal value)
# arg5: time kernel (0=no, 1=yes)
./bin/example_batchnorm_forward -D 128,16,16,1024 -v 1 0 0 1 2 1

Result

./bin/example_batchnorm_forward -D 128,16,16,1024 -v 1 0 0 1 2 1
launch_and_time_kernel: grid_dim {64, 1, 1}, block_dim {256, 1, 1} 
Warm up 1 time
Start running 10 times...
launch_and_time_kernel: grid_dim {120, 1, 1}, block_dim {256, 1, 1} 
Warm up 1 time
Start running 10 times...
launch_and_time_kernel: grid_dim {120, 1, 1}, block_dim {256, 1, 1} 
Warm up 1 time
Start running 10 times...
Perf: 2.08231 ms, 354.519 GB/s

Result (verification enabled, timing disabled)

./bin/example_batchnorm_forward -D 128,16,16,1024 -v 1 0 1 0 2 0
echo $?
0
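
To make the meaning of arg2 (update the moving average and variance) and arg3 (save the result mean/invVariance) concrete, here is a minimal CPU sketch of a batchnorm forward pass over an NHWC tensor. It is not the composable_kernel implementation or API: the function name, the `BatchNormStats` struct, and the `epsilon` and `averageFactor` parameters are hypothetical, and the sketch assumes the `-D` lengths are given in N,H,W,C order so that the channel is the fastest-varying index. It uses Welford's online update for the variance and, as a simplifying assumption, feeds the biased batch variance into the moving variance.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel statistics that a forward pass can optionally save (arg3).
struct BatchNormStats
{
    std::vector<float> mean;        // one entry per channel
    std::vector<float> invVariance; // 1 / sqrt(variance + epsilon)
};

// Hypothetical reference routine, NOT a composable_kernel API.
// x and y are NHWC tensors flattened to rows * C, with rows = N * H * W (rows > 0).
BatchNormStats batchnorm_forward_nhwc(const std::vector<float>& x,
                                      std::vector<float>& y,
                                      std::size_t rows,
                                      std::size_t C,
                                      const std::vector<float>& scale,
                                      const std::vector<float>& bias,
                                      float epsilon,            // assumed parameter
                                      bool updateMovingAverage, // corresponds to arg2
                                      float averageFactor,      // assumed parameter
                                      std::vector<float>& runningMean,
                                      std::vector<float>& runningVariance)
{
    BatchNormStats stats{std::vector<float>(C), std::vector<float>(C)};
    y.resize(rows * C);

    for(std::size_t c = 0; c < C; ++c)
    {
        // Welford's online update: a single pass, no explicit sum of squares.
        float mean = 0.0f;
        float m2   = 0.0f;
        for(std::size_t r = 0; r < rows; ++r)
        {
            const float v     = x[r * C + c];
            const float delta = v - mean;
            mean += delta / static_cast<float>(r + 1);
            m2 += delta * (v - mean);
        }
        const float variance = m2 / static_cast<float>(rows); // biased estimate
        const float invStd   = 1.0f / std::sqrt(variance + epsilon);

        stats.mean[c]        = mean;
        stats.invVariance[c] = invStd;

        // Normalize, then apply the per-channel scale and bias.
        for(std::size_t r = 0; r < rows; ++r)
        {
            y[r * C + c] = scale[c] * (x[r * C + c] - mean) * invStd + bias[c];
        }

        if(updateMovingAverage)
        {
            // Exponential moving average; using the biased variance is an assumption.
            runningMean[c] = (1.0f - averageFactor) * runningMean[c] + averageFactor * mean;
            runningVariance[c] =
                (1.0f - averageFactor) * runningVariance[c] + averageFactor * variance;
        }
    }
    return stats;
}
```

For the lengths used above, a caller would pass `rows = 128 * 16 * 16` and `C = 1024`, with `scale`, `bias`, `runningMean`, and `runningVariance` each holding 1024 entries.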

Run `batchnorm infer nhwc`

# -D <xxx> : input 4-d tensor lengths
# -v <x>   : verification (0=no, 1=yes)
# arg1: data type (0: fp16, 1: fp32, 3: int8, 5: bf16, 6: fp64)
# arg2: initialization (0=no init, 1=single integer value, 2=integer values in a range, 3=decimal value)
# arg3: time kernel (0=no, 1=yes)
./bin/example_batchnorm_infer -D 128,16,16,1024 -v 1 0 2 1

Result

./bin/example_batchnorm_infer -D 128,16,16,1024 -v 1 0 2 1
launch_and_time_kernel: grid_dim {120, 1, 1}, block_dim {256, 1, 1} 
Warm up 1 time
Start running 10 times...
Perf: 1.28235 ms, 523.329 GB/s
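
Unlike the forward pass, inference does not compute batch statistics; it normalizes with a previously estimated mean and variance. The sketch below illustrates that computation on the CPU for an NHWC tensor. It is not the composable_kernel implementation or API: the function name and the `epsilon` parameter are hypothetical, and the N,H,W,C ordering of the `-D` lengths is assumed.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference routine, NOT a composable_kernel API.
// x is an NHWC tensor flattened to rows * C, with rows = N * H * W.
std::vector<float> batchnorm_infer_nhwc(const std::vector<float>& x,
                                        std::size_t rows,
                                        std::size_t C,
                                        const std::vector<float>& scale,
                                        const std::vector<float>& bias,
                                        const std::vector<float>& estimatedMean,
                                        const std::vector<float>& estimatedVariance,
                                        float epsilon) // assumed parameter
{
    std::vector<float> y(rows * C);
    for(std::size_t c = 0; c < C; ++c)
    {
        // The per-channel factor depends only on the stored statistics.
        const float invStd = 1.0f / std::sqrt(estimatedVariance[c] + epsilon);
        for(std::size_t r = 0; r < rows; ++r)
        {
            y[r * C + c] = scale[c] * (x[r * C + c] - estimatedMean[c]) * invStd + bias[c];
        }
    }
    return y;
}
```

Because no reduction over the batch is needed, each output element depends on a single input element and four per-channel values, so the operation is purely elementwise.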
