rocking5566 42db20408c Gemm + bias + relu + add + layernorm (#272)
* Copy "gemm reduce" to "gemm bias add reduce"

* Implement gemm bias add reduction

* Fix compiler error due to merge from develop

* Add tensor operation for gemm + bias + add + reduce

* Add gemm_bais_add_reduce to ckProfiler

* Add c1 functor

* Refine type

* Use reduceAccDataType instead of explicitly float

* Change to use check_err()

* Do relu in float32 instead of bhalf_t. Because bhalf_t is unsigned

* Refactor relu. using type_trait instead of overloading

* Rename DxsReduceAccElementwiseOperation to DxsReduceAccElementwiseOperation

* Fix denominator

* Refine nameing

* Fix denominator  in host

* Remove useless include header

* Use AccDataType

* Fix static_cast order

* Refine type

* [What] Remove tuple type in the base class
[Why] External api depend on base class. if base class has relationship with type, we will need many class for different type

[ROCm/composable_kernel commit: 6eb5549923]
2022-06-16 23:49:20 -05:00
2022-05-12 09:21:01 -05:00
2022-06-15 21:26:48 -05:00
2018-10-08 22:49:58 -05:00
2021-08-08 17:41:54 +00:00
2022-05-20 14:40:12 -05:00
2022-05-12 09:21:01 -05:00
2022-06-15 21:26:48 -05:00
2022-05-12 09:21:01 -05:00

Docker script

docker run                                     \
-it                                            \
--privileged                                   \
--group-add sudo                               \
-w /root/workspace                             \
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace  \
rocm/tensorflow:rocm4.3.1-tf2.6-dev            \
/bin/bash

Build

mkdir build && cd build
# Need to specify target ID, example below is gfx908 and gfx90a
cmake                                                                 \
-D BUILD_DEV=OFF                                                      \
-D CMAKE_BUILD_TYPE=Release                                           \
-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3" \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc                             \
-D CMAKE_PREFIX_PATH=/opt/rocm                                        \
..

Build and Run Examples

 make -j examples

Instructions for running each individual examples are under example/

Tests

 make -j tests
 make test

Build ckProfiler

 make -j ckProfiler

Instructions for running ckProfiler are under profiler/

Caveat

Kernel Timing and Verification

CK's own kernel timer will warn up kernel once, and then run it multiple times to get average kernel time. For some kernels that use atomic add, this will cause output buffer to be accumulated multiple times, causing verfication failure. To work around it, do not use CK's own timer and do verification at the same time. CK's own timer and verification in each example and ckProfiler can be enabled or disabled from command line.

Description
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
Readme MIT 234 MiB
Languages
C++ 93.1%
Python 4.5%
CMake 1.5%
Shell 0.5%
Pawn 0.2%