mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
[ROCm/composable_kernel commit: 500fa99512]
## Docker script
```bash
docker run \
    -it \
    --privileged \
    --group-add sudo \
    -w /root/workspace \
    -v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
    rocm/tensorflow:rocm5.1-tf2.6-dev \
    /bin/bash
```

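The `docker run` command above relies on `PATH_TO_LOCAL_WORKSPACE` being set to an existing directory. The following is a minimal sketch of a pre-flight check; `check_workspace` is a hypothetical helper and not part of CK, and defaulting the variable to the current directory is only an assumption for illustration.

```bash
# Hypothetical pre-flight check (not part of CK): make sure the host
# directory that will be mounted at /root/workspace is set and exists.
check_workspace() {
    local workspace=$1
    if [ -z "${workspace}" ]; then
        echo "PATH_TO_LOCAL_WORKSPACE is not set" >&2
        return 1
    fi
    if [ ! -d "${workspace}" ]; then
        echo "workspace directory not found: ${workspace}" >&2
        return 1
    fi
    return 0
}

# Assumption for illustration: fall back to the current directory when
# the variable has not been exported yet.
export PATH_TO_LOCAL_WORKSPACE="${PATH_TO_LOCAL_WORKSPACE:-$PWD}"
check_workspace "${PATH_TO_LOCAL_WORKSPACE}" && echo "workspace: ${PATH_TO_LOCAL_WORKSPACE}"
```

Running the check before `docker run` surfaces a missing or mistyped path on the host instead of inside the container.
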
## Install a newer version of rocm-cmake
https://github.com/RadeonOpenCompute/rocm-cmake

## Build
```bash
mkdir build && cd build
```

```bash
# A target ID must be specified; the example below targets gfx908 and gfx90a
cmake \
    -D BUILD_DEV=OFF \
    -D CMAKE_BUILD_TYPE=Release \
    -D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3" \
    -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
    -D CMAKE_PREFIX_PATH=/opt/rocm \
    -D CMAKE_INSTALL_PREFIX=${PATH_TO_CK_INSTALL_DIRECTORY} \
    ..
```

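When building for a different set of GPUs, the `--offload-arch` flags in `CMAKE_CXX_FLAGS` have to be changed to match. As a rough sketch, the flag string can be assembled from a list of target IDs; `offload_arch_flags` and `CK_CXX_FLAGS` are hypothetical names introduced here and are not part of CK. The target ID of an installed GPU is typically reported by `rocminfo`.

```bash
# Hypothetical helper (not part of CK): turn a list of GPU target IDs into
# the --offload-arch flags passed through CMAKE_CXX_FLAGS.
offload_arch_flags() {
    local flags=""
    local target
    for target in "$@"; do
        # Prepend a separating space only when flags is already non-empty.
        flags="${flags:+${flags} }--offload-arch=${target}"
    done
    printf '%s' "${flags}"
}

# Same targets as the cmake example: gfx908 and gfx90a.
CK_CXX_FLAGS="$(offload_arch_flags gfx908 gfx90a) -O3"
echo "${CK_CXX_FLAGS}"
```

The resulting string can then be passed as `-D CMAKE_CXX_FLAGS="${CK_CXX_FLAGS}"` in place of the hard-coded value.
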
### Build and Run Examples
```bash
make -j examples
```
Instructions for running each individual example are under `example/`

## Tests
```bash
make -j examples tests
make test
```

## Build ckProfiler
```bash
make -j ckProfiler
```
Instructions for running ckProfiler are under `profiler/`

## Install CK
```bash
make install
```

## Using CK as a pre-built kernel library
Instructions for using CK as a pre-built kernel library are under `client_example/`

## Caveat
### Kernel Timing and Verification
CK's own kernel timer warms up the kernel once and then runs it multiple times
to get the average kernel time. For kernels that use atomic add, this causes
the output buffer to be accumulated multiple times, which makes verification fail.
To work around this, do not enable CK's own timer and verification at the same time.
Both the timer and verification can be enabled or disabled from the command line
in each example and in ckProfiler.

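
The effect described above can be illustrated with a toy model in plain shell arithmetic. This is only a sketch and not CK code: `simulate_output` is a hypothetical function, the "kernel" adds a fixed contribution to an output value on every launch (as an atomic-add kernel would), and the count of 10 timed runs is an assumption chosen for illustration.

```bash
# Toy model only (not CK code): every "kernel launch" adds its contribution
# to the output instead of overwriting it, as an atomic-add kernel would.
simulate_output() {
    local warmup_runs=$1 timed_runs=$2 contribution=$3
    local total_runs=$((warmup_runs + timed_runs))
    local output=0 run=0
    while [ "${run}" -lt "${total_runs}" ]; do
        output=$((output + contribution))   # accumulate, never reset
        run=$((run + 1))
    done
    echo "${output}"
}

expected=5
# Verification only: a single launch, so the output matches the reference.
echo "verification only: $(simulate_output 0 1 "${expected}") (expected ${expected})"
# Timer enabled: 1 warm-up + 10 timed launches accumulate into the same buffer.
echo "timer enabled:     $(simulate_output 1 10 "${expected}") (expected ${expected})"
```

With the timer enabled, the modeled output is 55 instead of 5, which is why a comparison against the reference result fails unless timing is turned off for the verification run.
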