mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-05 14:11:29 +00:00
## Docker script
```bash
docker run \
    -it \
    --privileged \
    --group-add sudo \
    -w /root/workspace \
    -v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
    rocm/tensorflow:rocm5.1-tf2.6-dev \
    /bin/bash
```

## Install a newer version of rocm-cmake
https://github.com/RadeonOpenCompute/rocm-cmake

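The rocm-cmake step could be sketched as below. This is a minimal sketch, not the project's documented procedure: it assumes rocm-cmake follows the usual CMake configure-and-install flow, and the function name `install_rocm_cmake` and the default prefix `/opt/rocm` are hypothetical choices made for illustration.

```bash
# Sketch only: clone, configure, and install rocm-cmake from source.
# Assumptions: standard CMake install flow, CMake 3.13+ (for -S/-B),
# and an install prefix you are allowed to write to.
install_rocm_cmake() {
    prefix="${1:-/opt/rocm}"   # hypothetical default install prefix
    git clone https://github.com/RadeonOpenCompute/rocm-cmake.git &&
    cmake -S rocm-cmake -B rocm-cmake/build -D CMAKE_INSTALL_PREFIX="${prefix}" &&
    cmake --build rocm-cmake/build --target install
}

# Nothing runs until the function is called, e.g.:
#   install_rocm_cmake /opt/rocm
```

Wrapping the steps in a function keeps the snippet safe to paste into a shell before deciding on the install prefix.
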
## Build
```bash
mkdir build && cd build
```

```bash
# Specify the target ID(s); the example below builds for gfx908 and gfx90a
cmake \
    -D BUILD_DEV=OFF \
    -D CMAKE_BUILD_TYPE=Release \
    -D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3" \
    -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
    -D CMAKE_PREFIX_PATH=/opt/rocm \
    -D CMAKE_INSTALL_PREFIX=${PATH_TO_CK_INSTALL_DIRECTORY} \
    ..
```

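When building for a different set of GPUs, the `CMAKE_CXX_FLAGS` value used in the cmake command can be assembled from a list of target IDs. The sketch below is one way to do that; the variable names `GPU_TARGETS` and `CK_CXX_FLAGS` are hypothetical and not part of CK's build system.

```bash
# Hypothetical helper variables; only the --offload-arch and -O3 flags
# themselves come from the cmake command in this README.
GPU_TARGETS="gfx908 gfx90a"

CK_CXX_FLAGS=""
for target in ${GPU_TARGETS}; do
    # One --offload-arch flag per target ID
    CK_CXX_FLAGS="${CK_CXX_FLAGS} --offload-arch=${target}"
done
CK_CXX_FLAGS="${CK_CXX_FLAGS} -O3"

echo "${CK_CXX_FLAGS}"
# Pass the result to cmake as: -D CMAKE_CXX_FLAGS="${CK_CXX_FLAGS}"
```

Editing `GPU_TARGETS` is then the only change needed to retarget the build.
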
### Build and Run Examples
```bash
make -j examples
```
Instructions for running each individual example are under `example/`.

## Tests
```bash
make -j examples tests
make test
```

## Build ckProfiler
```bash
make -j ckProfiler
```
Instructions for running ckProfiler are under `profiler/`.

## Install CK
```bash
make install
```

## Using CK as pre-built kernel library
Instructions for using CK as a pre-built kernel library are under `client_example/`.

## Caveat
### Kernel Timing and Verification
CK's own kernel timer warms up the kernel once and then runs it multiple times
to get the average kernel time. For kernels that use atomic add, this causes the
output buffer to be accumulated multiple times, which makes verification fail.
To work around this, do not enable CK's own timer and verification in the same run.
Both the timer and verification can be enabled or disabled from the command line
in each example and in ckProfiler.
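
The accumulation effect described above can be illustrated with a toy model. This is not CK code: `launch_kernel`, `contribution`, and `nrepeat` are hypothetical stand-ins, and the "kernel" is a shell function that adds into a single output value the way an atomic-add kernel adds into its output buffer.

```bash
# Toy model of the caveat (not CK code): each launch accumulates into the
# output instead of overwriting it, as an atomic-add kernel would.
contribution=5   # hypothetical value one launch adds to the output
nrepeat=10       # hypothetical number of timed launches
out=0
launch_kernel() { out=$((out + contribution)); }

# Verification-style run: a single launch into a zeroed output.
out=0
launch_kernel
single_run=$out

# Timing-style run: one warm-up launch plus nrepeat timed launches,
# with the output never reset between launches.
out=0
launch_kernel
i=0
while [ "$i" -lt "$nrepeat" ]; do
    launch_kernel
    i=$((i + 1))
done
timed_run=$out

echo "single launch: ${single_run}, after warm-up + ${nrepeat} timed launches: ${timed_run}"
```

The timed run ends up with `(nrepeat + 1)` times the single-launch result, so comparing it against a reference computed for one launch fails even though the kernel is correct.
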