mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-17 09:08:35 +00:00

Po Yen Chen 313e570496 Add examples of Conv + reduction (data type: int4, int8, bf16, fp16, fp32) (#380)

* Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle

* Add 'DeviceGroupedConvFwdMultipleDMultipleR' interface

* Add DeviceGroupedConvFwdMultipleDMultipleR_Xdl_CShuffle

* Remove 'GridwiseConvFwdMultipleDMultipleR_xdl_cshuffle'

* Add 'TransformConvFwdToGemm<>' utility class (from Chao)

* Use 'TransformConvFwdToGemm<>' to shorten code

* Fix ill-formed method declaration

* Re-implement MakeRGridDescriptor_M() function

* Change problem description

* Use macro to define layout types

* Define K-reduced output tensor layout types

* Let user decide R output tensor layout

* Rename variables

* Add padding to the reduced output tensor if necessary

* Extract common code as helper method

* Remove debug message

* Add missing include directive

* Add partial fp16 Conv + Reduction example

* Add example verification code for 2D Conv problem

* Use type alias to simplify code

* Share code across different-dimension Conv problems

* Rename file/functions from run_conv_fwd* to run_convnd_fwd*

* Make example code more verbose

* Add code to support 1D & 3D Conv + Reduction on host

* Add more examples for data type: bf16, fp32

* Add example for int8

* Add custom target to group examples

* Use more general custom target name

* Change the description in error message

* Disable testing for example other than fp32

* Add example for int4 (just copy from int8)

* Fix wrong data type

* Use larger data type for intermediate tensors

* Finish int4 example

* Undefine macro PP_DEFINE_LAYOUT_TYPE() after use

* Use named variables to replace magic numbers

* Remove debug messages

* Use same A/B data type for host Conv in int4 example

* Add check for the 'RLayout' type argument

* Group same-dim-layouts together in 'LayoutSetting<>'

* Add 'final' specifier to utility classes

* Use different initialization method for examples

* Remove macro PP_DEFINE_LAYOUT_TYPE()

* Fix code-comment mismatch

* Use more reasonable initialization value for all data types

* Default use init_method=1 for all examples

* Remove never-used code

* Remove confusing out-of-date comments

* clean

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

[ROCm/composable_kernel commit: 46a675aa6f]

2022-08-31 16:32:17 -05:00

| Name | Last commit | Date |
|---|---|---|
| client_example | layernorm external api (#379) | 2022-08-24 18:43:43 -05:00 |
| cmake | int4 data type (#364) | 2022-08-18 14:53:47 -05:00 |
| example | Add examples of Conv + reduction (data type: int4, int8, bf16, fp16, fp32) (#380) | 2022-08-31 16:32:17 -05:00 |
| include/ck | Add examples of Conv + reduction (data type: int4, int8, bf16, fp16, fp32) (#380) | 2022-08-31 16:32:17 -05:00 |
| library | Add examples of Conv + reduction (data type: int4, int8, bf16, fp16, fp32) (#380) | 2022-08-31 16:32:17 -05:00 |
| profiler | Try to workaround flaky GemmSoftmaxGemm tests (#386) | 2022-08-29 08:40:25 -05:00 |
| script | Fixed splitk gemm fp32 (#384) | 2022-08-26 09:59:50 -05:00 |
| test | More int4 tests. (#374) | 2022-08-25 17:20:23 -05:00 |
| .clang-format | start adding convolution | 2018-10-08 22:49:58 -05:00 |
| .clang-tidy | add tidy | 2021-08-08 17:41:54 +00:00 |
| .gitignore | layernorm external api (#379) | 2022-08-24 18:43:43 -05:00 |
| CMakeLists.txt | Add an option to build CK with clang directly (#387) | 2022-08-26 12:51:39 -05:00 |
| Config.cmake.in | Add host API (#220) | 2022-05-12 09:21:01 -05:00 |
| dev-requirements.txt | Initial Setup for CI (#86) | 2022-02-18 21:44:11 -06:00 |
| Dockerfile | Fix QA, allow switching compiler versions, fix google test compilation error. (#348) | 2022-08-08 13:49:14 -05:00 |
| Jenkinsfile | Add an option to build CK with clang directly (#387) | 2022-08-26 12:51:39 -05:00 |
| LICENSE | update license (#297) | 2022-06-23 01:27:30 -05:00 |
| rbuild.ini | Update test CMakeLists to add new tests automatically and add Jenkins stage for tests (#88) | 2022-03-03 16:59:42 -06:00 |
| README.md | Clean up conv example, Instances, profiler and test (#324) | 2022-07-29 18:19:25 -05:00 |
| requirements.txt | Update test CMakeLists to add new tests automatically and add Jenkins stage for tests (#88) | 2022-03-03 16:59:42 -06:00 |

README.md

Docker script

docker run                                     \
-it                                            \
--privileged                                   \
--group-add sudo                               \
-w /root/workspace                             \
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace  \
rocm/tensorflow:rocm5.1-tf2.6-dev              \
/bin/bash

Install newer version of rocm-cmake

https://github.com/RadeonOpenCompute/rocm-cmake

Build

mkdir build && cd build

# A target ID must be specified; the example below targets gfx908 and gfx90a
cmake                                                                 \
-D BUILD_DEV=OFF                                                      \
-D CMAKE_BUILD_TYPE=Release                                           \
-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3" \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc                             \
-D CMAKE_PREFIX_PATH=/opt/rocm                                        \
-D CMAKE_INSTALL_PREFIX=${PATH_TO_CK_INSTALL_DIRECTORY}               \
..

Build and Run Examples

 make -j examples

Instructions for running each individual example are under example/

Tests

 make -j examples tests
 make test

Build ckProfiler

 make -j ckProfiler

Instructions for running ckProfiler are under profiler/

Install CK

make install

Using CK as a pre-built kernel library

Instructions for using CK as a pre-built kernel library are under client_example/

Caveat

Kernel Timing and Verification

CK's own kernel timer warms up the kernel once and then runs it multiple times to compute the average kernel time. For kernels that use atomic add, every launch accumulates into the same output buffer, so the result is accumulated multiple times and verification fails. To work around this, do not enable CK's own timer and verification at the same time. Both the timer and verification can be enabled or disabled from the command line in each example and in ckProfiler.
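
The effect can be reproduced on the host with a minimal sketch. The code below is not CK code: `accumulate_kernel`, `run_with_timer` and `run_once` are hypothetical names, and a plain `+=` loop stands in for a GPU kernel that uses atomic add. It assumes the timer reuses one output buffer for the warm-up and all timed launches, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU kernel that accumulates into its output
// instead of overwriting it. The += models atomicAdd(&out[i], in[i]).
void accumulate_kernel(const std::vector<float>& in, std::vector<float>& out)
{
    assert(in.size() == out.size());
    for(std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] += in[i];
    }
}

// Hypothetical timer loop: one warm-up launch followed by nrepeat timed
// launches, all writing to the same output buffer (timing itself omitted).
void run_with_timer(const std::vector<float>& in, std::vector<float>& out, int nrepeat)
{
    accumulate_kernel(in, out); // warm-up launch
    for(int i = 0; i < nrepeat; ++i)
    {
        accumulate_kernel(in, out); // timed launches
    }
}

// Single launch, as used when only verification is enabled.
void run_once(const std::vector<float>& in, std::vector<float>& out)
{
    accumulate_kernel(in, out);
}
```

With an input of `{1, 2}`, a zeroed output and `nrepeat = 10`, `run_with_timer` leaves `{11, 22}` in the output (1 warm-up plus 10 timed launches), while `run_once` leaves `{1, 2}`, which is the value a reference comparison expects.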
