Commit Graph

606 Commits

Author SHA1 Message Date
myamlak
f497e2ba44 Enabling bf16 test 2022-05-19 18:58:39 +00:00
myamlak
f63ca8e881 Format 2022-05-19 13:42:31 +00:00
myamlak
a7676df989 Merge remote-tracking branch 'origin/develop' into myamlak/cgemm 2022-05-19 12:43:59 +00:00
rocking5566
aafc3ac27a elementwise op (#238)
* Add elementwise operation kernel and example

* Add comment

* Add template argument of dim . Prepare to support multiple dimension

* Rename example

* Support 1 dimension

* Add static assert

* Add comment

* Extract pad

* Remove redundant argument

* Support any dimension for elementwise operation

* Remove line

* Let it be the multiple number of CU

* Move thread per block to the parameter of constructor

* rename threadPerBlock with blockSize

* Support double

* rename kernel function name

* remove redundant include header

* Refine type

* Need to the final dimension

* Refine variable name

* Refine type

* Use index_t instead of int in API

Co-authored-by: rocking <chunylai@amd.com>
2022-05-18 23:34:35 -05:00
myamlak
6ebcb667ba Fix + cosmetics + bf16 test commented out temporarily 2022-05-18 13:09:21 +00:00
myamlak
208ac1a546 Consuming binary ops to do A+B / A-B 2022-05-18 11:29:02 +00:00
myamlak
5e1047429f Merge remote-tracking branch 'origin/eltwise_op' into myamlak/cgemm 2022-05-18 07:33:21 +00:00
rocking
c4d610be4f Move thread per block to the parameter of constructor 2022-05-18 09:01:44 +08:00
rocking
83f7531364 Let it be the multiple number of CU 2022-05-18 08:38:30 +08:00
rocking
b7a82d292d Remove line 2022-05-18 05:04:49 +08:00
rocking
7d44e782af Support any dimension for elementwise operation 2022-05-18 04:56:10 +08:00
rocking
06e52d902a Remove redundant argument 2022-05-18 04:39:25 +08:00
rocking
0f84025680 Extract pad 2022-05-18 04:00:37 +08:00
myamlak
5ae304df0a Second auxiliary buffer added 2022-05-17 13:51:05 +00:00
rocking
ecdfe96092 Add comment 2022-05-17 20:56:16 +08:00
rocking
492da45969 Add static assert 2022-05-17 20:53:45 +08:00
rocking
4af77e1f0e Support 1 dimension 2022-05-17 20:50:03 +08:00
rocking
0d26477a86 Rename example 2022-05-17 20:36:05 +08:00
rocking
b456d5e53e Add template argument of dim . Prepare to support multiple dimension 2022-05-17 20:35:26 +08:00
myamlak
b3767dbe9b Merge remote-tracking branch 'origin/eltwise_op' into myamlak/cgemm 2022-05-17 11:40:31 +00:00
myamlak
e00a943e34 Merge remote-tracking branch 'origin/develop' into myamlak/cgemm 2022-05-17 10:23:36 +00:00
rocking
c2626122af Add comment 2022-05-17 08:24:12 +08:00
rocking
a61f34f70f Add elementwise operation kernel and example 2022-05-17 04:36:47 +08:00
myamlak
ffe12e2ea6 Cosmetics 2022-05-16 11:18:28 +00:00
myamlak
55927aaf7e Example added 2022-05-16 10:38:07 +00:00
myamlak
674f74ad5f Test fixes. 2022-05-16 08:11:22 +00:00
Anthony Chang
9f71ff48e2 Validate examples in CI (#233)
* validate examples in ctest runs

* format

* fix usage of check_err

* amend

* add example codes to custom target 'check'

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-05-13 16:54:44 -05:00
myamlak
14bd1430ed Sketch of tests 2022-05-13 13:55:33 +00:00
myamlak
6a0883fae3 Library instances 2022-05-13 12:44:34 +00:00
myamlak
b1c9458b9e Incomplete simple implementation 2022-05-13 09:12:30 +00:00
JD
cec69bc3bc Add host API (#220)
* Add host API

* manually rebase on develop

* clean

* manually rebase on develop

* exclude tests from all target

* address review comments

* update client app name

* fix missing lib name

* clang-format update

* refactor

* refactor

* refactor

* refactor

* refactor

* fix test issue

* refactor

* refactor

* refactor

* upate cmake and readme

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-05-12 09:21:01 -05:00
ltqin
0f912e205e enable convnd bwd data test (#234) 2022-05-12 09:18:59 -05:00
myamlak
4d07aa1249 Format. 2022-05-12 09:38:59 +00:00
myamlak
0c2d00df8a Reference CGEMM + test stub 2022-05-11 08:25:01 +00:00
Anthony Chang
76764d8c92 Manual control of MAC cluster for improved interwave performance (#184)
* manual control of MAC cluster for improved 2-wave performance

ensure setprio's order; ensure inner loop size >= local read size

synchronize when single mac cluster

* format

* use value field from ck::integral_constant

* roll out inter-wave loop scheduler to c-shuffle gemm variants

will gradually roll out to other applicable device ops when occasional reg spill is resolved

* additional comments

* format

* fix mismatch between inter-wave pipeline and interwave blockwise gemm

* address review feedback

* amend
2022-05-10 19:19:22 -05:00
Adam Osewski
712e464c4e Post PR183 review fixes. (#224)
* Suppress additional warnings for googltest.

* Rename file conv_fwd_util to conv_util.

* Update includes and ConvParams member access.

* Formatting.

* Change conv_fwd_util target to conv_util

* Fix compiler errors.

* Fix leftovers.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-05-10 15:41:29 -05:00
myamlak
f03a1738d9 Resolution of issue #153: Add compiler warning on comparing int and size_t (#212)
* Turning compare warnings on

* Cleaning part I

* Cleaning part II

* Explicit static_cast to ck::type_convert

* Resolving large tensor size issue.

* format

* revert change to tensor descriptor; promote lementSpaceSize to 64bit

* use integer value for GEMM test

* Review remarks

* Review remarks + issues with (un)signed arithmetic

* Format fix

* Format

* Clang-format.

* fix 2gb limit issue

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
2022-05-09 15:06:49 -05:00
Wen-Heng (Jack) Chung
968bd93285 Update README.md (#228) 2022-05-09 15:00:04 -05:00
Chao Liu
ec7c2e912e Code refactor (#175)
* format

* improving pipeline

* fix typo

* format

* adding thread group

* adding thread group

* adding thread group

* adding gemm pipeline

* tweak

* refactor

* refactor

* add missing type convert

* refactor

* refactor

* refactor

* clean

* fix build

* refactor

* format

* clean up

* use remove_cvref_t

* clean

* clean up

* clean up

* clean up
2022-05-09 14:57:59 -05:00
Illia Silin
a3c910ac6c Add Benchmark test into CI (#226)
* add performance test to jenkins pipeline

* fix typo

* fix the syntax in conv_fwd_util.cpp

* fix the error message syntax spacing

* fix the error message syntax spacing again

* run profile_gemm and archive results

* fix typo

* try to figure out the paths

* try to figure out the paths one more time

* skip the copying step

* build ckProfiler release only once

* change directory using dir

* fix dir syntax

* change the gemm parameters

* do not pipe script output to file

* try running ckProfiler directly

* fix typo

* use set +e

* run profile_gemm.sh || true

* run multiple gemms and parse results

* fix typo in jenkinsfile

* fix syntax

* add new gemm sizes, update scripts

* put all jenkins steps in original order

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
2022-05-08 02:44:18 -05:00
Adam Osewski
8eca05a633 Introduce GoogleTest framework. (#204)
* Use googletest for tests. Add conv2d_fwd UT.

* Add conv1D/3D to gtest UT.

* Fix: not duplicate test with CTest.

* Convert more tests to googltests.

* Fix: GIT_SHALLOW is not allowed for git commit hash.

* Clang-format

* use integer value for GEMM test

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
2022-04-30 08:50:16 -05:00
Chao Liu
8a2c69eeee use integer value for GEMM test (#219) 2022-04-30 08:44:20 -05:00
Qianfeng
c77ae65d40 Update to gemm_reduce and batched_gemm_reduce (#213)
* [Experimental] Change to gemm+reduce and batched-gemm+reduce

* Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel

* Tiny fix in device_batched_gemm_xdl.hpp

* clang-format library/src/utility/conv_fwd_util.cpp
2022-04-29 11:35:25 -05:00
JD
97d8c5045e Add gfx90a CI stage for tests (#208)
* Add gfx90a CI stage

* upgrade to ROCm 5.1 and fix formatting
2022-04-29 10:36:19 -05:00
Anthony Chang
95e93430de Hotfix for gemm test (#214)
* pass by ref to avoid throwing away initialization results

* EOL CRLF -> LF
2022-04-29 19:03:34 +08:00
Jianfeng Yan
3956085d8e add comments to batched_gemm (#186)
* add comments to batched_gemm

* formatting

* fix a typo in batched_gemm_documentation

* fix naming
2022-04-25 14:32:59 -05:00
Anthony Chang
7c0b149811 profiler: fix fp32 c-shuffle gemm tuning parameter (#194) 2022-04-22 15:48:51 -05:00
Adam Osewski
31d869adc6 Clang-format only modified files. (#181) 2022-04-22 15:48:08 -05:00
Anthony Chang
08a979f188 use inline asm for 4x4 int8 transposition (#187) 2022-04-22 15:47:31 -05:00
Adam Osewski
1a0cd5d160 Convolution FWD profiler refactor. (#183)
* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensio.

Needed to support various conv dimensions

* Rename conv nd example director to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

* 3D cases for convolution utility functions.

* 3D reference convolution.

* Add support for 3D convolution.

* Check for inputs bigger than  2GB.

* Formatting

* Support for bf16/f16/f32/i8 - conv instances + UT.

* Use check_err from test_util.hpp.

* Split convnd test into separate files for each dim.

* Fix data generation and use proper instances.

* Formatting

* Skip tensor initialization if not necessary.

* Fix CMakefiles.

* Remove redundant conv2d_fwd test.

* Lower problem size for conv3D UT.

* 3D case for convnd example.

* Remove leftovers after merge.

* Add Conv Specialization string to GetTypeString

* Skip instance causing numerical errors.

* Small fixes.

* Remove redundant includes.

* Fix namespace name error.

* Script for automatic testing and logging convolution fwd UTs

* Comment out numactl cmd.

* Refine weights initalization and relax rtol for fp16

* Move test_util.hpp to check_err.hpp

* Refine weights initalization and relax rtol for fp16

* Refactor common part of test conv utils.

* Move utility function to single common place.

* Add additional common functions to utility.

* Refactor convnd_fwd_xdl examples.

* Remove redundant files.
* Unify structure.

* Add constructor to ConvParams.

* And add input parameters validation.

* Modify conv examples to use single utility file.

* Remove check_error from host_tensor.hpp

* Get rid of check_indices function.

* Remove bf16_to_f32 function overload for scalars.

* Fix namespace.

* Add half_float::half for check_err.

* Fix conv params size in UT.

* Fix weights initialization for int8.

* Fix weights initialization for int8.

* Add type_convert when store output in ref conv 1D.

* Get back old conv2d_fwd_xdl operation.

* Silence conv debug print.

* format

* clean

* clean

* Fix merge.

* Fix namespace for check_err

* Formatting.

* Fix merge artifacts.

* Remove deleted header.

* Fix some includes and use ck::utils::check_err.

* Remove unused check_indices restored by previous merge.

* Fix namespaces after merge.

* Fix compilation error.

* Small fixes.

* Use common functions.
* Fix filename
* Fix namespaces.

* Fix merge artifact - retrieve removed by accident fun.

* Fix ConvForwardSpecialization.

* Working example of OpInstanceRunEngine for conv2dfwd UT.

* Adhere to coding style rules.

* Formatting and adhere to coding style rules.

* Fix merge artifacts.

* Utility for collecting conv fwd instances.

+ Plus commmon part for parsing cmdline params.

* Refactor FillUniform because of segfault for int8_t.

* Naming convention.

* Elegant version of device mem allocation.

* Use OpInstanceRunEngine in conv fwd nd tests.

* Multiple refinements.

* conditional init
* don't run reference op if not provided.

* Use OpInstanceRunEngine for ckProfiler conv_fwd

* Refactor common tensor fill function to separate file.

* Clean up unused functions.

* Support different init methods.

* Create CMake target for conv_fwd_util.

* Add header for profile_convnd_fwd.cpp

* Fix CMakefiles to link with conv_fwd_util where needed.

* Fix some clutter.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-04-21 17:39:39 -05:00