Commit Graph

8 Commits

Author SHA1 Message Date
Chao Liu
5b178874a1 Fix Tests build (#109)
* fix tests

* remove useless file

* fix test build

* reduce parallelism when compiling

* fix test
2022-03-05 00:44:11 -06:00
rocking5566
7e9a9d32c7 [Bf16 & int8] [example & ckprofiler] (#100)
* Add int8 of mk_nk_mn to the ckProfiler

* Add example of int8 gemm

* Fix typo, use ushort instead of half_t for bfloat16

* replace ushortXXX_t to bhalfXXX_t

* rename ushort to bhalf_t

* Add bf16 example

* Add bf16 gemm to ckProfiler

* Fix alignment

* Fix typo

* Add unit test for gemm_xdl int8

* Add gemm_xdl fp32 unit test

* Add gemm_xdl bf16 unit test

* fix build

* fix build issue due to merge conflict

* Fix build

* Fix build error

Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-03-04 15:56:44 -06:00
JD
992f71e371 Update test CMakeLists to add new tests automatically and add Jenkins stage for tests (#88)
* add docker file and make default target buildable

* add Jenkinsfile

* remove empty env block

* fix package stage

* remove render group from docker run

* clean up Jenkins file

* add cppcheck as dev dependency

* update cmake file

* Add profiler build stage

* add hip_version config file for reduction operator

* correct jenkins var name

* Build release instead of debug

* Update test CMakeLists.txt
reorg test dir
add test stage

* reduce compile threads to prevent compiler crash

* add optional debug stage, update second test

* remove old test target

* fix tests to return proper results and self review

* Fix package name and make test run without args

* change Dockerfile to ues rocm4.3.1

* remove parallelism from build

* Lower paralellism

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-03-03 16:59:42 -06:00
Jianfeng Yan
bdedf64b98 Space filling curve (#96)
* add space_filling_curve

* cleanup and move space_filling_curve into test

* add functions for backward and forward step; hard coded results in unit test

* minor changes
2022-02-24 20:11:36 -06:00
Adam Osewski
756a761727 Unify Convolution FWD XDL 1D/2D implementation. (#93)
* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensio.

Needed to support various conv dimensions

* Rename conv nd example director to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-23 10:44:20 -06:00
ltqin
880fbee957 NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API (#73)
* add fwd bf16 conv

* change tunning parametor

* add int8 for conv fwd

* remove comments

* change tunning parametor for int8

* change init int8 example

* add test for conv2d fwd

* change device operation file pos because merge develop

* fwd int8 use reference

* test_conv_fwd use reference

* add braket for if statement

* rename fwd example name

* remove StaticBufferOfVectorTypeV2

* tweak example

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-11 20:06:40 -06:00
ltqin
4be7f0198e add split-k GEMM (#59)
* add DeviceGemmSplitKXdl

* add file device_gemm_splitk_xdl.hpp

* set c matrix zero

* using atomic

* add all tuning parameter to f32 mkkn

* grid size change to 720

* add tunning parameter for NT

* add tunning parameter for TN

* add tunning parameter for TT

* add m=96tunning parameter

* add lost config

* add element wise operation

* fixed MPerBlock=96

* remove marco for slpitk swtich

* add test

* add new line at the end of device_gemm_xdl_instance.hpp

* remove step hack

* seperate split-k instance files

* add tunning parameters

* change disired grid size to parameters

* remove slice length

* add desiredgridsize parameter to ckProfiler

* add losting file device_gemm_xdl_splitk_instance.hpp

* change desired gride size to kbatch

* format

* format

* clean up

* add selection of device_instances

* clean code

* fix build issue

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
2022-02-02 22:47:27 -06:00
Chao Liu
237d4ca03f added test for magic number division (#58) 2021-11-30 09:09:28 -06:00