Commit Graph

6 Commits

Author SHA1 Message Date
ltqin
c254e5abd2 NHWC conv 2d: bwd fp32/fp16/bfp16/int8, Device level tuning and host API (#92)
* start conv2d bwd api

* kernel running

* add bwd reference

* change to no shuffle

* fix bwd reference

* pass verification

* add Filter1x1Stride1Pad0 and start testing

* change some tuning parameter

* fix test error

* add fp16 tuning parameter

* add bf16 tuning parameter

* add int8 tuning parameters

* change fp32 tuning parameter

* add bwd to profiler

* fix bug for bwd profiler

* fix ckProfiler bug

* change conv2d_bwd_xdl to fp16

* fix bug in comments

* fix precompile id

* fix enum conv name

* chage _bwd_ to _bwd_data_

* change conv2d_bwd example id

* bwd to bwd data

* fix prehead

* fix MakeDefaultBlock2CTileMap ,import form merge develop

* format bwd instance

* bwd to bwd data

* change name bwd to bwd data

* change name bwd to bwd data in example

* formate code

* change conv2d bwd data id in example

* rewrite readme for example

* fix CalculateMagicNumbers about div zero

* add workaround CK_WORKAROUND_SWDEV_325164

* change test_conf2d_bwd_data show info

* format

* fix bug for workaround:CK_WORKAROUND_SWDEV_325164

* formate tuning parameters

* formate tuning parameters again

* formate tuning parameters 3

* formate tuning parameters 4

* remove add function template

* format

* update comment

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-03-04 00:08:26 -06:00
Adam Osewski
756a761727 Unify Convolution FWD XDL 1D/2D implementation. (#93)
* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensio.

Needed to support various conv dimensions

* Rename conv nd example director to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-23 10:44:20 -06:00
rocking5566
19c5d6e651 Gemm alpha beta profiler (fp32 & fp16) (#91)
* [What] Refactor verification of gemm alpha_beta, move to reference operation
[Why] Sync with other verification

* Profile mk_nk for gemm bias 2d

* Support bias 2d with mn * kn in profiler

* Support bias 2d with km*kn and km*nk in profiler

* Support fp32 bias 2d in profiler

* format

* format

Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-21 11:35:21 -06:00
ltqin
880fbee957 NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API (#73)
* add fwd bf16 conv

* change tunning parametor

* add int8 for conv fwd

* remove comments

* change tunning parametor for int8

* change init int8 example

* add test for conv2d fwd

* change device operation file pos because merge develop

* fwd int8 use reference

* test_conv_fwd use reference

* add braket for if statement

* rename fwd example name

* remove StaticBufferOfVectorTypeV2

* tweak example

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-11 20:06:40 -06:00
zjing14
b53e9d08ed Batched GEMM for fp16 (#79)
* prepare host for batched_gemm

* init commit of batched kernels

* fixed

* refine transform with freeze

* m/n padding

* fixed a bug; clean

* add small tiles

* clean

* clean code

* clean code

* add nt, tn, tt layout

* add missing file

* use StaticBufferTupleOfVector instead

* add reference_batched_gemm

* fixed a macro
2022-02-11 09:36:52 -06:00
Chao Liu
823657ed12 GEMM+Bias+ReLU+Add (#76)
* tweak conv for odd C

* update script

* clean up elementwise op

* fix build

* clean up

* added example for gemm+bias+relu+add

* added example for gemm+bias+relu

* add profiler for gemm_s_shuffle; re-org files

* add profiler

* fix build

* clean up

* clean up

* clean up

* fix build
2022-02-06 22:32:47 -06:00