7 Commits

Author SHA1 Message Date
Anthony Chang
6d4450ef15 Allow distinct K0/K1 values for A/B block descriptor (#98)
* add gitignore

* host tensor: allow generating sequentially increasing values along a given dimension

* gridwise gemm v3r1: allow distinct K0/K1 values for A/B block descriptor

- remove dangling header include
- modify example gemm_xdl accordingly
- infer KPack value from M/NPerXdl
- device conv2d fwd: update parameters accordingly for the underlying gridwise gemm v3r1
(API for conv2d fwd stays the same for now until we decide to expose individual K0s for activation and weight)

* add LDS data dump utility

* profiler: reflect API change for distinct K0/K1 for A/B matrices

* profiler: add conflict-free LDS write FP16 kernel instances

* fix accidental perf regression

* address feedback; cosmetic changes

* clang-format for new files

* format

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-27 21:06:18 -06:00
Chao Liu
22d438ae9e Add gridwise GEMM pipeline (#89)
* clean up

* add multiple thread scratch to ThreadwiseTensorSliceTransfer_v3r1

* add 2 stage prefetch

* add more sanity checks to transform_tensor_descriptor

* tweak

* enabling 2 stage prefetch to existing gridwise gemm; tweak

* enabling 2 stage prefetch to existing gridwise gemm

* move gridwise gemm pipeline in class; clean up

* add some irregular tile sizes

* update CalculateHasMainK0BlockLoop for multi-stage-prefetch

* refactor gridwise gemm pipeline class
2022-02-23 17:23:49 -06:00
Jianfeng Yan
6dfb92bbef Conv3d new (#94)
* conv3d compiles but has memory error

* conv3d works

* fix performance issue by using __builtin_amdgcn_readfirstlane

* change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*

* clang-format

* remove CK_EXPERIMENTAL_PASS_TENSOR_DESCRIPTOR_BY_*; moved wrapper into DeviceConv3d

* format

* remove useless macro

* add comment

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-22 22:45:28 -06:00
Anthony Chang
904cbe2a8f fix build breaks (#81)
- device_gemm_xdl_c_shuffle function signature matches split-k
- retire host_driver since it is no longer maintained
- linter error (unused variable)

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-10 23:52:19 -06:00
Chao Liu
823657ed12 GEMM+Bias+ReLU+Add (#76)
* tweak conv for odd C

* update script

* clean up elementwise op

* fix build

* clean up

* added example for gemm+bias+relu+add

* added example for gemm+bias+relu

* add profiler for gemm_s_shuffle; re-org files

* add profiler

* fix build

* clean up

* clean up

* clean up

* fix build
2022-02-06 22:32:47 -06:00
rocking5566
ca47a6cfe2 Do not hardcode the function parameter, use template instead. (#72)
* Do not hardcode the function parameter, use template instead.

* [What] Remove AThreadTransferSrcResetCoordinateAfterRun and BThreadTransferSrcResetCoordinateAfterRun in host API
[Why] "C_Shuffle" version is supposed to be similar to the vanilla one

* Fix typo
Let DeviceGemmXdl_C_Shuffle use kernel_gemm_xdlops_v3r1
2022-01-24 22:44:13 -06:00
rocking5566
4d40b1974e Add gemm_shuffle host api (#71)
* [What]
1. Add DeviceGemmXdl_C_Shuffle
2. Revise example of gemm_xdl
[Why] Prepare to add shuffle version of D = alpha * (A * B) + beta * C
[How] Imitate DeviceGemmXdl and device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp
2022-01-21 00:31:17 -06:00