Anthony Chang c53c6f3352 Allow distinct K0/K1 values for A/B block descriptor (#98)
* add gitignore

* host tensor: allow generating sequentially increasing value in a given dimension

* gridwise gemm v3r1: allow distinct K0/K1 values for A/B block descriptor

- remove dangling header include
- modify example gemm_xdl accordingly
- infer KPack value from M/NPerXdl
- device conv2d fwd: update parameters accordingly for the underlying gridwise gemm v3r1
  (the conv2d fwd API stays the same for now, until we decide to expose individual K0s for activation and weight)

* add LDS data dump utility

* profiler: reflect API change for distinct K0/K1 for A/B matrices

* profiler: add conflict-free LDS write FP16 kernel instances

* fix accidental perf regression

* address feedback; cosmetic changes

* clang-format for new files

* format
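
To illustrate what "distinct K0/K1 values for A/B block descriptor" could mean in practice, here is a minimal sketch. It assumes the A block is laid out as [AK0, M, AK1] and the B block as [BK0, N, BK1], with both factorizations covering the same per-block K extent. The names `KSplit`, `split_k`, and `same_k_extent` are hypothetical and are not taken from composable_kernel.

```cpp
#include <cassert>

// Hypothetical illustration only: not the composable_kernel API.
// A K extent is factored as K = K0 * K1, where K1 is the innermost
// (vector-access) length and K0 is the remaining outer length.
struct KSplit
{
    int k0;
    int k1;
};

// Derive K0 from the per-block K extent and a chosen K1.
// Returns {0, 0} when K1 is non-positive or does not divide KPerBlock.
constexpr KSplit split_k(int k_per_block, int k1)
{
    return (k1 > 0 && k_per_block % k1 == 0) ? KSplit{k_per_block / k1, k1}
                                             : KSplit{0, 0};
}

// A and B may pick different K1 values as long as both splits are valid
// and cover the same K extent.
constexpr bool same_k_extent(KSplit a, KSplit b)
{
    return a.k0 > 0 && b.k0 > 0 && a.k0 * a.k1 == b.k0 * b.k1;
}

// Example: KPerBlock = 32, A uses AK1 = 8 (AK0 = 4), B uses BK1 = 2 (BK0 = 16).
static_assert(split_k(32, 8).k0 == 4, "A block: [AK0 = 4, M, AK1 = 8]");
static_assert(split_k(32, 2).k0 == 16, "B block: [BK0 = 16, N, BK1 = 2]");
static_assert(same_k_extent(split_k(32, 8), split_k(32, 2)),
              "both splits must cover the same K extent");
```

Under this assumption, each operand can choose the K1 that suits its own memory layout, while the product K0 * K1 stays equal for A and B.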

Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: 6d4450ef15]
2022-02-27 21:06:18 -06:00
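
The host tensor change mentioned in the commit message (generating sequentially increasing values in a given dimension) can be sketched as a functor that returns the index along one chosen dimension. The name `SequentialAlongDim` below is hypothetical and the code is an illustration under that assumption, not the repository's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration only: not the composable_kernel implementation.
// Called with the multi-dimensional index of a tensor element, it returns the
// index along dimension Dim, so values increase sequentially in that dimension
// and stay constant along all others.
template <std::size_t Dim>
struct SequentialAlongDim
{
    template <typename... Is>
    float operator()(Is... is) const
    {
        static_assert(Dim < sizeof...(Is), "Dim must be a valid dimension");
        const std::array<std::size_t, sizeof...(Is)> idx{
            {static_cast<std::size_t>(is)...}};
        return static_cast<float>(idx[Dim]);
    }
};
```

A tensor filled this way makes each element's value reveal its coordinate along the chosen dimension, which may be convenient when inspecting data layouts, for example together with a memory dump utility.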
Description
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
Readme MIT 234 MiB
Languages
C++ 93.1%
Python 4.5%
CMake 1.5%
Shell 0.5%
Pawn 0.2%