mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-04 13:41:24 +00:00
fix async copytest bug (#2509)
* fix async copytest bug * Add block_sync_lds_direct_load utility * fix the s_waitcnt_imm calculation * Improve s_waitcnt_imm calculation * fix vmcnt shift * add input validation and bug fix * remove unnecessary output * move test_copy into test * change bit width check * refactor macros into constexpr functions which still get inlined * wrap s_waitcnt api * parameterize test * cleanup * cleanup fp8 stub * add fp8 test cases; todo which input parameters are valid? * replace n for fp8 in test cases * add large shapes; fp8 fails again * change input init * test sync/async * time the test * clang-format test * use float instead of bfloat to cover a 4-byte type * fix logic - arg sections should be 'or'd * make block_sync_lds_direct_load interface similar to old ck * fix a few comment typos * name common shapes * revert the example to original logic of not waiting lds * clang-format --------- Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
This commit is contained in:
31
test/ck_tile/memory_copy/README.md
Normal file
31
test/ck_tile/memory_copy/README.md
Normal file
@@ -0,0 +1,31 @@
|
||||
# Copy Kernel
|
||||
This folder contains basic setup code designed to provide a platform for novice
|
||||
CK_Tile kernel developers to test basic functionality with minimal additional
|
||||
code compared to the functional code. Sample functional code for a simple
|
||||
tile distribution for DRAM window and LDS window are provided and data is moved
|
||||
from DRAM to registers, registers to LDS, LDS to registers and finally data
|
||||
is moved to output DRAM window for a simple copy operation.
|
||||
|
||||
## build
|
||||
```
|
||||
# in the root of ck_tile
|
||||
mkdir build && cd build
|
||||
# you can replace <arch> with the appropriate architecture
|
||||
# (for example gfx90a or gfx942) or leave it blank
|
||||
sh ../script/cmake-ck-dev.sh ../ <arch>
|
||||
# Make the copy kernel executable
|
||||
make test_copy -j
|
||||
```
|
||||
This will result in an executable `build/bin/test_copy_kernel`
|
||||
|
||||
## example
|
||||
```
|
||||
args:
|
||||
-m input matrix rows. (default 64)
|
||||
-n input matrix cols. (default 8)
|
||||
-id warp to use for computation. (default 0)
|
||||
-v validation flag to check device results. (default 1)
|
||||
-prec datatype precision to use. (default fp16)
|
||||
-warmup no. of warmup iterations. (default 50)
|
||||
-repeat no. of iterations for kernel execution time. (default 100)
|
||||
```
|
||||
Reference in New Issue
Block a user