composable_kernel/test/ck_tile/memory_copy
Haocong WANG a5fdc663c8 fix async copytest bug (#2509)
* fix async copytest bug

* Add block_sync_lds_direct_load utility

* fix the s_waitcnt_imm calculation

* Improve s_waitcnt_imm calculation

* fix vmcnt shift

* add input validation and bug fix

* remove unnecessary output

* move test_copy into test

* change bit width check

* refactor macros into constexpr functions

which still get inlined

* wrap s_waitcnt api

* parameterize test

* cleanup

* cleanup fp8 stub

* add fp8 test cases; todo which input parameters are valid?

* replace n for fp8 in test cases

* add large shapes; fp8 fails again

* change input init

* test sync/async

* time the test

* clang-format test

* use float instead of bfloat to cover a 4-byte type

* fix logic - arg sections should be 'or'd

* make block_sync_lds_direct_load interface similar to old ck

* fix a few comment typos

* name common shapes

* revert the example to original logic of not waiting lds

* clang-format

---------

Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-07-23 00:14:02 -07:00

Copy Kernel

This folder contains basic setup code that gives novice CK_Tile kernel developers a platform for testing basic functionality with little scaffolding beyond the functional code itself. It provides sample functional code for a simple tile distribution over a DRAM window and an LDS window. For a simple copy operation, data moves from DRAM to registers, from registers to LDS, from LDS back to registers, and finally from registers to the output DRAM window.
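The four hops of the copy can be pictured with a host-side analogy. The sketch below is plain C++ and does not use the CK_Tile API: ordinary arrays stand in for registers and LDS, and every name in it (`kTileElems`, `staged_copy`, `default_shape_round_trips`) is hypothetical and chosen only for illustration. It assumes a tile of 8 elements per stage, which is not taken from the kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side analogy only: none of these names come from CK_Tile.
constexpr std::size_t kTileElems = 8; // assumed tile size, elements per stage

// Copies src to a new buffer one tile at a time, passing each tile through
// the same four hops: DRAM -> registers -> LDS -> registers -> DRAM.
template <typename T>
std::vector<T> staged_copy(const std::vector<T>& src)
{
    std::vector<T> dst(src.size());
    T regs[kTileElems] = {}; // stands in for per-thread registers
    T lds[kTileElems]  = {}; // stands in for the LDS tile

    for(std::size_t base = 0; base < src.size(); base += kTileElems)
    {
        // The last tile may be partial when the size is not a multiple of 8.
        const std::size_t count = std::min(kTileElems, src.size() - base);
        const auto offset       = static_cast<std::ptrdiff_t>(base);

        std::copy_n(src.begin() + offset, count, regs); // input DRAM -> registers
        std::copy_n(regs, count, lds);                  // registers  -> LDS
        std::copy_n(lds, count, regs);                  // LDS        -> registers
        std::copy_n(regs, count, dst.begin() + offset); // registers  -> output DRAM
    }
    return dst;
}

// Round-trips a 64 x 8 float matrix (the default -m and -n values) and
// reports whether the output matches the input element for element.
inline bool default_shape_round_trips()
{
    std::vector<float> in(64 * 8);
    std::iota(in.begin(), in.end(), 0.0f);
    const std::vector<float> out = staged_copy(in);
    assert(out.size() == in.size());
    return out == in;
}
```

Calling `default_shape_round_trips()` from a `main` function is enough to exercise the sketch. On the device the hops run in parallel across threads and need synchronization around the LDS accesses, which this sequential analogy leaves out.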

build

# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture
# (for example gfx90a or gfx942) or leave it blank
sh ../script/cmake-ck-dev.sh ../ <arch>
# Make the copy kernel executable
make test_copy -j

This produces the executable build/bin/test_copy_kernel.

example

args:
          -m        input matrix rows. (default 64)
          -n        input matrix cols. (default 8)
          -id       warp to use for computation. (default 0)
          -v        validation flag to check device results. (default 1)
          -prec     datatype precision to use. (default fp16)
          -warmup   no. of warmup iterations. (default 50)
          -repeat   no. of iterations for kernel execution time. (default 100)
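
As a rough illustration of how warmup and repeat counts are commonly used in a benchmark, the sketch below runs a callable untimed for the warmup count and then reports the mean time over the repeat count. It is a host-side sketch under that assumption, not the timing code of this test: `average_time_ms` and `copy_bytes_moved` are hypothetical names, and a GPU measurement would typically bracket the timed loop with device events and synchronization instead of a host clock. The second helper only works out how many bytes a copy of an m x n matrix touches, assuming each element is read once and written once.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper, not part of CK_Tile: runs `work` `warmup` times
// untimed, then `repeat` times timed, and returns the mean time per timed
// run in milliseconds.
template <typename Work>
double average_time_ms(Work&& work, int warmup, int repeat)
{
    assert(warmup >= 0 && repeat > 0);

    for(int i = 0; i < warmup; ++i)
    {
        work(); // untimed: lets caches and clocks settle
    }

    const auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < repeat; ++i)
    {
        work(); // timed
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeat;
}

// Bytes touched by copying an m x n matrix, assuming one read and one write
// per element.
constexpr std::size_t copy_bytes_moved(std::size_t m, std::size_t n, std::size_t elem_bytes)
{
    return 2 * m * n * elem_bytes;
}

// Default shape with a 2-byte type such as fp16: 1024 B read + 1024 B written.
static_assert(copy_bytes_moved(64, 8, 2) == 2048, "unexpected byte count");
```

With the defaults listed above, such a helper would invoke the callable 150 times in total (50 warmup runs plus 100 timed runs) and average only the last 100.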