* chore(copyright): update copyright header for test directory
* chore(copyright): update copyright header for client_example directory
* fix async copy test bug
* Add block_sync_lds_direct_load utility
* fix the s_waitcnt_imm calculation
* Improve s_waitcnt_imm calculation
* fix vmcnt shift
* add input validation and bug fix
* remove unnecessary output
* move test_copy into test
* change bit width check
* refactor macros into constexpr functions, which still get inlined
* wrap s_waitcnt api
* parameterize test
* cleanup
* cleanup fp8 stub
* add fp8 test cases; TODO: determine which input parameters are valid
* replace n for fp8 in test cases
* add large shapes; fp8 fails again
* change input init
* test sync/async
* time the test
* clang-format test
* use float instead of bfloat to cover a 4-byte type
* fix logic - arg sections should be 'or'd
* make block_sync_lds_direct_load interface similar to old ck
* fix a few comment typos
* name common shapes
* revert the example to its original logic of not waiting on LDS
* clang-format
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>