mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
* add transpose load; no real logic
* fix some compile errors
* fix some issues
* update transpose load logic
* add some fixes
* fix a distribution issue
* update some codes
* add some fix
* can pass; but no logic
* transpose load enable
* update tile transpose
* miss output tile distribution mapping
* hack for transpose 16x16
* update output tensor distribution
* delete unused variables
* fix transpose related codes
* update transpose load example
* exchange the iteration order
* fix 16x16 related dimension transpose
* fix a transpose index issue
* fix a transpose index issue
* fix clang format check
* update load tile transpose related codes
* fix compile errors and pass 16x16 tests
* fix a typo
* update logic
* check other data types
* add transpose load api
* update transpose load api
* fix clang format check
* change file name
* refactor codes
* update code name
* delete some unused codes
* delete the unused oob flag for transpose load
* update tensor view api for transpose load
* update for testing
* fix a typo error
* move transpose ops to example directory
* update transpose api
* update include file
* fix for pr review
* fix compile errors
* add transpose load; no real logic
* fix some compile errors
* fix some issues
* update transpose load logic
* add some fixes
* fix a distribution issue
* update some codes
* add some fix
* can pass; but no logic
* transpose load enable
* update tile transpose
* miss output tile distribution mapping
* hack for transpose 16x16
* update output tensor distribution
* delete unused variables
* fix transpose related codes
* update transpose load example
* exchange the iteration order
* fix 16x16 related dimension transpose
* fix a transpose index issue
* fix a transpose index issue
* fix clang format check
* update load tile transpose related codes
* fix compile errors and pass 16x16 tests
* fix a typo
* update logic
* check other data types
* add transpose load api
* update transpose load api
* fix clang format check
* change file name
* refactor codes
* update code name
* delete some unused codes
* delete the unused oob flag for transpose load
* update tensor view api for transpose load
* update for testing
* fix a typo error
* move transpose ops to example directory
* update transpose api
* update include file
* fix for pr review
* fix compile errors
* change directory name
* delete the duplicated directory
* update cmakelists file
* delete the unused codes
* update function names
* update transpose policy
* update code after remod.py
* update codes
* add some comment
* Polish the instr infrastructure
* build up the fixed instr
* redesign the transpose api, currently it has numerical error
* add the bf16 transpose
* fix some issues
* add some comments
* update document
* Finished the refactor of API and pass through the verification
* fix the merging issue
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
[ROCm/composable_kernel commit: a2f01141aa]
Batched Transpose
This folder contains example for transpose load for architecture gfx950. This transpose load has some constraints in input tile distribution.
build
# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx90a or gfx942) or leave it blank
sh ../script/cmake-ck-dev.sh ../ <arch>
# Make the transpose executable
make tile_example_transpose -j
This will result in an executable build/bin/tile_example_transpose
example
args:
-N input batch size (default:2)
-C input channel size. (default:64)
-H input height size. (default:1)
-W input width size. (default:64)
-v whether do CPU validation or not (default: 1)
-layout_in input tensor data layout - NCHW by default
-layout_out output tensor data layout - NHWC by default
-seed seed to be used, -1 means random every time (default:-1)
-k_name t to 1 will print kernel name (default:0)