Batched Transpose

This folder contains an example of transpose load for the gfx950 architecture. Transpose load places some constraints on the input tile distribution.

build

# in the root of ck_tile
mkdir build && cd build
# replace <arch> with the target architecture (this example targets gfx950) or leave it blank
sh ../script/cmake-ck-dev.sh ../ <arch>
# Make the transpose executable
make tile_example_transpose -j

This produces the executable build/bin/tile_example_transpose.

example

args:
          -N    input batch size (default:2)
          -C    input channel size (default:64)
          -H    input height size (default:1)
          -W    input width size (default:64)
          -v    whether to do CPU validation (default:1)
  -layout_in    input tensor data layout (default:NCHW)
 -layout_out    output tensor data layout (default:NHWC)
       -seed    seed to be used, -1 means random every time (default:-1)
     -k_name    set to 1 to print the kernel name (default:0)
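
As a rough sketch, an invocation using the arguments listed above might look like the following. The flag names and default values come from the list; the `-name=value` syntax and the fixed seed value of 42 are assumptions and should be checked against the example's argument parser. The script is meant to be run from the build directory and only launches the executable if it has been built.

```shell
#!/bin/sh
# Sketch only: the -name=value argument syntax is an assumption, not confirmed by this README.
BIN=./bin/tile_example_transpose

# Default shapes and layouts from the args list, with CPU validation on,
# a fixed seed (assumed to give repeatable input), and kernel-name printing enabled.
CMD="$BIN -N=2 -C=64 -H=1 -W=64 -v=1 -layout_in=NCHW -layout_out=NHWC -seed=42 -k_name=1"

# Show the command that would be run.
echo "$CMD"

# Run it only if the executable exists.
if [ -x "$BIN" ]; then
    $CMD
fi
```

Keeping the command in a variable makes it easy to print or log exactly what was run before changing individual sizes such as -C or -W.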