Files
Vidyasagar Ananthan 92c67a824f [DOCS] Documentation Addition (Readme updates) (#2495)
* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

Refine README files by removing outdated references to the Tile Engine

* Updates based on PR feedback 1

* Updates based on PR feedback 2

* Updates based on PR feedback 3

* Updates based on PR feedback 4

* Updates based on PR feedback 5

* Updates based on PR feedback 6

* Updates based on PR feedback 7

* Updates based on PR feedback 8

* Content Modification of CK Tile Example

* Modify the ck_tile gemm config

---------

Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-10-16 03:10:57 -07:00

2.7 KiB

Copy Kernel with CK Tile

This example demonstrates a basic copy kernel using the CK Tile programming model. It is designed as a minimal platform for new CK Tile kernel developers to test and understand tile-based data movement and memory hierarchy. Sample functional code for a simple tile distribution for DRAM window and LDS window are provided and data is moved from DRAM to registers, registers to LDS, LDS to registers and finally data is moved to output DRAM window for a simple copy operation.


Algorithm and Math

Given an input matrix X of shape [M, N], the copy kernel performs:


Y_{i, j} = X_{i, j}
  • Tilewise Copy: Each thread block processes a tile (block) of the input, moving data from global memory (DRAM) to registers, registers to LDS (shared memory), LDS to registers, and finally to output DRAM.

Tile Programming Model

  • Tiles: Each thread block processes a tile of the input matrix.
  • Pipeline: Simple, but can be extended for more complex memory patterns or fused operations.

Features

  • Memory Hierarchy: Illustrates DRAM, LDS, and register usage in CK Tile.
  • Minimal Example: Ideal for learning and debugging tile-programming concepts.
  • Validation: CPU validation and benchmarking options.

Build & Run

# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture 
# (for example gfx90a or gfx942) or leave it blank
../script/cmake-ck-dev.sh  ../ <arch>
# Make the copy kernel executable
make test_copy -j

This will result in an executable build/bin/test_copy_kernel

Arguments

args:
          -m        input matrix rows. (default 64)
          -n        input matrix cols. (default 8)
          -id       warp to use for computation. (default 0)
          -v        validation flag to check device results. (default 1)
          -prec     datatype precision to use. (default fp16)
          -warmup   no. of warmup iterations. (default 50)
          -repeat   no. of iterations for kernel execution time. (default 100)

Source Structure


For distribution, see include/ck_tile/tile_program/tile_distribution/.


Back to CK Tile Examples