[DOCS] Documentation Addition (Readme updates) (#2495)

* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

Refine README files by removing outdated references to the Tile Engine

* Updates based on PR feedback 1

* Updates based on PR feedback 2

* Updates based on PR feedback 3

* Updates based on PR feedback 4

* Updates based on PR feedback 5

* Updates based on PR feedback 6

* Updates based on PR feedback 7

* Updates based on PR feedback 8

* Content Modification of CK Tile Example

* Modify the ck_tile gemm config

---------

Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Vidyasagar Ananthan
2025-10-16 03:10:57 -07:00
committed by GitHub
parent 013ba3c737
commit 92c67a824f
120 changed files with 8188 additions and 221 deletions


@@ -1,13 +1,39 @@
# Copy Kernel
This folder contains basic setup code designed to provide a platform for novice
CK_Tile kernel developers to test basic functionality with minimal additional
code compared to the functional code. Sample functional code for a simple
tile distribution for DRAM window and LDS window are provided and data is moved
from DRAM to registers, registers to LDS, LDS to registers and finally data
is moved to output DRAM window for a simple copy operation.
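As a rough illustration of the data path described above, the following host-side C++ sketch walks a row-major matrix tile by tile and stages each tile through two scratch buffers that stand in for registers and LDS. It is a CPU model only and does not use the CK Tile API; the names `tiled_copy`, `kTileM`, and `kTileN` are hypothetical and the tile sizes are arbitrary. The actual kernel expresses these steps with DRAM and LDS windows and a tile distribution.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes, chosen for illustration only.
constexpr std::size_t kTileM = 4;
constexpr std::size_t kTileN = 8;

// CPU model of the copy data path for a row-major M x N matrix:
// DRAM -> registers -> LDS -> registers -> DRAM, so that Y[i][j] = X[i][j].
// `regs` and `lds` are plain arrays standing in for registers and LDS.
template <typename T>
std::vector<T> tiled_copy(const std::vector<T>& x, std::size_t M, std::size_t N)
{
    assert(x.size() == M * N);
    std::vector<T> y(M * N);
    std::array<T, kTileM * kTileN> regs{};
    std::array<T, kTileM * kTileN> lds{};

    for(std::size_t m0 = 0; m0 < M; m0 += kTileM)
    {
        for(std::size_t n0 = 0; n0 < N; n0 += kTileN)
        {
            // Clamp the tile at the matrix edges.
            const std::size_t rows = std::min(kTileM, M - m0);
            const std::size_t cols = std::min(kTileN, N - n0);

            // 1. Input DRAM -> registers
            for(std::size_t i = 0; i < rows; ++i)
                for(std::size_t j = 0; j < cols; ++j)
                    regs[i * kTileN + j] = x[(m0 + i) * N + (n0 + j)];

            // 2. Registers -> LDS
            for(std::size_t k = 0; k < regs.size(); ++k)
                lds[k] = regs[k];

            // 3. LDS -> registers
            for(std::size_t k = 0; k < lds.size(); ++k)
                regs[k] = lds[k];

            // 4. Registers -> output DRAM
            for(std::size_t i = 0; i < rows; ++i)
                for(std::size_t j = 0; j < cols; ++j)
                    y[(m0 + i) * N + (n0 + j)] = regs[i * kTileN + j];
        }
    }
    return y;
}
```

Calling `tiled_copy(x, M, N)` on any row-major buffer of `M * N` elements should return an identical buffer, including when `M` and `N` are not multiples of the tile sizes. On a GPU the outer loops would correspond to thread blocks working in parallel rather than a sequential traversal.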
# Copy Kernel with CK Tile
## build
```
This example demonstrates a basic copy kernel using the CK Tile programming model. It is designed as a minimal platform for new CK Tile kernel developers to test and understand tile-based data movement and the memory hierarchy. It provides sample code for a simple tile distribution over a DRAM window and an LDS window. Data moves from DRAM to registers, from registers to LDS, from LDS back to registers, and finally to the output DRAM window, completing a simple copy operation.
---
## Algorithm and Math
Given an input matrix $X$ of shape $[M, N]$, the copy kernel performs:
$$
Y_{i, j} = X_{i, j}
$$
- **Tilewise Copy**: Each thread block processes a tile (block) of the input, moving data from global memory (DRAM) to registers, registers to LDS (shared memory), LDS to registers, and finally to output DRAM.
---
## Tile Programming Model
- **Tiles**: Each thread block processes a tile of the input matrix.
- **Pipeline**: Simple, but can be extended for more complex memory patterns or fused operations.
---
## Features
- **Memory Hierarchy**: Illustrates DRAM, LDS, and register usage in CK Tile.
- **Minimal Example**: Ideal for learning and debugging tile-programming concepts.
- **Validation**: CPU validation and benchmarking options.
---
## Build & Run
```bash
# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture
@@ -18,8 +44,9 @@ make test_copy -j
```
This will result in an executable `build/bin/test_copy_kernel`
## example
```
### Arguments
```bash
args:
-m input matrix rows. (default 64)
-n input matrix cols. (default 8)
@@ -28,4 +55,25 @@ args:
-prec datatype precision to use. (default fp16)
-warmup no. of warmup iterations. (default 50)
-repeat no. of iterations for kernel execution time. (default 100)
```
```
---
## Source Structure
- **Kernel**: [`test_copy.hpp`](test_copy.hpp) (tile-programming kernel template)
- **Executable**: [`test_copy.cpp`](test_copy.cpp)
- **Build**: `CMakeLists.txt`
---
## Related CK Tile Examples
- [03_gemm](../03_gemm/README.md): GEMM with tiles
- [35_batched_transpose](../35_batched_transpose/README.md): Batched transpose with tiles
- [06_permute](../06_permute/README.md): Generic permutation with tiles
For details on tile distribution, see [`include/ck_tile/tile_program/tile_distribution/`](../../../include/ck_tile/tile_program/tile_distribution/).
---
[Back to CK Tile Examples](../README.md)