Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-02 20:51:23 +00:00)
CK: removed the api reference (#3571)
* Removed the API reference
* Updated to the latest rocm-docs-core minimum version
* Fixed a formatting issue with buffer views
* Removed reference links from code snippets

Co-authored-by: John Afaganis <john.afaganis@amd.com>
@@ -17,9 +17,9 @@ Each thread in a workgroup owns a portion of the overall tensor data, stored in
This design enables three critical optimizations:

* It maximizes register utilization by keeping frequently accessed data in registers, the fastest level of the memory hierarchy.
* It eliminates redundant memory accesses because each thread maintains its own working set.
* It provides a clean abstraction for complex algorithms such as matrix multiplication, where each thread accumulates partial results that are later combined into the final output.

Thread-Local Storage Model
==========================
@@ -384,8 +384,7 @@ Static distributed tensors integrate seamlessly with other CK Tile components:
// Main GEMM loop
for(index_t k_tile = 0; k_tile < K; k_tile += kTileK) {
    // Create tile windows for this iteration
    // See :ref:`ck_tile_tile_window` for details
    auto a_window = make_tile_window(
        a_ptr, ALayout{M, K},
        ATileDist{},
        {blockIdx.y * kTileM, k_tile}
@@ -398,7 +397,6 @@ Static distributed tensors integrate seamlessly with other CK Tile components:
    );

    // Load tiles to distributed tensors
    // See :ref:`ck_tile_load_store_traits` for optimized loading
    auto a_tile = a_window.load();
    auto b_tile = b_window.load();