Mirror of https://github.com/ROCm/composable_kernel.git
(synced 2026-05-11 17:00:18 +00:00)
[rocm-libraries] ROCm/rocm-libraries#5383 (commit b660b8c)
[CK_TILE] Add CShuffleLds microbenchmark suite
## Summary
Microbenchmarks isolating LDS store/load operations in CShuffleEpilogue
for bank conflict analysis.
## Motivation
CShuffleEpilogue performs an LDS store (MFMA registers → LDS) and an
LDS load (LDS → registers, for coalesced global writes). This suite
isolates each operation to:
- Identify which operation causes bank conflicts
- Measure pure LDS bandwidth per access pattern
- Validate access patterns across MFMA tile sizes and wave layouts
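To make "bank conflict" concrete, here is a minimal host-side sketch of how the conflict degree of an access pattern can be estimated. It is not part of this commit. The bank count (32) and bank width (4 bytes) are assumed defaults, not values confirmed for gfx950, and `conflict_degree` and `strided` are hypothetical helper names. Real hardware also services a wavefront in several phases, which this model ignores.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Simplified LDS model (assumption): `num_banks` banks, each `bank_width` bytes wide.
// Lanes that hit the same bank at different word addresses are serialized;
// lanes that touch the same word are assumed to be served together.
inline int conflict_degree(const std::vector<std::uint32_t>& byte_addrs,
                           std::uint32_t num_banks  = 32,
                           std::uint32_t bank_width = 4)
{
    std::map<std::uint32_t, std::set<std::uint32_t>> words_per_bank;
    for(std::uint32_t addr : byte_addrs)
    {
        const std::uint32_t word = addr / bank_width;
        words_per_bank[word % num_banks].insert(word);
    }

    // The worst bank determines how many passes the access needs.
    std::size_t worst = 0;
    for(const auto& kv : words_per_bank)
        worst = std::max(worst, kv.second.size());
    return static_cast<int>(worst);
}

// Hypothetical helper: one address per lane, `stride_bytes` apart.
inline std::vector<std::uint32_t> strided(int lanes, std::uint32_t stride_bytes)
{
    std::vector<std::uint32_t> addrs(static_cast<std::size_t>(lanes));
    for(int i = 0; i < lanes; ++i)
        addrs[static_cast<std::size_t>(i)] = static_cast<std::uint32_t>(i) * stride_bytes;
    return addrs;
}
```

Under the assumed 32x4-byte banks, `conflict_degree(strided(32, 128))` returns 32 (every lane lands in bank 0), while padding the row stride to 132 bytes brings it down to 1. The microbenchmarks measure this kind of effect on real hardware rather than modelling it.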
## Components
- **Microkernels** (`tile_load_store_microkernels.hpp`):
`StoreTile<Setup>`, `LoadTile<Setup>`
- **Setup Adapters** (`benchmark_cshuffle_lds.hpp`): Wire
CShuffleEpilogue to microkernels
- **Template** (`benchmark_template.cpp.in`): Generated benchmarks with
timing
## Build
```bash
cmake -G Ninja -B build -S . \
-DGPU_TARGETS=gfx950 \
-DBUILD_CK_EXAMPLES=ON \
-DBUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS=ON
ninja -C build bench_lds_fp8_16x16x128_2x2_fp8
```
## New CMake Options
| Option | Default | Description |
|--------|---------|-------------|
| `BUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS` | OFF | LDS microbenchmarks |
| `BUILD_CK_TILE_FMHA_TESTS` | ON | FMHA tests |
| `BUILD_CK_TILE_ENGINE` | ON | Tile engine |
| `BUILD_CK_TILE_ENGINE_TESTS` | ON | Tile engine tests |
| `BUILD_CK_EXAMPLES` | ON | Examples |
| `BUILD_CK_TUTORIALS` | ON | Tutorials |
| `BUILD_CK_DEVICE_INSTANCES` | ON | Device instances |
| `BUILD_CK_PROFILER` | ON | Profiler |
Setting these guards to OFF reduces CMake configure time from ~150s to ~5s.
Committed by assistant-librarian[bot]
Parent: 5348b577ed
Commit: 7dcc606adc
45
include/ck_tile/utility/tile_load_store_microkernels.hpp
Normal file
@@ -0,0 +1,45 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

/**
 * @file tile_load_store_microkernels.hpp
 * @brief Generic tile store/load microkernels.
 *
 * Setup::create() must return:
 *   - For StoreTile: tuple<window, tile>
 *   - For LoadTile: window
 */

#pragma once

#include "ck_tile/core.hpp"

namespace ck_tile {

template <typename Setup>
struct StoreTile
{
    static constexpr index_t kBlockSize = Setup::kBlockSize;

    CK_TILE_DEVICE void operator()() const
    {
        auto [window, tile] = Setup::create();
        store_tile(window, tile);
        block_sync_lds();
    }
};

template <typename Setup>
struct LoadTile
{
    static constexpr index_t kBlockSize = Setup::kBlockSize;

    CK_TILE_DEVICE void operator()() const
    {
        auto window = Setup::create();
        [[maybe_unused]] volatile auto tile = load_tile(window);
        block_sync_lds();
    }
};

} // namespace ck_tile
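The `Setup::create()` contract from the file header can be illustrated with a host-only mock. This is a sketch, not code from this commit: everything in the `sketch` namespace (`MockWindow`, `MockTile`, `MockStoreSetup`, `MockLoadSetup`, and the stand-in `store_tile`/`load_tile`) is hypothetical and only mimics the shape of the ck_tile types. It omits `CK_TILE_DEVICE` and `block_sync_lds()` so that it compiles with a plain C++17 host compiler.

```cpp
#include <array>
#include <tuple>

namespace sketch {

using index_t = int;

// Hypothetical stand-ins for an LDS tile window and a register tile.
struct MockTile
{
    std::array<float, 4> data{};
};
struct MockWindow
{
    std::array<float, 4>* lds;
};

// Fake "LDS" buffer shared by both setups.
inline std::array<float, 4>& lds_buffer()
{
    static std::array<float, 4> buf{};
    return buf;
}

inline void store_tile(MockWindow& w, const MockTile& t) { *w.lds = t.data; }
inline MockTile load_tile(const MockWindow& w) { return MockTile{*w.lds}; }

// Store setup: create() returns tuple<window, tile>.
struct MockStoreSetup
{
    static constexpr index_t kBlockSize = 256;
    static std::tuple<MockWindow, MockTile> create()
    {
        return std::make_tuple(MockWindow{&lds_buffer()}, MockTile{{1.f, 2.f, 3.f, 4.f}});
    }
};

// Load setup: create() returns only the window.
struct MockLoadSetup
{
    static constexpr index_t kBlockSize = 256;
    static MockWindow create() { return MockWindow{&lds_buffer()}; }
};

// Same structure as the microkernels above, minus device-only pieces.
template <typename Setup>
struct StoreTile
{
    static constexpr index_t kBlockSize = Setup::kBlockSize;
    void operator()() const
    {
        auto [window, tile] = Setup::create();
        store_tile(window, tile);
    }
};

template <typename Setup>
struct LoadTile
{
    static constexpr index_t kBlockSize = Setup::kBlockSize;
    MockTile operator()() const
    {
        auto window = Setup::create();
        return load_tile(window);
    }
};

} // namespace sketch
```

Calling `sketch::StoreTile<sketch::MockStoreSetup>{}()` followed by `sketch::LoadTile<sketch::MockLoadSetup>{}()` round-trips the tile through the fake buffer. The real `LoadTile` binds the result to a `volatile` local instead of returning it, presumably so the compiler cannot discard the load.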