Max Podkorytov
7dcc606adc
[rocm-libraries] ROCm/rocm-libraries#5383 (commit b660b8c)
[CK_TILE] Add CShuffleLds microbenchmark suite
## Summary
Microbenchmarks isolating LDS store/load operations in CShuffleEpilogue
for bank conflict analysis.
## Motivation
CShuffleEpilogue performs an LDS store (MFMA registers → LDS) and an LDS
load (LDS → registers, for coalesced global writes). This suite isolates
each operation to:
- Identify which operation causes bank conflicts
- Measure pure LDS bandwidth per access pattern
- Validate access patterns across MFMA tile sizes and wave layouts
## Components
- **Microkernels** (`tile_load_store_microkernels.hpp`):
`StoreTile<Setup>`, `LoadTile<Setup>`
- **Setup Adapters** (`benchmark_cshuffle_lds.hpp`): Wire
CShuffleEpilogue to microkernels
- **Template** (`benchmark_template.cpp.in`): Generated benchmarks with
timing
## Build
```bash
cmake -G Ninja -B build -S . \
-DGPU_TARGETS=gfx950 \
-DBUILD_CK_EXAMPLES=ON \
-DBUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS=ON
ninja -C build bench_lds_fp8_16x16x128_2x2_fp8
```
## New CMake Options
| Option | Default | Description |
|--------|---------|-------------|
| `BUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS` | OFF | LDS microbenchmarks |
| `BUILD_CK_TILE_FMHA_TESTS` | ON | FMHA tests |
| `BUILD_CK_TILE_ENGINE` | ON | Tile engine |
| `BUILD_CK_TILE_ENGINE_TESTS` | ON | Tile engine tests |
| `BUILD_CK_EXAMPLES` | ON | Examples |
| `BUILD_CK_TUTORIALS` | ON | Tutorials |
| `BUILD_CK_DEVICE_INSTANCES` | ON | Device instances |
| `BUILD_CK_PROFILER` | ON | Profiler |
Setting the guards you do not need to OFF reduces CMake configure time from ~150s to ~5s.
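
As a rough sketch of a trimmed-down configure, the snippet below collects the options from the table into one argument list. It assumes the benchmark target still needs `BUILD_CK_EXAMPLES=ON` (as in the Build section) and none of the other guarded components; that dependency is not confirmed here. The helper name `ck_lds_configure_args` is hypothetical and exists only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: emit CMake arguments that keep only the LDS benchmark path enabled.
# Assumption: the benchmark needs BUILD_CK_EXAMPLES=ON and nothing else from
# the guarded components. ck_lds_configure_args is a hypothetical helper name.
ck_lds_configure_args() {
  printf '%s\n' \
    -DGPU_TARGETS=gfx950 \
    -DBUILD_CK_EXAMPLES=ON \
    -DBUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS=ON \
    -DBUILD_CK_TILE_FMHA_TESTS=OFF \
    -DBUILD_CK_TILE_ENGINE=OFF \
    -DBUILD_CK_TILE_ENGINE_TESTS=OFF \
    -DBUILD_CK_TUTORIALS=OFF \
    -DBUILD_CK_DEVICE_INSTANCES=OFF \
    -DBUILD_CK_PROFILER=OFF
}

# Print the full configure command for review; nothing is executed here.
echo "cmake -G Ninja -B build -S . $(ck_lds_configure_args | tr '\n' ' ')"
```

To actually configure, pass the list straight to CMake with `cmake -G Ninja -B build -S . $(ck_lds_configure_args)`, then build the benchmark target as shown in the Build section.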
2026-04-15 03:44:07 +00:00