Commit Graph

4050 Commits

Author SHA1 Message Date
assistant-librarian[bot]
d4a93660fb Merge commit '565fea26455b8e4f78ac57ed64d6bd12e701a9c9' into develop 2026-01-30 08:21:17 +00:00
Zoltán Lakatos
e7483043e6 fix undefined behaviour in softmax kernel (#3683)
Co-authored-by: root <zoltan.lakatos@streamhpc.com>

[ROCm/composable_kernel commit: 565fea2645]
2026-01-30 15:22:54 +08:00
assistant-librarian[bot]
0cda2801c4 Merge commit 'f3d8b7210fb99827bcb1d1bdaf9672b3ae8fb209' into develop 2026-01-30 04:42:40 +00:00
vivienfanghuagood
e38029e946 Extend CK fmha_batch_prefill kernel coverage to head_dim=256 (#3328)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: f3d8b7210f]
2026-01-30 11:18:20 +08:00
assistant-librarian[bot]
a59a3944e8 Merge commit '6ff073784321a55ee276f38af195532d8d812670' into develop 2026-01-30 03:12:55 +00:00
MHYangAMD
24cf4cf9a8 Fix redundant cast in model sensitive rmsnorm (#3681)
* Fix redundant cast

* Fix linting

[ROCm/composable_kernel commit: 6ff0737843]
2026-01-30 10:52:19 +08:00
assistant-librarian[bot]
05846dfc99 Merge commit '83b61553548019eb9aa77a5efc72258a48dee42a' into develop 2026-01-30 02:36:23 +00:00
Max Podkorytov
e01e32ee52 Add ck-rocprof: GPU profiling tool for rocprof-compute (#3627)
* Decouple configure/build/test tools from Docker

Create a two-layer tool architecture:
- Core tools (ck-configure, ck-build, ck-test): Environment-agnostic,
  work on any system with ROCm - no Docker dependency
- Container tools (ck-docker): Manage Docker containers and delegate
  to core tools via docker exec

Changes:
- Add ck-configure: New CMake configuration tool with preset support,
  native GPU detection, and flexible options
- Refactor ck-build: Remove Docker dependency, add --configure and
  --list options, call ninja directly
- Refactor ck-test: Remove Docker dependency, add CTest integration
  with --smoke/--regression/--all options
- Enhance common.sh: Add native GPU detection, build directory utils,
  and output helpers
- Update ck-docker: Add configure/build/test/exec commands that
  delegate to core tools inside container

This enables:
- Native development on ROCm hosts without Docker
- Simpler CI/CD integration
- Consistent behavior inside and outside containers

Co-Authored-By: Claude <noreply@anthropic.com>

* Add ck-rocprof: GPU profiling tool for rocprof-compute

Adds a command-line profiling tool to simplify GPU performance
analysis workflow using AMD rocprof-compute.

Features:
- Easy setup with automatic Python venv configuration
- Simple CLI: setup, run, analyze, compare, list
- Automatic GPU architecture detection
- Focus on LDS metrics (Block 12) for bank conflict analysis
- Comprehensive documentation with examples and troubleshooting

Usage:
  ck-rocprof setup                    # One-time environment setup
  ck-rocprof run <name> <executable>  # Profile executable
  ck-rocprof analyze <name> [block]   # Analyze metrics
  ck-rocprof compare <name1> <name2>  # Compare two runs
  ck-rocprof list                     # List available runs

* Make ck-rocprof documentation concise and improve Docker integration

- Streamlined documentation from 416 to 157 lines (62% reduction)
- Focused on essential commands, metrics, and workflows
- Enhanced script to run all operations inside Docker containers
- Fixed workload directory path and improved container management
- Added automatic rocprofiler-compute installation and dependency handling

* Add --no-roof flag to ck-rocprof profile command

Skip roofline analysis by default to speed up profiling. Roofline
analysis can add significant time to profiling runs but is not
needed for most LDS bank conflict analysis workflows.

* Make ck-rocprof work independently of Docker

Add native execution mode that runs rocprof-compute directly on the host
system when available, falling back to Docker mode when not.

Key changes:
- Auto-detect native mode when rocprof-compute is in PATH or common locations
- Add execution mode wrappers (exec_cmd, file_exists, dir_exists, etc.)
- Native mode stores venv at .ck-rocprof-venv in project root
- Native mode stores workloads at build/workloads/
- Support user-installed rocprofiler-compute (e.g., ~/.local/rocprofiler-compute)
- Add CK_FORCE_DOCKER env var to force Docker mode
- Update help message to show current execution mode
- Maintain full backward compatibility with existing Docker workflow

Tested successfully with rocprofiler-compute 3.4.0 installed from source
on MI300X GPU in native mode.

Co-Authored-By: Claude <noreply@anthropic.com>

* Add clean/status commands and improve ck-rocprof robustness

- Add 'clean' command to remove profiling runs (supports --all)
- Add 'status' command to show configuration and environment info
- Add workload name validation to prevent path traversal attacks
- Fix uv installation to use pip instead of curl for reliability
- Add cross-platform stat support for macOS compatibility
- Consolidate ROCPROF_CANDIDATES to avoid code duplication
- Expand help documentation with all profiling block descriptions
- Fix Docker wrapper script escaping issues

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix analyze command to use correct workload path

rocprof-compute stores results directly in the workload directory
(pmc_perf.csv) rather than in a GPU architecture subdirectory.
Updated find_workload_path to detect this correctly.

Co-Authored-By: Claude <noreply@anthropic.com>

* Address PR review security and robustness issues

Security fixes:
- Escape executable path in cmd_run to prevent shell injection
- Add workload name validation to cmd_analyze and cmd_compare

Robustness improvements:
- Add error checking for uv package manager installation
- Use consistent project root detection (find_project_root || get_project_root)
- Use /opt/rocm instead of hardcoded /opt/rocm-7.0.1 in Docker mode
- Derive ROCM_REQUIREMENTS path from ROCPROF_BIN for flexibility
- Use gfx950 as fallback GPU consistent with common.sh

Documentation updates:
- Fix env var name GPU_TARGET -> CK_GPU_TARGET
- Update storage layout to reflect current structure (workloads/<name>/)
- Document clean and status commands
- Clarify native vs Docker default paths

Co-Authored-By: Claude <noreply@anthropic.com>

* Simplify ck-rocprof to native-only mode

Remove Docker mode from ck-rocprof. Docker users should run the tool
via `ck-docker exec ck-rocprof ...` instead.

This simplification:
- Removes ~210 lines of Docker-specific code
- Eliminates mode detection complexity
- Makes the script easier to maintain
- Provides clearer error messages when rocprof-compute is not found

The setup command now lists all searched locations when rocprof-compute
is not found, helping users understand how to install it.

Co-Authored-By: Claude <noreply@anthropic.com>

* Add rocprofiler-compute source installation fallback

When rocprof-compute is not found in system locations, automatically
install rocprofiler-compute 3.4.0 from source as a fallback. This
eliminates the hard dependency on system ROCm packages.

Implementation details:
- Clone rocprofiler-compute from GitHub to ~/.local/
- Install dependencies via requirements.txt (not editable install)
- Create wrapper that sets PYTHONPATH to source directory
- Execute source script directly rather than importing as module

This approach matches the project's development workflow and works
around the incomplete pyproject.toml that prevents editable installs.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>

[ROCm/composable_kernel commit: 83b6155354]
2026-01-29 17:20:22 -08:00
assistant-librarian[bot]
7945ef7d78 Merge commit '05ef93a69d8ccaf63f84b43b3dcb9b585f428051' into develop 2026-01-30 00:44:38 +00:00
Illia Silin
4883171ee4 Add a flag to build CK libs required for HipTensor. (#3684)
* create a filter to build only libs required by hiptensor

* allow building libs for miopen and hiptensor at the same time

* tweak the lib filtering logic one more time

[ROCm/composable_kernel commit: 05ef93a69d]
2026-01-29 16:12:49 -08:00
assistant-librarian[bot]
8a70a0d08a Merge commit 'f16d9100e42a978261f76319c66a7995e5f6d555' into develop 2026-01-29 18:34:46 +00:00
Enrico Degregori
a07d76a460 Multi AB support for wave transfer (#3578)
* Add multi AB support to wave transfer

* Improviments to multi ABD examples

* Add instances and use intrawave v1 instead of interwave

* Apply changes to other transfers

* Wave transfer: add support for multiple internal vgpr buffers

* Fix compilation error gfx11

[ROCm/composable_kernel commit: f16d9100e4]
2026-01-29 10:29:40 -08:00
Johannes Graner
1998be34bf [Conv] Enable bwd weight splitk autodeduction with cap (#3656)
* Enable bwd weight splitk autodeduction with cap

* Fix error threshold calculations

* Add missing logic to wmma multiple d kernel

* Fix threshold calculation

* Update test with new applicability

[ROCm/composable_kernel commit: fabac7e2c3]
2026-01-29 17:40:28 +00:00
assistant-librarian[bot]
84daa4d305 Merge commit 'e33f15709f8c1e05f5056edc7295276e121dc253' into develop 2026-01-29 15:20:56 +00:00
Robin Voetter
23453093e0 ck-builder: fix test related to changed xdl bwd cshuf v3 interface (#3677)
Force merging because I verified this fix manually:

git checkout develop
git pull
ninja smoke-builder (failed to build, as expected)
git checkout rvoetter/ckb-fix
ninja smoke-builder (passed!)

[ROCm/composable_kernel commit: e33f15709f]
2026-01-29 07:15:56 -08:00
assistant-librarian[bot]
961da131ed Merge commit '9b168082b7aa19bcf50fd9991baf10a0c77d105b' into develop 2026-01-29 04:42:46 +00:00
Khushbu Agarwal
68b475ad92 [CK_Tile] Adding support for preshuffleQuant in AB quant Block Scale Gemm (#3629)
* initial commit

* preshuffleQuant support for ABQuant

* fix mxfp4 to use correct QuantGroupSize

* addressing review comments and seperated Preshufflequant for A and B

* updated grouped gemm example for updated traits definition

* fix for CI failure

* updated grouped_gemm_abquant test for updated traits definition

* updated grouped_gemm_abquant test for updated traits definition

[ROCm/composable_kernel commit: 9b168082b7]
2026-01-28 19:45:09 -08:00
assistant-librarian[bot]
308d69498c Merge commit 'e3556fed0453e66cdebc5dad6b903f5e902cd9b4' into develop 2026-01-29 00:45:02 +00:00
Jeff Huang
29c56b8aae Optimize batch prefill kernel performance for VECTORIZED_LAYOUT KV cache (#3657)
- Add multi-dimensional page index support (YsGatherDims) in tile_scatter_gather
- Add is_gather_dim() and get_gather_index() for multi-dim page lookup
- Override MakeVDramTileDistribution() for VECTORIZED_LAYOUT to match
  GEMM's BWarpDstrEncoding (K decomposition: {K2, K0, K1})
- Add GetGemmKDecomposition() to retrieve kABKLane and kKPerThread
- Add static_assert for RowMajor VLayout requirement in batch prefill

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: e3556fed04]
2026-01-29 07:18:41 +08:00
assistant-librarian[bot]
0c3ea9d826 Merge commit '83b58bb0c3ff12f426d45383900a6fd91b4116a1' into develop 2026-01-28 21:42:05 +00:00
Bartłomiej Kocot
c2892466a9 Grouped Conv Bwd Weight Direct Load (#3648)
* Grouped Conv Bwd Weight Direct Load

* Update gridwise_gemm_xdl_cshuffle_conv_v3.hpp

* Implement group merging for bwd_weight and add instances

* Link direct load instances

* builder fixes

* fix

* fixes

* fix

---------

Co-authored-by: Graner, Johannes <johannes.graner@amd.com>

[ROCm/composable_kernel commit: 83b58bb0c3]
2026-01-28 15:31:54 -06:00
ltqin
002e077401 Fix block scale init value (#3666)
* Make blockscale descale range adaptive to data type max value

* format

[ROCm/composable_kernel commit: 654bec3362]
2026-01-28 12:37:15 -08:00
assistant-librarian[bot]
dbadcf487a Merge commit '42048bdb7d8d931966af76c6dacfedce1c9da90a' into develop 2026-01-28 17:20:56 +00:00
Robin Voetter
97d6e59580 [CK_BUILDER] Integrate CKB validation with CK verification (#3649)
* ck-builder: tensor copy function

This function copies one tensor to another, so that the memory
layout can be changed between them.

* ck-builder: fix ck::bhalf literals

These types don't work properly.

* ck-builder: abstract compare_elements in gpu_verification.hpp and make builder use it

This reduces the amount of duplicated code a bit.

* ck-builder: add flat tensor iterator

This "iterator" type pretends to be a pointer, useful for passing
tensors to functions expecting pointer-like types.

* ck-builder: integrate validation with ck gpu verification

By templating the gpu_verify function over iterators, we can use
the new FlatTensorIterator to adapt the function to multi-
dimensional tensors without changing either implementation
too much.

* ck-builder: add check_by_accumulations

This changes the gpu_verification.hpp code to also accept "iterator"
types for the relevant gpu_verify and gpu_reduce_max functions.

* ck: fix test_gpu_verification GenerateRandomData for bhalf

is_integer_it<bhalf_t> yields true, but it is not actually
an integer.

* ck: make gpu_verification kernels be proper persistent kernels

Previously these were using a hardcoded value for the grid size. This
commit changes that so that the grid size is automatically derived
from the kernel's occupancy and the number of multiprocessors on
the GPU.

* ck: clean up gpu_verification.hpp using block_reduce

This implements a small generic block reduce function, and rewrites
the rest of gpu_verification.hpp using that function to clean it up
a bit.

* ck-builder: doc typos

* ck-builder: update testing readme with validation interface.

* ck-builder: rebase fixes + review comments

* ck-builder: fix device integer generation with float types

Passing bfloat here causes a nans due to type_convert performing
a bitcast.

* ck: another bhalf_t bug

CK expects that int-generation with ck::bhalf_t yields bhalf integers,
not unsigned integers. This makes the logic of FillUniformRandInteger
compatible with GeneratorTensor_2<InDataType>, however idiotic that
may be.

[ROCm/composable_kernel commit: 42048bdb7d]
2026-01-28 17:41:02 +01:00
kabrahamAMD
05f1759f0e [CK_BUILDER] Add reflection for wmma and bwd weight instances to ck builder reflection (#3592)
* added reflection for conv_fwd_multiple_d_wmma_cshuffle.hpp

* added reflection for device_grouped_conv_bwd_weight_xdl_cshuffle

* added reflection for device_grouped_conv_bwd_weight_xdl_cshuffle v3

* added reflection of max_transpose parameters

* fix printing of std optional parameters

* fix use of undefined ck::index

* added conv traits for device_grouped_conv_bwd_weight_multiple_d_xdl_cshuffle

* added xdl two stage instance to reflection

* added additional variables

* added reflection for grouped_conv_bwd_weight_multiple_d_wmma_cshuffle, _v3, grouped_conv_two_stage_wmma_cshuffle_v3,

* added reflection for device_grouped_conv_bwd_weigh_wmma_cshuffle_v3

* added reflection for bwd_weight_wmma_cshuffle

* added comments back in

* add printed output for optional parameters

* update README

* fix typo

* added num_gemm_k_prefetch_stage and small fixes

* modified test string due to reflection of new parameter

---------

Co-authored-by: Kevin Abraham <kevin.abraham@streamhpc.com>

[ROCm/composable_kernel commit: d6cccf6093]
2026-01-28 09:33:45 -07:00
assistant-librarian[bot]
78b36a13ab Merge commit 'bc6083bdd466d1e060253e7a49626c923293c483' into develop 2026-01-28 15:18:44 +00:00
Johannes Graner
28efdbd1c9 Update pytorch version in convolution dataset test generation (#3667)
* Update torch version in dataset test gen

[ROCm/composable_kernel commit: bc6083bdd4]
2026-01-28 07:38:10 -07:00
assistant-librarian[bot]
19a000f6c3 Merge commit '8e3d84aba3be5e851de5d6c6c3e9c08cadbce1da' into develop 2026-01-28 08:16:51 +00:00
Yi DING
bb0986e59e [CK_TILE] ABQuant New Preshuffle (#3638)
* Refactor

* Gemm quant improvement

* Change preshuffle

* Fix

* Fix grouped gemm ut

* Fix

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 8e3d84aba3]
2026-01-27 23:46:49 -08:00
assistant-librarian[bot]
d4b61e4db5 Merge commit '91e32f305fa4d809103431a81594c52240753d40' into develop 2026-01-27 22:14:22 +00:00
damien-lejeune
373d8dd63d [CK Tile] multi reduce improvements (#3607)
* WIP: refactoring

* Swap operation/data nested loops order

* Improve memory coalescing

* Add comments

* Enforce same identity element for the reduce operations

* Re-add compile time constant

* Comment + re-add __builtin_amdgcn_readfirstlane(0) to the loop init

---------

Co-authored-by: Damien Lejeune <damien.lejeune@amd.com>

[ROCm/composable_kernel commit: 91e32f305f]
2026-01-27 12:56:09 -08:00
linqunAMD
e9af74cb84 [ck] add gridwise base class for in all xdl kernel (#186) (#3544)
1. Add base class GridwiseGemm_xdl_cshuffle_base for all gridwise_gemm_xdl classes.
- to select correct LDS layout and epilogue behavior , three additional parameters is added.
- ForceNaiveLdsLayout: disable XOR based LDS layout when it is true
- DirectLoad: pipeline only use directload, we need force naive layout and ignore any padding on gfx9
- IsMxGemm: epilogue has two addtional dimensions
2. Move all LDS descriptor layout related fucntion to base class, including
- GetABlockDescriptor_AK0PerBlock_MPerBlock_AK1
- GetBBlockDescriptor_BK0PerBlock_NPerBlock_BK1
- GetCShuffleBlockDescriptor_MBlock_MPerBlock_NBlock_NPerBlock
3. Move several LDS related helper funtions to base class, including
- GetSharedMemoryNumberOfByte
- GetABlockDescriptor_AKB_AK0PerBlock_MPerBlock_AK1
- GetBBlockDescriptor_BKB_BK0PerBlock_NPerBlock_BK1
- GetCBlockDescriptor_MBlock_NXdlPerWave_MWaveMPerXdl_NBlock_NXdlPerWave_NWaveNPerXdl
4. Move all c epilogue related code to base class, and 4 kind of implementation are provided
- RunEpilogueNoShuffle
- RunEpilogue
- RunMultiDEpilogue
- RunMoeEpilogue

[ROCm/composable_kernel commit: 23cefda140]
2026-01-27 12:49:47 -08:00
assistant-librarian[bot]
362767eba8 Merge commit 'b737f1dee5a097f8b62156335e21259d8dd2784c' into develop 2026-01-27 19:18:39 +00:00
Michał Kulikowski
8130aa058e [CK]Refactoring threadwise_tensor_slice_transfer_v3r1.hpp (#3263)
Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: b737f1dee5]
2026-01-27 10:48:16 -08:00
assistant-librarian[bot]
417ad9c7f1 Merge commit 'b26cb596b0cbea9f40ae36b3f245b5aa7120c5c9' into develop 2026-01-27 18:19:38 +00:00
Illia Silin
71ac48d63a fix some syntax errors (#3658)
[ROCm/composable_kernel commit: b26cb596b0]
2026-01-27 09:59:39 -08:00
assistant-librarian[bot]
21657a1f32 Merge commit '0cc83cb8e8c9d9d926469f862bc1272ef0cf0dc8' into develop 2026-01-27 16:17:03 +00:00
spolifroni-amd
cf363cb35d CK: removed the api reference (#3571)
* removed the api reference

* updating to the latest rocm-docs-core min version

* fixed a formatting issue with buffer views

* removed reference links from code snippets

* removed reference links from code snippets

---------

Co-authored-by: John Afaganis <john.afaganis@amd.com>

[ROCm/composable_kernel commit: 0cc83cb8e8]
2026-01-27 07:36:47 -08:00
assistant-librarian[bot]
fea562ee53 Merge commit 'b66597ed96180ce21e7e6a6678dfc232ed07c800' into develop 2026-01-27 14:20:24 +00:00
Max Podkorytov
078912ec20 Add build time optimization documentation (#3608)
This document describes techniques for reducing C++ template instantiation
overhead in the Composable Kernel codebase, including:

- Replacing recursive templates with pack expansion (O(N) → O(1) depth)
- Using named functors instead of lambdas to share instantiations
- Replacing template recursion with constexpr loops
- Using fold expressions for accumulation operations

These techniques can significantly reduce build times for template-heavy code.

[ROCm/composable_kernel commit: b66597ed96]
2026-01-27 06:07:27 -07:00
assistant-librarian[bot]
d7af03f452 Merge commit '3d67e6c4927a9daea9076fab75b23fb44fdc22b1' into develop 2026-01-27 09:19:15 +00:00
Bartłomiej Kocot
ab6bbbfee1 [CK TILE] Enable CK TILE Conv Fwd tests in CI and fix check_err (#3624)
* [CK TILE] Enable CK TILE Conv Fwd tests in CI and fix check_err

* Update test_grouped_convnd_fwd_tile.cpp

* Update test_grouped_convnd_fwd_tile.cpp

* Update conv_tuning_params.hpp

* clang format fix

* Update CMakeLists.txt

[ROCm/composable_kernel commit: 3d67e6c492]
2026-01-27 11:04:11 +02:00
Johannes Graner
eb72f85509 [CK tests] Extend conv GPU reference (#3539)
* test_convnd_fwd

* test_convnd_bwd_data

* test_conv_bwd_data_scale

* test_grouped_convnd_fwd_clamp

* test_grouped_convnd_fwd_scale

* multiple A/B tensors and D tensor for fwd GPU ref

* test_grouped_convnd_fwd_scaleadd_ab

* test_grouped_convnd_fwd_bias_clamp

* test_grouped_convnd_fwd_bilinear

* test_grouped_convnd_fwd_gk_bias_clamp

* Extend GPU reference to enable batchnorm epilogue

* test_grouped_convnd_fwd{,_gk}_bias_bnorm_clamp

* test_grouped_conv_bwd_data_bilinear

* test_grouped_convnd_bwd_weight_bilinear

* Add missing template instantiation

* Perform operations in float in reference

* Slightly increase tolerance for batchnorm profiler

* Revert "Slightly increase tolerance for batchnorm profiler"

This reverts commit a3b2475229.

* Revert "test_grouped_convnd_fwd{,_gk}_bias_bnorm_clamp"

This reverts commit 6da4576060.

* Revert "Extend GPU reference to enable batchnorm epilogue"

This reverts commit e2f75fa10e.

* Clarify variable names

* Refactor elementwise ops into helper functions

* Make helpers C++17-compatible

[ROCm/composable_kernel commit: c190d8d61f]
2026-01-27 09:49:42 +01:00
assistant-librarian[bot]
aa3b7866b0 Merge commit 'cc75948d1c7f732d102c8e31dc007a2ccd07761f' into develop 2026-01-27 01:42:04 +00:00
Robin Voetter
a7b7eae2a1 [CK_BUILDER] conv bwd weight testing (#3618)
* ck-builder: restructure testing conv

In order to prepare for bwd of conv testing, this commit moves some
files and types around so that we can reuse ckt::Args for both forward
and backwards convolution.

* ck-builder: decouple fwd_ck.hpp and fwd_reference.hpp from fwd.hpp

This will allow us to more easily include fwd.hpp from backwards
definitions, which is required for initializing bwd values.

* ck-builder: fix layout of test_ckb_conv_bwd_weight_xdl_cshuffle_v3

Turns out that the supplied layout isn't actually supported...

* ck-builder: ck and reference conv integration for bwd weight

* ck-builder: ck bwd weight execution test

* ck-builder: ckt::run support for ck-tile bwd weight

* ck-builder: ck tile bwd weight execution test

* ck-builder: extra debug printing in MatchesReference

* ck-builder: make ckt::run return RunResult

This type is more convenient than std::tuple, as it will allow us to
use google test matchers with this in the future.

* ck-builder: RunResult matcher

Using EXPECT_THAT(..., SuccessfulRun()) will generate a check and a nice error
message about how and why running an algorithm failed.

* ck-builder: doc fixes

* ck-builder: add missing headers

[ROCm/composable_kernel commit: cc75948d1c]
2026-01-26 23:50:15 +01:00
assistant-librarian[bot]
65be39bfd1 Merge commit '8654c0628f83261d3dd64cfb4ec80e9dd2b29fa5' into develop 2026-01-26 22:14:16 +00:00
Andrew Clark
daa6cae5f5 Finished testing failure types. Removed testing code.
[ROCm/composable_kernel commit: 8654c0628f]
2026-01-26 15:09:49 -07:00
Andrew Clark
858387034e Removed working tests. Validating remaining tests.
[ROCm/composable_kernel commit: 402f21d0a6]
2026-01-26 15:09:49 -07:00
Andrew Clark
b555c06b8d Removed working tests. Validating remaining tests.
[ROCm/composable_kernel commit: 1397924c21]
2026-01-26 15:09:49 -07:00
Andrew Clark
8136cf5c72 Testing a pattern to support all text variations
[ROCm/composable_kernel commit: 6c596b9553]
2026-01-26 15:09:49 -07:00