Commit Graph

12 Commits

Author SHA1 Message Date
John Shumway
5122637215 [CK_BUILDER] Convert convolution traits to a struct with factory functions (#3547)
* Factor helpers out of conv_traits.hpp

* Create a non-templated conv_traits struct

* Migrate to new instance-specific instance_to_conv_traits functions

* Clean up reflection concepts

* Clean up ConvTraits helpers

* Update testing for convolution traits

This is a lot of cleanup on tests to have verbose coverage of feature
extraction, explicit tests for each supported device kernel, and
simple, readable test code.

* Address reviewer comments and resolve merge conflict
2026-01-15 10:03:21 +01:00
Ville Pietilä
9908a87c31 [CK_BUILDER] Add bwd weight factories (#3509)
* Add placeholder test.

* Initial conv bwd weight factory.

* Conv builder test refactoring.

* Add missing pieces to bwd weight factory.

* Improve compile time erros message when no matching factory is found.

* Use amcro to ensure automatic macthing between concepts are their string representations.

* Improve compile time diagnostics.

* Small improvements.

* Improve missing member/wrong type compile-time errors.

* Improve compile time diagnostics.

* Concept bug fixes.

* Remove debug assert.

* Update algorithm signature diagnostics.

* Factory bug fixes.

* First functional version of bwd weight conv factory.

* Refactor handing of GEMM-K batch template parameter in conv bwd weight factory.

* Concept improvements.

* Improve concept diagnostics.

* Introduve a common size type for concepts.

* Update compiletime diagnostics to use the size type.

* Update conv specialization enum.

* Fix fwd conv builder tests.

* Fix smoke tests.

* Separate bwd weigth and bwd data tests into separate targets.

* Clean-up CK Tile builder tests.

* Add bwd weight XDL CShuffle V3 factory.

* Build conv bwd weigth v3 instances successfully.

* Add instance traits for DeviceGroupedConvBwdWeight_Xdl_CShuffleV3.

* Test fix.

* Add instance traits for bwd weight algorithms.

* Add unit tests for instance strings.

* Build new instance traits unit tests but exclude WMMA for now.

* Added factory for DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle.

* Conv bwd weight DL factory.

* Final implementation for bwd weight DL factory.

* Add test for creating DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle instance.

* Add factory for DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle

* Treat ref algorithm the same way as real algorithms in the dispatcher.

* Refactor large tensor support and WMMA configuration.

* Add factory and tests for DeviceGroupedConvBwdWeight_Wmma_CShuffleV3.

* Update Readme.

* Fix WMMA bwd weight tests.

* Added factory and tests for DeviceGroupedConvBwdWeightTwoStage_Wmma_CShuffleV3.

* Factory and tests for DeviceGroupedConvBwdWeight_Wmma_CShuffle.

* Dispatching for DeviceGroupedConvBwdWeightMultipleD_Wmma_CShuffle.

* Add factory for DeviceGroupedConvBwdWeightMultipleD_Wmma_CShuffleV3

* Fix DeviceGroupedConvBwdWeightMultipleD_Wmma_CShuffleV3 factory and  compute types for input and output tensor in bwd weigth convs.

* Fix fwd factories after refactoring.

* clang-format

* Move compile-time diagnostics to a separate branch.

* Fix ref algorithm dispatching.

* Fix smoke tests.

* clang-format

* Fix factory for regular WMMA conv bwd weight.

* Clarify builder Readme.

* Remove obsolete test file.

* Fix test after merge.

* clang-format

* Remove the C++26 extensions.

* Unify conv elementwise ops and layout definitions for fwd and bwd directions.

* Remove old layout and elementwise ops.

* Unify handling of conv tensor types between fwd and bwd directions.

* Unify block transfer for fwd and bwd directions. Rename ThreadSliceDim to ThreadClusterRank.

* Make BlockTransferDescriptor concept parametrized. Introduce a common TileTransferParameters concept for conv algorithms.

* clang-format

---------

Co-authored-by: Ville Pietilä <>
2026-01-13 18:12:38 +02:00
Robin Voetter
e3884bbf05 [CK_BUILDER] Debug utilities (#3528)
* ck-builder: make toString to_string

We are using snake case for CK-Builder

* ck-builder: add debug.hpp with tensor descriptor printing function

This adds some initial functionality to debug.hpp, a header which will
be used to house some debug utilities.

* ck-builder: abstract nd-iteration

Abstracting this makes it easier to test, clearer, and allows us to
use it elsewhere (such as in debug.hpp soon)

* ck-builder: tensor printing

* ck-builder: rename INT32 to I32

This makes it more in line with the other data type definitions.
2026-01-08 10:14:13 +01:00
John Shumway
a7d6b1e700 Add unit test coverage for conversion to convolution traits (#3515)
Our concept-base conversions are fragile and too complex. We want to refactor to straightforward functions
for each intance trace class template. This change adds unit test coverage to make that refactoring safer.
2026-01-07 07:44:21 -08:00
Robin Voetter
1c433c64ec [CK_BUILDER] Integrate reference conv with testing (#3511)
* ck-builder: explicitly delete forward declarations

Before, these functions were seen as a forward declaration for an existing function.
If no actual implementation overload could be found, these would be selected and
a linker error or warning would be generated. By marking these functions as explicitly
deleted, they incorrect invocations are generated as compile error instead.

* ck-builder: ckt::run plumbing for reference conv

This implements the ckt::run plumbing for the reference convolution
implementation and sets up the first complete end-to-end test.

* ck-builder: make validation system check for all-zeros

When both the actual and reference output are both all zero bits,
there is probably something wrong in the test framework.

* ck-builder: proper implementation+tests for TensorDescriptor::is_packed

* ck-builder: fix typos
2026-01-06 09:29:06 +01:00
Robin Voetter
e6e7dc2910 [CK_BUILDER] validation (#3471)
This pull request builds on #3267 by proving the "validation" infrastructure, the means to compare a set of `Outputs`.

The design of the validation infrastructure is relatively straight forward:
- Each SIGNATURE should come with a `validate()` implementation, which should be implemented in a similar way that the other functions/types from `testing.hpp` are implemented.
- `validate()` returns a `ValidationReport`, which is a structure that keeps all relevant information about comparing the tensors from two `Outputs`. Note that crucially, `validate()` should not do any reporting by itself. Rather, glue logic should be implemented by the user to turn `ValidationReport` into a relevant error message.
- You can see this clue code for CK-Builder itself in `testing_utils.hpp`, its `MatchesReference()`. This functionality is relatively barebones right now, it will be expanded upon in a different PR to keep the scope of this one down.

The comparison is done on the GPU (using an atomic for now), to keep tests relatively quick. Some notable items from this PR:
- To help compare the tensors and with writing tests, I've written a generic function `tensor_foreach` which invokes a callback on every element of a tensor.
- For that it was useful that the `TensorDescriptor` has a rank which is known at compile-time, so I've changed the implementation of `TensorDescriptor` for that. I felt like it was a better approach than keeping it dynamic, for multiple reasons:
  - This is C++ and we should use static typing where possible and useful. This way, we don't have to implement runtime assertions about the tensor rank.
  - We know already know the rank of tensors statically, as it can be derived from the SIGNATURE.
  - It simpifies the implementation of `tensor_foreach` and other comparison code.
- There are a lot of new tests for validating the validation implementation, validating validation validation tests (Only 3 recursive levels though...). For a few of those functions, I felt like it would be useful to expose them to the user.
- Doc comments everywhere.
2026-01-05 04:57:34 -08:00
kabrahamAMD
f86bbb1aef [CK_Builder] [testing] Integrate device random generators (#3427)
Implemented device random number generators for ck tensors.
Includes tests and integration to ck builder testing interface.
2025-12-30 10:03:05 -08:00
Bartłomiej Kocot
cbc8335964 Improve XDL to WMMA porting for grouped conv fwd (#3456)
Refactors the way the number of XDL (matrix multiply-accumulate) instructions per wave is calculated and used in the grouped convolution forward implementations, especially to better support WMMA (Wave Matrix Multiply-Accumulate) instructions and 16x16 tiles. 
The changes use MXdlPerWave instead of NXdlPerWave to increase number of waves per M dim.
2025-12-19 15:58:51 -07:00
John Shumway
9ac51aa0f4 Add describe() method to device ops for runtime introspection (#3375)
Introduces a polymorphic describe() method to BaseOperator that enables runtime introspection of kernel configurations through a unified interface.

Key changes:

* Add virtual describe() method to BaseOperator returning Description objects
* Implement describe() in 6 device operation classes (conv fwd/bwd variants)
* Create conv_describe.hpp with factory function for ConvDescription
* Extract type definitions to conv_types.hpp to resolve circular dependencies
* Add InstanceStringDescription for kernels without full ConvDescription support

Other Improvements:

* Update tests to use describe() instead of GetInstanceString()
* Remove circular dependency include from conv_traits.hpp
* Add ODD_C to ConvFwdSpecialization enum and fix OddC mapping
* Replace silent fallback in conv_layout() with compile-time error

This provides a foundation for runtime kernel introspection and better tooling support for analyzing and debugging kernel configurations.
2025-12-14 12:49:12 -08:00
Robin Voetter
6219b12730 [CK_BUILDER] convolution testing (#3267)
* Add README.md for testing

* Add tensor_memory_manager.

* ck-builder: tensor memory manager rebase fixes

This fixes some issues caused by the API being changed recently.
Also, this streamlines the ckt namespace to always be ck_tile::builder::test,
as this is already being used by other tests

Really, this commit should be squashed into the previous,
but I'm keeping it separate for brevity.

* ck-builder: test arguments initial prototype

* ck-builder: test system initial prototype

* ck-builder: fix non-standardized copyright comments

* ck-builder: new prototype

* ck-builder: group testing inputs/outputs into a separate structure

This is basically the return of the tensor memory manager after all,
except that the design is more closely tied to the actual operation.
Using a struct allows us to add additional input/output tensors
without breaking code (by defaulting those new parameters). Note
that the tensors are split into a separate inputs/outputs because we
usually want to allocate the output _twice_: once for the real
computation and once for the reference computation.

* ck-builder: simplify prototype naming; start docs

* ck-builder: update testing readme

* ck-builder: testing documentation

* ck-builder: HipStatusMatcher

This matcher can be used to check HIP status codes and provide
nice and readable error messages.

* ck-builder: tensor_buffer.hpp tests

* ck-builder: conv_fwd.hpp tests

* ck-builder: add example end-to-end test in conv fwd 2d fp16

* ck-builder: simplify extent usage

* ck-builder: update testing doc

* ck-builder: skip end to end test on non-gfx9

* fix check_copyright_year interpreter

/bin/bash is not guaranteed to exist on Linux. Signed,
a NixOS user

* ck-builder: fix copyrights

* ck-builder: reduce conv fwd testing size

This test allocated 24GB of memory, too much for 16GB cards.

---------

Co-authored-by: John Shumway <jshumway@amd.com>
2025-12-13 15:33:41 +01:00
Ville Pietilä
d66e5f667c [CK_BUILDER] Improve CK Builder and CK Builder tests (#3382)
* Remove stale documentation.

* Add placeholder for conv algorithm design description. Add link to conv factory description.

* Improve testing transfer parameters.

* Python script to check the block tilings.

* Improve tests and conv types serialization.

* Change representation of boolean values from 1/0 to true/false in instance strings.

* Change representation of boolean values from 1/0 to true/false in conv algorithm types.

* Test code improvements.

* Improve covn descriptions tests.

* Improve conv signature definition in conv fwd builder tests.

* clang-format.

* Remove obsolete script.

* Revert StaticAssertTypeEq changes in conv layout tests.

* Remove obsolete using declaration.

---------

Co-authored-by: Ville Pietilä <>
2025-12-11 09:50:00 +02:00
Bartłomiej Kocot
04612c30ce [CK_BUILDER] Ck Tile Grouped convolution factory (#3352)
* [BUILDER] Ck Tile Grouped convolution factory

* Part 2

* Fixes after rebase

* Remove leftovers
2025-12-08 10:32:56 +01:00