* Update copyright messages.
Copyright messages should no longer include a year. This PR updates all 38 source files to the new format.
* Switch to (C) from unicode copyright symbol.
The unicodein comments was causing compilation errors.
* Add backward weight instance traits for xdl cshuffle.
To keep instance test file sizes reasonable, we start a new test_bwd_weight_instances_traits.cpp test file.
* Fix copyright notices.
* Remove (c) symbol, replace with (C).
Having UTF-8 in source caused an error with code generation.
* initial commit for testing datatypes, layouts and traits
* correct warp tile size for small datatype config to make a validate instance for fp16, bf16, fp8
* add tile size coverage test
* Cover more tests, parallel instance generation, documentation
* update cmakelist to run more tests
* initial codes to support add test params in json file
* add congurable problem sizes for different tests
* modify README.md
* clean test_gemm_simple code
* correct padding coverage test
* Add comprehensive and quick tile size config files
* remove fp64 from datatypes
* update documents. manage selecting tile_size config (quick or Comprehensive)
* correct padding test problem sizes
* update comprehensive test and correct documents
* Skip GEMM tests with unsupported arguments instead of failing
* change gen_single instead of gen_indivisual because of an issue. add splitk tests to tile_size_quick_config
* clean CMakeList, remod py file
* Refactor test configs: Rename tile_size to coverage, remove separate traits config, clean cmakefile, readme
* update fp32, fp8 to test all layouts, clean documents and comments
* limit fp32 test layouts to rcr because of compilation error on some gpus
* remove fp32 because of the removing from gemm_instance_builder, make quick test smaller, updating comments
* Fix fp8/bf8 test failures on gfx950 by adding OCP FP8 format support
* Reduce quick_coverage test count from ~250 to ~144 for faster CI
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.
* Add predicates for all device op instances.
* Use the device op signature for validation.
* Fix ckb CMakeLists.txt file for tests.
* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.
* Fix clang-formatting.
* Add factory for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle device op.
* Add conv factory for DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
* Rename elements per wave per shuffle member in the epilogue concept.
* clang-format
* Add concepts and types for optional device op template parameters.
* Add optional compute, direct load, and loop scheduler arguments to conv factory.
* Add number of groups to merge template parameter.
* clang-format.
* Add InstanceTraits for DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
* Add InstanceTraits for kernel_grouped_conv_fwd_dl_multiple_d
* A few small changes to fix broken instance traits.
Generalize the current convolution factory in CK Builder to be able to build instances of any relevant convolution device operation. The main changes are:
* Added new enums FwdGroupConvDeviceOperation, BwdDataGroupConvDeviceOperation, and * BwdWeightGroupConvDeviceOperation that contain the device operations for which the builder should be able to build instances.
* Create a union structure GroupConvDeviceOp that can represent a single value of the fwd, bwd weight, or bwd data device operations. This would be more naturally represented by std::variant object, but we cannot use std::variant in NTTPs because it is not a structural object.
* Introduced a new member device_operation in the ConvSignatureDescriptor concept that assumes GroupConvDeviceOp value.
* Added predicates to be used in creation ConvFactory specialization for the different device operation. When we add support for a new device operation, we'll just create a new ConvFactory specialization with appropriate predicates.
* Changed handling of the convolution layouts (GroupConvLayout1D, GroupConvLayout2D, GroupConvLayout3D) to use the union based handling, i.e., there's now a GroupConvLayout union struct that can hold a single value of the 1D, 2D, or 3D layouts. This simplifies the handling of the different layouts as we get rid of templatized convolution signature.
These code changes allow developers to work more easily in parallel when adding new device operations.
* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.
* Fix clang-formatting.
* Refactor instance_traits_util and add unit tests tests
* Address reviewer comments.
Just adds some TODOs to indicate deprecated layouts in our reflection. Our strategy is to leave the reflection code broad (covering deprecated features), but keep the builder concepts narrow. Once we've removed deprecated features from all instances, we can remove them from reflection.
Also add a comment to the cmake to explain the unit test target test_conv_builder.
* Addressed more reviewer comments.
* Remove duplicate PassThrough::name
Accidentally added this field to the end of the struct, too. The `name` field should be a the start of the struct for consistency.
We need to check all the architectures for build errors. This missing tf32 type came up as a build failure when I compiled for different instinct architectures.
* Add name member to unary elementwise ops.
* Update elementwise_op_name to check for name attribute.
* Require that the layout is derived from BaseTensorLayout struct.
* Disable c++20-compat warnings when building old CK in C++20 mode
Turns out that this creates some warnings for no good reason.
* ck-builder: add missing layouts and element-wise op names
For layouts, we can directly use the ::name attribute, which should
cover all layouts. For element-wise ops, I just added the ones which
are currently missing when compiling CK with -DMIOPEN_REQ_LIBS_ONLY.
* [CK_BILDER] Add compile-time reflection for a convolution instance
Introduce InstanceTraits template metaprogramming framework to enable runtime introspection of device kernel template parameters without requiring implementation knowledge. This reflection system extracts configuration details (block sizes, data types, layouts, tuning parameters) directly from kernel specializations through template
pattern matching. In particular, the GetInstanceString method returns a string that uniquely idenitfies the kernel, by explicitly serializing all template paramter values.
This provides critical functionality for MIOpen integration, since the existing GetTypeString method is ambiguous, and only captures some of the template paramters.
The implementation uses a two-level design: a primary InstanceTraits template declaration in instance_traits.hpp serves as the interface, while kernel-specific specializations (e.g., for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3) provide the actual extraction logic. This separation allows the reflection system to scale to additional kernel types without modifying the core interface.
Key architectural decisions:
- Forward-declare device kernels in instance_traits.hpp to avoid circular dependencies, since device implementation headers will include the reflection headers
- Use compile-time constants and type aliases to expose kernel parameters, enabling zero-overhead introspection
- Provide a templated instance_string() function that generates human-readable kernel configuration strings by serializing all template parameters in order, useful for debugging and kernel identification
- Guard reflection integration with preprocessor definition CK_EXPERIMENTAL_BUILDER to keep it opt-in until the API stabilizes
- Add GetInstanceString() virtual method to BaseOperator, allowing runtime polymorphic access to compile-time kernel information
This infrastructure also enables upcoming higher-level semantic reflection abstractions (like ConvTraits) to query kernel configurations programmatically.
Includes unit tests validating both the trait extraction accuracy and the string generation format.
Add experimental builder infrastructure for composable_kernel
- Add experimental/builder directory with README documentation.
- Create initial test infrastructure with CMakeLists.txt and placeholder test.
- Update root CMakeLists.txt to support CK_EXPERIMENTAL_BUILDER option.
- Update .gitignore to not treat `experimental/builder` as a CMake build directory.
This establishes the directory structure for a high-level builder pattern that will provide a semantically-clear interface for constructing CK operations, with initial focus on convolution kernels for MIOpen integration.