These types can be used to check whether a list of instances
defined statically/obtained from a DeviceOp is congruent. We don't
care about the order, just which items are present or not. Note that
when GetInstanceString() is not implemented for a particular kernel,
it returns "" (via the base class implementation). When this is the
case, tests should temporarily add "" to the expected list of
instances until the implementation is done.
These types can be used to check whether a list of instances
defined statically/obtained from a DeviceOp is congruent. We don't
care about the order, just which items are present or not. Note that
when GetInstanceString() is not implemented for a particular kernel,
it returns "" (via the base class implementation). When this is the
case, tests should temporarily add "" to the expected list of
instances until the implementation is done.
We need to check all the architectures for build errors. This missing tf32 type came up as a build failure when I compiled for different instinct architectures.
* Add name member to unary elementwise ops.
* Update elementwise_op_name to check for name attribute.
* Require that the layout is derived from BaseTensorLayout struct.
Adds new testing functionality: an inline diff for string comparison.
Example usage:
EXPECT_THAT("Actual string", ck_tile::test::StringEqWithDiff("Expected string"));
Failure message:
Value of: "Actual string"
Expected: "Expected string"
Actual: "Actual string" (of type char [14]),
Diff: "[Expe|A]ct[ed|ual] string"
The inline-diff function uses the Wagner-Fischer algorithm to find the minimum edit distance and generate diff markers, which has O(N^2) complexity. It has optional color codes that are enabled with the matcher.
* Fixing check for changed files
* Testing CI skip behavior
* Testing CI Trigger
This should skip CI
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
* add tensorwise quant in grouped gemm
* fix example issue
* update test cases
* format codes
* clang format
* use GTEST_FAIL
* fix a bug in test_grouped_gemm_util
* skip test when use wmma on grouped_quant kernel
* change cmake
* change code based on comments
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
* Disable c++20-compat warnings when building old CK in C++20 mode
Turns out that this creates some warnings for no good reason.
* ck-builder: add missing layouts and element-wise op names
For layouts, we can directly use the ::name attribute, which should
cover all layouts. For element-wise ops, I just added the ones which
are currently missing when compiling CK with -DMIOPEN_REQ_LIBS_ONLY.
* Introduce tree reduction for BlockReduce2dCrossWarpSync
* Rename original impl to BlockReduce2dLinearCrossWarpSync
* Replace warp_size with get_warp_size()
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
* [CK_BILDER] Add compile-time reflection for a convolution instance
Introduce InstanceTraits template metaprogramming framework to enable runtime introspection of device kernel template parameters without requiring implementation knowledge. This reflection system extracts configuration details (block sizes, data types, layouts, tuning parameters) directly from kernel specializations through template
pattern matching. In particular, the GetInstanceString method returns a string that uniquely idenitfies the kernel, by explicitly serializing all template paramter values.
This provides critical functionality for MIOpen integration, since the existing GetTypeString method is ambiguous, and only captures some of the template paramters.
The implementation uses a two-level design: a primary InstanceTraits template declaration in instance_traits.hpp serves as the interface, while kernel-specific specializations (e.g., for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3) provide the actual extraction logic. This separation allows the reflection system to scale to additional kernel types without modifying the core interface.
Key architectural decisions:
- Forward-declare device kernels in instance_traits.hpp to avoid circular dependencies, since device implementation headers will include the reflection headers
- Use compile-time constants and type aliases to expose kernel parameters, enabling zero-overhead introspection
- Provide a templated instance_string() function that generates human-readable kernel configuration strings by serializing all template parameters in order, useful for debugging and kernel identification
- Guard reflection integration with preprocessor definition CK_EXPERIMENTAL_BUILDER to keep it opt-in until the API stabilizes
- Add GetInstanceString() virtual method to BaseOperator, allowing runtime polymorphic access to compile-time kernel information
This infrastructure also enables upcoming higher-level semantic reflection abstractions (like ConvTraits) to query kernel configurations programmatically.
Includes unit tests validating both the trait extraction accuracy and the string generation format.