* Fixing check for changed files
* Testing CI skip behavior
* Testing CI Trigger
This should skip CI
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
[ROCm/composable_kernel commit: 775b96ea6a]
* add tensorwise quant in grouped gemm
* fix example issue
* update test cases
* format codes
* clang format
* use GTEST_FAIL
* fix a bug in test_grouped_gemm_util
* skip test when use wmma on grouped_quant kernel
* change cmake
* change code based on comments
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
[ROCm/composable_kernel commit: 3c12a02827]
* Disable c++20-compat warnings when building old CK in C++20 mode
Turns out that this creates some warnings for no good reason.
* ck-builder: add missing layouts and element-wise op names
For layouts, we can directly use the ::name attribute, which should
cover all layouts. For element-wise ops, I just added the ones which
are currently missing when compiling CK with -DMIOPEN_REQ_LIBS_ONLY.
[ROCm/composable_kernel commit: d0364641ed]
* Conv bwd splitN support
* Adjust splitting calculations to lengths format
* Prepare indexing for future splitK support
[ROCm/composable_kernel commit: cbd1279ae6]
* Introduce tree reduction for BlockReduce2dCrossWarpSync
* Rename original impl to BlockReduce2dLinearCrossWarpSync
* Replace warp_size with get_warp_size()
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
[ROCm/composable_kernel commit: 5a27a97391]
* [CK_BILDER] Add compile-time reflection for a convolution instance
Introduce InstanceTraits template metaprogramming framework to enable runtime introspection of device kernel template parameters without requiring implementation knowledge. This reflection system extracts configuration details (block sizes, data types, layouts, tuning parameters) directly from kernel specializations through template
pattern matching. In particular, the GetInstanceString method returns a string that uniquely idenitfies the kernel, by explicitly serializing all template paramter values.
This provides critical functionality for MIOpen integration, since the existing GetTypeString method is ambiguous, and only captures some of the template paramters.
The implementation uses a two-level design: a primary InstanceTraits template declaration in instance_traits.hpp serves as the interface, while kernel-specific specializations (e.g., for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3) provide the actual extraction logic. This separation allows the reflection system to scale to additional kernel types without modifying the core interface.
Key architectural decisions:
- Forward-declare device kernels in instance_traits.hpp to avoid circular dependencies, since device implementation headers will include the reflection headers
- Use compile-time constants and type aliases to expose kernel parameters, enabling zero-overhead introspection
- Provide a templated instance_string() function that generates human-readable kernel configuration strings by serializing all template parameters in order, useful for debugging and kernel identification
- Guard reflection integration with preprocessor definition CK_EXPERIMENTAL_BUILDER to keep it opt-in until the API stabilizes
- Add GetInstanceString() virtual method to BaseOperator, allowing runtime polymorphic access to compile-time kernel information
This infrastructure also enables upcoming higher-level semantic reflection abstractions (like ConvTraits) to query kernel configurations programmatically.
Includes unit tests validating both the trait extraction accuracy and the string generation format.
[ROCm/composable_kernel commit: 37dff024c1]
fix transpose_vectors logic for 2x2 8-bit tiles
add a test which goes through this code path.
factor out constexpr'd cases into smaller functions.
add inline docs about the data movement
impact: gemms with 8-bit non-rcr inputs on gfx942
[ROCm/composable_kernel commit: 2570462ecf]
* Reading gpuname from target for gemm in ck tile engine
* Reading gpuname from target for gemm preshuffle in ck tile engine
* Reading gpuname from target for gemm preshuffle in ck tile engine
* Get GPU changes for GEMM Muti D in TILE ENGINE
* Addressing errors for gpu name in cktileengine
[ROCm/composable_kernel commit: 9f77061094]
Add experimental builder infrastructure for composable_kernel
- Add experimental/builder directory with README documentation.
- Create initial test infrastructure with CMakeLists.txt and placeholder test.
- Update root CMakeLists.txt to support CK_EXPERIMENTAL_BUILDER option.
- Update .gitignore to not treat `experimental/builder` as a CMake build directory.
This establishes the directory structure for a high-level builder pattern that will provide a semantically-clear interface for constructing CK operations, with initial focus on convolution kernels for MIOpen integration.
[ROCm/composable_kernel commit: f18b79f328]