Commit Graph

2891 Commits

Author SHA1 Message Date
Ville Pietilä
c3a9044bad Dispatching for DeviceGroupedConvBwdWeightMultipleD_Wmma_CShuffle. 2026-01-02 09:25:38 -05:00
Ville Pietilä
1759db7250 Factory and tests for DeviceGroupedConvBwdWeight_Wmma_CShuffle. 2026-01-02 08:52:38 -05:00
Ville Pietilä
aa10d659b7 Added factory and tests for DeviceGroupedConvBwdWeightTwoStage_Wmma_CShuffleV3. 2026-01-02 07:51:40 -05:00
Ville Pietilä
89934275f4 Fix WMMA bwd weight tests. 2026-01-02 07:07:08 -05:00
Ville Pietilä
2e43e16e47 Update Readme. 2026-01-02 07:06:51 -05:00
Ville Pietilä
bc3cba873b Add factory and tests for DeviceGroupedConvBwdWeight_Wmma_CShuffleV3. 2026-01-02 06:08:29 -05:00
Ville Pietilä
d045923a0d Refactor large tensor support and WMMA configuration. 2026-01-02 05:12:19 -05:00
Ville Pietilä
09e188f2a8 Treat ref algorithm the same way as real algorithms in the dispatcher. 2026-01-02 02:28:05 -05:00
Ville Pietilä
5be1ed65fb Merge remote-tracking branch 'origin/develop' into vpietila/ckb-bwd-weight-factories 2026-01-02 02:15:24 -05:00
Ville Pietilä
6e8c401e33 [CK_BUILDER] Instance traits for conv bwd weight algorithms (#3498)
Added instance traits for the following bwd weight conv algorithms

DeviceGroupedConvBwdWeight_Xdl_CShuffleV3
DeviceGroupedConvBwdWeight_Wmma_CShuffleV3
DeviceGroupedConvBwdWeight_Wmma_CShuffle
DeviceGroupedConvBwdWeight_TwoStage_Xdl_CShuffle
DeviceGroupedConvBwdWeight_TwoStage_Wmma_CShuffleV3
DeviceGroupedConvBwdWeight_DL
DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle
DeviceGroupedConvBwdWeightMultipleD_Wmma_CShuffleV3
Added also unit tests for instance traits of those bwd weigth algorithms that are currently exposed by the narrow CK build for MIOpen.
---------

Co-authored-by: Ville Pietilä <>
2025-12-31 15:41:15 -08:00
DarylHawkinsAMD
f3e4d46faa Temporarily disable kernel instances that won't build on gfx1101 on Windows (#3499)
## Proposed changes

This source file won't build for gfx1101 on Windows. It builds successfully on other gfx110X architectures, and also builds successfully on gfx1101 on Linux.

This is the compile error:

```
[composable_kernel] FAILED: library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear/CMakeFiles/device_grouped_conv3d_bwd_weight_bilinear_instance.dir/wmma/device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.obj 
[composable_kernel] ccache B:\build\core\clr\dist\lib\llvm\bin\clang++.exe -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_TILE_USE_WMMA=1 -DCK_TIME_KERNEL=1 -DCK_USE_WMMA -DCK_USE_XDL -DDPP_KERNELS -DLLVM_MAIN_REVISION=524190 -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -IC:/home/runner/_work/TheRock/TheRock/ml-libs/composable_kernel/library/include -IC:/home/runner/_work/TheRock/TheRock/ml-libs/composable_kernel/include -IB:/build/ml-libs/composable_kernel/build/include -IB:/build/base/half/stage/include -isystem B:/build/core/clr/dist/include -DWIN32 -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS -DNOMINMAX -fms-extensions -fms-compatibility -D_ENABLE_EXTENDED_ALIGNED_STORAGE  -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class -Wno-ignored-attributes -Wno-unknown-attributes -Wno-duplicate-decl-specifier --hip-path=B:/build/core/clr/dist --hip-device-lib-path=B:/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -std=c++20   -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-error=deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-c++20-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -Wno-unique-object-duplication -Wno-nrvo -Werror -Weverything -fcolor-diagnostics -x hip --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1103 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1103 -MD -MT library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear/CMakeFiles/device_grouped_conv3d_bwd_weight_bilinear_instance.dir/wmma/device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.obj -MF library\src\tensor_operation_instance\gpu\grouped_conv3d_bwd_weight_bilinear\CMakeFiles\device_grouped_conv3d_bwd_weight_bilinear_instance.dir\wmma\device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.obj.d -o library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear/CMakeFiles/device_grouped_conv3d_bwd_weight_bilinear_instance.dir/wmma/device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.obj -c C:/home/runner/_work/TheRock/TheRock/ml-libs/composable_kernel/library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear/wmma/device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
[composable_kernel] error: Illegal instruction detected: Operand has incorrect register class.

[composable_kernel] V_CMP_NE_U32_e32 0, $src_private_base, implicit-def $vcc, implicit $exec

[composable_kernel] 1 error generated when compiling for gfx1101.
```

This appears to be a compiler bug and we'll follow up to get a proper fix landed, but for the purposes of landing some work to enable gfx1151 support in TheRock we'd like to disable building of these kernels on this architecture temporarily.

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

- [X] I have added tests relevant to the introduced functionality, and the unit tests are passing locally
- [X] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more than 30 seconds to run.
- [x] I have added inline documentation which enables the maintainers with understanding the motivation
- [X] I have removed the stale documentation which is no longer relevant after this pull request
- [X] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
- [X] I have run `clang-format` on all changed files
- [X] Any dependent changes have been merged
2025-12-31 13:12:45 -07:00
Ville Pietilä
fba80401d1 Add factory for DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle 2025-12-31 09:32:58 -05:00
Ville Pietilä
83be9c740c Add test for creating DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle instance. 2025-12-31 09:08:06 -05:00
Ville Pietilä
e1b4acd431 Final implementation for bwd weight DL factory. 2025-12-31 08:29:11 -05:00
Ville Pietilä
3b0777f629 Conv bwd weight DL factory. 2025-12-31 07:38:52 -05:00
Ville Pietilä
75710202ab Added factory for DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle. 2025-12-31 04:32:28 -05:00
kabrahamAMD
f86bbb1aef [CK_Builder] [testing] Integrate device random generators (#3427)
Implemented device random number generators for ck tensors.
Includes tests and integration to ck builder testing interface.
2025-12-30 10:03:05 -08:00
Bartłomiej Kocot
2b8302eb6d Fix grouped conv wrw kernels names (#3494) 2025-12-30 16:45:39 +01:00
ApoorvaKalyani
53a1e4f551 Grouped convolution backward data WMMA v3 implementation (#3460)
* Added device level implementation for bwd_data_wmma_v3.

* Added first instance of bwd_data_wmma_v3(f16).

* Add support for bwd data in gridwise implementation

Some changes are general for convolution and some are specific for bwd
data. We need to generalize them once we have fwd, bwd data and bwd
weight

* Initial device implementation of bwd data

* Remove unused template parameters in device impl

* Add one instance for different layout

initial check of device implementation

* Add tests for splitk and for different layouts

* Appended more instances to wmma_v3_f16.

* Added conv_2d bf16 wmma_v3 instances.

* Added conv_3d_bf16 wmma_v3_instances.

* Added conv_3d_f16_wmma_v3_instances.

* Added SplitN test cases for wmma.

* Conv3d_bwd_data_scale_wmma_v3 instances.

* Conv3d_bwd_data_bilinear_wmma_v3_instances

* Renaming the device level instances file to common name , since it is defined for different DataTypes.

* Renaming the instances and fixing typo

* Added the test cases to regression test list

* NCHW support for wmma_v3

* Examples for bf16 and f16 bwd_data_wmma_v3

* Added transpose conditons for device impl

* fixing bugs

* Added the gemm_args array implmentation

* WIP debug conv bwd

* fix splitk

* Grouped gemm fix

* Update CmakeLists with EOF

* Added more instances for tests

* Fixed the run time error in examples and removed 3d conv examples.

* Fixed a typo.

* Updated CmakeLists to removed the 3d convultion deleted files

* Added print error statements for unsupoorted argument

* Added the merge conflict related changes

* Fixed compilation error

* Fixed the InstanceFactory duplication error.

* Removed the print statements and added logs to Arg function

* All the merge conflict related errors resolved

* Added d_tensor tests.

* Added the missing example types of wmm_v3

* Merge error fix

* Corrected the instance name

* Reverted the bias relu change

* Revereted the transpose load local change

* Updated the regression test list with bwd_data_scale

* Revert "Revereted the transpose load local change"

This reverts commit 0b7281edb2bf008e407006690a00621174d9d19b.

* Revert "Merge error fix"

This reverts commit f3c85daa474b1b83d10c8a3ce077354e71d91a2b.

* Reverting the local change

* Added merge error fix

* Build error fix due to merge conflicts

* Added bias_relu example for wmma_v3

* Modified the main method in dtensor tests

* Updated the dtensor tests to pick all the shapes

* Updated the dtensor test shapes.

* Updated the mem operations in tests.

* Added reference func

* Fixed typos in device impl

* Added new header file and modified the include file for 3d tests

* Renamed the test file and added reference func call.

* clang format fix

* Added ignore params

* Modified device impl and tests

* Removed debug print statements and updated dtensor test shapes

* Fixing merge conflicts

* Fixing more merge conflicts

* Fixed copyrights

* Updated the tuned instances to bilinear and scale.

* Adding tuned instances to vanilla wmma_v3

* Removed all unused instances and modified test layouts.

* Cleaned up all instances , reverted back fwd fp16 instances and updated tuned fp16 instances.

* Fix clang format

* Updated tuned f16/-genric instances

* Formatting the instances file

* Fixed copyrights and clang issues

* Nonsense commit to force git to force

* Removed the transpose instances

* Added verified genric instances

* Fixing namespace errors

* Added todo for failing shapes

* Formatting instance file

* Fix instance list formatting

* Removing unnecessary formats

* Renamed the common file

* Unification of xdl and wmma bwd_data tests

* Updated Cmake

* Added all layout types and deleted code.

* Updated Cmake to add the condition to all tests.

---------

Co-authored-by: Enrico Degregori <enrico@streamhpc.com>
Co-authored-by: Anton Gorenko <anton@streamhpc.com>
Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>
2025-12-30 16:25:08 +01:00
Ville Pietilä
30c10e2544 Build new instance traits unit tests but exclude WMMA for now. 2025-12-30 05:52:55 -05:00
Ville Pietilä
adfab9db7e Add unit tests for instance strings. 2025-12-30 05:52:15 -05:00
Ville Pietilä
3c1e2b0170 Add instance traits for bwd weight algorithms. 2025-12-30 04:29:38 -05:00
yadaish
dae85ead64 [CK_TILE] support split-k a16w4 gemm1 (#3389)
* initial version to support moe gemm1 split-k

* add missing args

* fix build warning

* update reference

* for split-k disable bias and weight

* remove debug log

* fix format

* fix div by zero errors

* fix cmake config

* update

* resolve conflicts

* remove useless changes

* reformat

* fix

* remove useless changes

* fix ci

---------

Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: root <root@smci355-ccs-aus-m01-25.cs-aus.dcgpu>
2025-12-29 23:05:35 +08:00
Ville Pietilä
3e16fa072f Test fix. 2025-12-29 09:53:47 -05:00
Ville Pietilä
ab88cee0eb Add instance traits for DeviceGroupedConvBwdWeight_Xdl_CShuffleV3. 2025-12-29 09:53:07 -05:00
Ville Pietilä
a83790e9da Build conv bwd weigth v3 instances successfully. 2025-12-29 09:30:58 -05:00
Ville Pietilä
80f44824f5 Add bwd weight XDL CShuffle V3 factory. 2025-12-29 09:12:14 -05:00
JH-Leon-KIM-AMD
a0acc83a72 [CK_BUILDER] Add GPU Reference Algorithm to CK Builder (#3381)
* [CK_BUILDER] Integrate GPU reference as ConvAlgorithm

Add GPU reference as a ConvAlgorithm specialization, enabling:
- Unified Builder API for reference and optimized kernels
- Future ckProfiler integration for validation
- First step toward numerical validation in Builder tests

Changes:
- Add ConvAlgorithmSpecialization::REFERENCE enum
- Add ConvAlgorithm_Reference struct
- Add IsReferenceAlgorithm concept
- Create 3 reference factories (Forward, BwdData, BwdWeight)
- Wire into conv_dispatcher
- Add proof-of-concept test (passing)

Test result: Can instantiate reference through Builder API

* Add GPU reference execution tests

- Reference kernel executes through Builder (459ms)
- Both reference and optimized can instantiate
- Tests passing

Next: Implement utilities for comparison

* Optimized Builder kernel execution works

- MakeArgument pattern implemented
- Builder-generated kernel executes successfully
- Tests passing (451ms execution)

Next: Add comparison

* VALIDATION COMPLETE: Builder == Reference

Builder-generated kernel output matches GPU reference!

Test: Validate_Optimized_vs_Reference_Forward_2D_FP16
Result: PASS ✓

This proves CK Builder generates correct code!

* Update to new Builder API

All tests passing

* Rename test file for clarity

test_builder_kernel_execution -> test_builder_kernel_validation

* Add all 3 directions support

- Forward, Backward Data, Backward Weight
- All reference factories working
- Dispatcher wired for all directions
- 9 tests passing

Tests:
- test_reference_execution: 3 tests (all directions)
- test_optimized_execution: 3 tests (all directions)
- test_builder_kernel_validation: 3 tests (fwd validated, bwd placeholders)

* Add backward direction support

- Backward data and weight dispatcher wiring
- Fix factories for new API
- All 3 directions tested
- 9 tests passing

* Refactor: Change IsReferenceAlgorithm from concept to consteval function

Address review feedback: Use consteval function in dispatcher instead of
concept, matching the pattern for other algorithms (Tile, XDL, WMMA, DL).

- Remove IsReferenceAlgorithm concept from conv_algorithm_concepts.hpp
- Add IsReferenceAlgorithm() consteval function to conv_dispatcher.hpp
- Update dispatcher to use function call: IsReferenceAlgorithm<T>()
- Remove redundant algorithm checks from reference factory requires clauses

All tests passing (9/9).

* Move Tile algorithm check outside direction block to support all directions

* Implement MakeInvokerPointer interface and add random input validation

- Implement full Argument/Invoker structs for old CK interface (not just nullptr)
- Refactor with reference_common.hpp to reduce code duplication
- Add random input validation tests: Builder vs direct GPU reference (all directions)
- Fix layout: GNHWC -> NHWGC to match reference kernel expectations
- All 12 tests pass with IDENTICAL results on random input

* Move ConvAlgorithm_Reference to test/impl/conv_algorithm_types.hpp

Keep types.hpp for data types only (enums), move algorithm descriptors
to conv_algorithm_types.hpp as suggested by review.

* Add static_assert to ensure reference factories only accept PassThrough operations

Reference implementation doesn't support fused elementwise operations.
Add compile-time validation to fail early with clear error message if
non-PassThrough operations are specified on input, weight, or output.

* Add InstanceTraits support for reference kernels

- Store SIGNATURE/ALGORITHM/VERSION in Instance for reflection
- Create shared ReferenceCommonTraits base for common properties
- Add 3 direction-specific InstanceTraits specializations in one file
- Include data type and layouts in instance_string output

* Remove optimized kernel validation tests from reference-only branch

* Use existing layout helper and organize reference tests

Use LayoutToCK from conv_tensor_layout.hpp and move reference InstanceTraits
test to validation folder.

* Merge develop branch

Fix DataType switch for new mixed precision types.

* Fix comment spacing for CI

* Convert IsReferenceAlgorithm from function to concept

* Add reference tests to CI smoke tests

* Consolidate 3 reference factories into single unified factory

---------

Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>
2025-12-29 16:11:08 +02:00
Ville Pietilä
277981bc9b Clean-up CK Tile builder tests. 2025-12-29 07:15:08 -05:00
Ville Pietilä
9926d942e9 Separate bwd weigth and bwd data tests into separate targets. 2025-12-29 07:03:55 -05:00
Kiefer van Teutem
88ae445580 Replace grouped conv bwd wei wmmaV3 bilin/scale bf16f32bf16 support with bf16bf16bf16 (#3470)
* Replace grouped convolution bwd weight wmma v3 bilinear and scale bf16f32bf16 support with bf16bf16bf16 support. Update tests.

* Tentative fix for bwd weight bilinear bf16bf16bf16, seems like the bilinear elementwise overload for this case (bf16, f32 accu, bf16) was wrong.
2025-12-29 12:58:29 +01:00
Ville Pietilä
52086b350a Fix smoke tests. 2025-12-29 05:44:51 -05:00
Ville Pietilä
3bd0f05081 Fix fwd conv builder tests. 2025-12-29 05:31:35 -05:00
Ville Pietilä
027d943b2f Update conv specialization enum. 2025-12-29 05:06:39 -05:00
Ville Pietilä
30a9686877 Update compiletime diagnostics to use the size type. 2025-12-29 04:56:22 -05:00
Ville Pietilä
8c80e005bd Introduve a common size type for concepts. 2025-12-29 04:53:19 -05:00
Ville Pietilä
ff2fdd8acc Improve concept diagnostics. 2025-12-29 04:26:06 -05:00
Yi DING
b0ea67e377 [CK_TILE] MX FLATMM Fix M Padding (#3489)
* Fix M Padding

* Fix tensor desc ele space size
2025-12-29 09:09:12 +08:00
joyeamd
a3916a8d16 enable f8 tests (#3488) 2025-12-27 00:21:56 -08:00
Yi DING
7ce532eac7 [CK_TILE] Align FMHA BWD Reference with Kernel Implementation (#3486) 2025-12-25 16:12:36 +08:00
Erwin Terpstra
e08efa551f [CK_TILE] Grouped gemm quant tensor layouts (#3414)
* feat: add RRR, CRR, CCR layouts for a/b quant grouped gemm tests and examples. Refactor example setup to improve compile time

* chore: split out bquant preshuffle test, and reduce tile size to 128 to temporarily solve slow compile times

* chore: set m/n warp tile to 16 as configurations with 32 seem to have some support problems

* fix: missing check for transposed load in bquant pipeline

* chore: lower unit test tensors dimensions a bit for faster tests

* chore: set grouped gemm example M/N warp tile to 16

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-12-24 23:01:23 -08:00
Illia Silin
14668a56e3 remove the LLVM_MAIN_REVISION usage (#3487) 2025-12-24 16:49:35 -08:00
Thrupti Raj Lakshmana Gowda
62a8ec155f [CK TILE ENGINE] CI configuration with basic cases (#3475)
* [CK TILE ENGINE] Adding GEMM BASIC TEST in Kenkins

* fix RUN_TILE_ENGINE_BASIC_TESTS name typo

* [CK Tile Engine] Updating basic CI

* Resolving merging issues

* Resolving merging issues

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2025-12-24 10:45:56 -08:00
kensclin
7f68f3c4fa Enable padding blockscale for abquant (#3453)
* Enable padding blockscale for abquant

* run clang-format

* Reduce unnecessary testing

* remove cout
2025-12-24 09:12:40 -08:00
Po Yen Chen
1c3151963b [CK_TILE][FMHA] Add FP8 support for batch_prefill kernel (#3425)
* Add fp8bf16 support for batch_prefill

* Fix wrong scale_s re-compute logic in batch_prefill

* Fix wrong scale_s re-compute logic in fmha fwd

* Fix batch_prefill codegen error

* Remove no-longer used GetName() function

* Add fp8 logits=True instances

* Update CHANGELOG.md
2025-12-24 10:34:06 +08:00
jakpiase
c0797c1671 [CK_TILE] Minor splitk bugfix for gemms and conv (#3387)
* fix for splitk if splitk < grid

* add different splitk implementation

* minor bugfix for streamk gemm

* Add test

---------

Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2025-12-24 00:10:13 +01:00
Ville Pietilä
77e10c7b08 Concept improvements. 2025-12-23 10:27:38 -05:00
Ville Pietilä
a1740c614b Refactor handing of GEMM-K batch template parameter in conv bwd weight factory. 2025-12-23 10:08:56 -05:00
Johannes Graner
e1381d6a71 [CK grouped gemm] Fix grouped gemm two stage HasMainK0BlockLoop (#3466)
* Re-enable two stage kernel

* Only disable on HasMainKBlockLoop mismatch

* Address PR comments
2025-12-23 11:33:09 +01:00
kabrahamAMD
4ce7d4c511 [ck_builder] add utility functions to convolution (#3459)
* reinstate conv_signature_utils.hpp

* added tests for elementwise operation getters

* add tests for getDataType functions

* added test for no data type specified

---------

Co-authored-by: Kevin Abraham <kevin.abraham@streamhpc.com>
2025-12-23 10:39:49 +01:00