composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-20 04:49:54 +00:00

Author	SHA1	Message	Date
assistant-librarian[bot]	a73a06fb1d	Merge commit 'aad4cf098511b3f58c5bd3c32e4534d438f7539c' into develop	2026-01-07 19:21:57 +00:00
Enrico Degregori	5a3fc30228	Wmma support for gemm_bias_add_reduce (#3316 ) * Add tests for gemm_bias_add_reduce * Initial working implementation * Generalize implementation of reduce epilogue * Add tests for all layouts * Add instances * Fix test archs * Fix xdl bug * Remove library/profiler duplications * Fix num_byted error profiler * Fix typos * Fix copyright [ROCm/composable_kernel commit: `aad4cf0985`]	2026-01-07 10:27:16 -08:00
Erwin Terpstra	2379b5e6e0	Implement grouped gemm fastgelu for RDNA4 (#3303 ) * Implement grouped gemm fastgelu for RDNA4 * chore: some cleanup and minor inconsistencies in grouped gemm profiler * chore: clarified logic and reporting of supported instance warnings [ROCm/composable_kernel commit: `f9c6ba0403`]	2026-01-07 10:20:44 -08:00
assistant-librarian[bot]	54e7d86ee2	Merge commit 'a7d6b1e7008c0b6e1af8a7d79389aefbdca4da65' into develop	2026-01-07 16:16:37 +00:00
John Shumway	a89756823c	Add unit test coverage for conversion to convolution traits (#3515 ) Our concept-based conversions are fragile and too complex. We want to refactor to straightforward functions for each instance traits class template. This change adds unit test coverage to make that refactoring safer. [ROCm/composable_kernel commit: `a7d6b1e700`]	2026-01-07 07:44:21 -08:00
Johannes Graner	acf98936bc	[CI, CK examples] Disable time_kernel for CI tests and examples (#3464 ) * Disable kernel timing in tests * default time_kernel = false in old CK examples [ROCm/composable_kernel commit: `0a474aa62f`]	2026-01-07 16:30:57 +01:00
assistant-librarian[bot]	850997ff67	Merge commit 'e8cc75aefbe365750cf79c1188014325578941d8' into develop	2026-01-07 15:15:08 +00:00
BrianHarrisonAMD	edc3e4a870	Enable offload-compress for Windows if available (#3521 ) [ROCm/composable_kernel commit: `e8cc75aefb`]	2026-01-07 07:05:03 -08:00
assistant-librarian[bot]	bb614ee8b2	Merge commit 'd7497d26948ca90d0224920472712e0f657fb744' into develop	2026-01-07 08:16:44 +00:00
Cong Ma	cdd9dafe6a	[CK TILE] Refactor function amd_buffer_load_invalid_element_return_zero (#3512 ) Refactor function amd_buffer_load_invalid_element_return_zero to avoid the inefficient ASM code generated by the compiler. The compiler generates suboptimal assembly for the ternary operator, causing excessive VGPR usage. Tested compilers: - ROCm 7.0.1 - ROCm 7.1.1 Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> [ROCm/composable_kernel commit: `d7497d2694`]	2026-01-07 00:05:56 -08:00
assistant-librarian[bot]	9ec8eac079	Merge commit 'aaa35f0bbfa45dadc4380ddd6e0224668ddb97b4' into develop	2026-01-06 21:12:56 +00:00
Khushbu Agarwal	c33704febc	[CK_Tile] Support for various group sizes Preshuffle quant for 2d block scale gemm (#3445 ) * formatted * formatted * formatting * formatting * formatting * [CK TILE GEMM] Refactor block_scale_gemm examples - Split cpp file to reduce building time - Support multiple GemmConfig * [CK TILE GEMM] Refactor block_scale_gemm examples - Update Readme * enable prefill shapes * [CK TILE GEMM] Refactor block_scale_gemm examples - Add support for rowcol and tensor GEMM operations * [CK TILE GEMM] Refactor block_scale_gemm examples - Update README * adding preshuffle quant as new parameter and its associated new files * remove debugging statements * adding test * enable preshuffle quant with permuteN * updating readme and corresponding gemmconfigs * updating cmake file * fixing CI failures for grouped quant gemm * debugging permuteN * debugging * debugging PermuteN * initial commit * resolving merge conflicts * adding test cases * initial commit with prints * debugging * fine-grained working * debugging medium grained * fixing the tile window * formatting * enabling prefill shapes * working prefill shapes * formatted * clean up * code cleanup * bug fix after merging with develop * clean up after merging with develop * added comments for the tile window and tile distribution encoding --------- Co-authored-by: Cong Ma <congma13@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> Co-authored-by: Agarwal <khuagarw@ctr2-alola-login-03.amd.com> [ROCm/composable_kernel commit: `aaa35f0bbf`]	2026-01-06 12:46:59 -08:00
kyle-256	9489e197c3	[CKTILE] Support A/B Quantization in Blockscale Grouped Gemm (#3452 ) * update grouped_gemm blockwise kernel * update config * update kernel * update examples * remove test code for now * sync test files with origin/develop * update example * fix code lint * fix code-lint * update test code * run clang format * run pre-commit * update api [ROCm/composable_kernel commit: `76696ace44`]	2026-01-06 12:36:04 -08:00
kensclin	df198bd5af	[CK_TILE] add preshuffleB mode for ABQuant GEMM (#3495 ) * [CK_TILE] add preshuffleB mode for ABQuant GEMM * fix precommit error * use template method call for cvt_scale_to_fp32 * fix precommit error * add test code * fix precommit error * switch abquant gemmconfig to default * Add changelog.md * fix precommit error * fix conflict [ROCm/composable_kernel commit: `2309c86054`]	2026-01-06 12:35:01 -08:00
assistant-librarian[bot]	05b2660bf1	Merge commit '960ef551bf5d615d45e31b954e0faff147e76c85' into develop	2026-01-06 19:12:05 +00:00
John Shumway	946a6e7df0	Fix build error from extra comma (#3516 ) The newer rocm compiler gives an error with a trailing comma in testing::AllOf. [ROCm/composable_kernel commit: `960ef551bf`]	2026-01-06 11:08:54 -08:00
assistant-librarian[bot]	38f334d882	Merge commit '2ffbf7f476d99b6fc3db71480b49d221c602e071' into develop	2026-01-06 18:17:10 +00:00
Illia Silin	acb2292b46	add tabulate package to aiter docker (#3519 ) [ROCm/composable_kernel commit: `2ffbf7f476`]	2026-01-06 09:36:54 -08:00
assistant-librarian[bot]	5d0010c4b9	Merge commit '1c433c64ec5254d202b7cbf4b8b0e98678ea2a4f' into develop	2026-01-06 09:16:30 +00:00
Robin Voetter	ffc30531ac	[CK_BUILDER] Integrate reference conv with testing (#3511 ) * ck-builder: explicitly delete forward declarations Before, these functions were seen as a forward declaration for an existing function. If no actual implementation overload could be found, these would be selected and a linker error or warning would be generated. By marking these functions as explicitly deleted, incorrect invocations now produce a compile error instead. * ck-builder: ckt::run plumbing for reference conv This implements the ckt::run plumbing for the reference convolution implementation and sets up the first complete end-to-end test. * ck-builder: make validation system check for all-zeros When the actual and reference outputs are both all zero bits, there is probably something wrong in the test framework. * ck-builder: proper implementation+tests for TensorDescriptor::is_packed * ck-builder: fix typos [ROCm/composable_kernel commit: `1c433c64ec`]	2026-01-06 09:29:06 +01:00
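The first item in the commit above (#3511) swaps bare forward declarations for explicitly deleted ones. A minimal sketch of that idea follows. The names `ConvFwd`, `ConvBwd`, `run`, and `is_runnable` are hypothetical and are not taken from CK-Builder; the point is only that a deleted catch-all turns a missing overload into a compile-time failure instead of a link-time one.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical tags standing in for builder signatures (not CK-Builder types).
struct ConvFwd {};
struct ConvBwd {};

// Catch-all overload, explicitly deleted: if overload resolution picks it,
// the call is rejected by the compiler instead of failing at link time.
template <typename Signature>
int run(const Signature&) = delete;

// A real implementation exists for ConvFwd only. As a non-template exact
// match it is preferred over the deleted template.
inline int run(const ConvFwd&) { return 1; }

// Compile-time probe: true when run(T) resolves to a usable overload.
template <typename T, typename = void>
struct is_runnable : std::false_type {};

template <typename T>
struct is_runnable<T, std::void_t<decltype(run(std::declval<const T&>()))>>
    : std::true_type {};
```

With these definitions `run(ConvFwd{})` compiles, while `run(ConvBwd{})` is rejected at the call site because it would select the deleted overload.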
assistant-librarian[bot]	2285a8345a	Merge commit 'b78563b3d3edf1b2cd686ff0c0994ca2538419ef' into develop	2026-01-06 08:16:41 +00:00
joyeamd	e36567f015	Merge some updates for ck_tile headers (#3342 ) * fix some issues from internal branch * update cshuffle_epilogue * update cshuffle_epilogue * update cshuffle * update warp_gemm [ROCm/composable_kernel commit: `b78563b3d3`]	2026-01-05 23:39:00 -08:00
assistant-librarian[bot]	3f746f7294	Merge commit '2b563ad04828c5c970f7544d49831f33203587fb' into develop	2026-01-05 22:13:10 +00:00
joyeamd	9516169aaf	Joye/revise wp pipeline (#3493 ) * [CK_TILE] unify double and single lds implementation (#108) Unify LDS buffer management API for single and double buffering modes This change consolidates the Local Data Store (LDS) buffer management by: Merging single and double LDS buffer APIs into a unified interface Implementing ping-pong address calculation in pipeline when double LDS is enabled Computing pong buffer addresses dynamically using base address offsets --------- Co-authored-by: joye <joye@amd.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * update wp_pipeline * fix a c++17 issue * update for ci errors * fix ci issues * include a header to fix ci errors * fix some rebase issues * update with rebase --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> [ROCm/composable_kernel commit: `2b563ad048`]	2026-01-05 13:49:26 -08:00
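The commit above (#3493) describes computing the pong buffer address from a base address offset when double LDS is enabled. A minimal host-side sketch of that arithmetic is shown here. The function name `lds_buffer_offset` and its parameters are assumptions for illustration only and do not reflect the actual ck_tile pipeline interface.

```cpp
#include <cstddef>

// Hypothetical helper (not ck_tile API): byte offset of the LDS buffer used
// at a given pipeline iteration. With double buffering, even iterations use
// the "ping" buffer at base_offset and odd iterations use the "pong" buffer
// located buffer_bytes after it. With single buffering every iteration uses
// the ping buffer.
constexpr std::size_t lds_buffer_offset(std::size_t base_offset,
                                        std::size_t buffer_bytes,
                                        std::size_t iteration,
                                        bool double_buffer)
{
    const bool use_pong = double_buffer && (iteration % 2 == 1);
    return base_offset + (use_pong ? buffer_bytes : 0);
}
```

Routing both modes through one function mirrors the unification the commit message describes: the caller does not branch on the buffering mode, only the offset calculation does.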
assistant-librarian[bot]	3ee7a7765f	Merge commit '1224bc0a82fbf47e1452bc4dbd63371471e57d4a' into develop	2026-01-05 18:17:32 +00:00
Estevan Vedovelli	604ba0e9cf	Add support to gfx1153 and fix gfx115X WMMA config (#3496 ) * Support for gfx115X * Changes for gfx115X * Add gfx1153 * Update changelog --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `1224bc0a82`]	2026-01-05 10:03:30 -08:00
Bartłomiej Kocot	502914e556	Fix large tensor grouped conv bwd data test (#3513 ) [ROCm/composable_kernel commit: `bbf0b1a3b3`]	2026-01-05 09:42:02 -08:00
assistant-librarian[bot]	7ef22db454	Merge commit 'e6e7dc29101bcd8a5d30ae99adf71a09fa544b09' into develop	2026-01-05 13:26:54 +00:00
Robin Voetter	14a149bab6	[CK_BUILDER] validation (#3471 ) This pull request builds on #3267 by providing the "validation" infrastructure, the means to compare a set of `Outputs`. The design of the validation infrastructure is relatively straightforward: - Each SIGNATURE should come with a `validate()` implementation, which should be implemented in a similar way that the other functions/types from `testing.hpp` are implemented. - `validate()` returns a `ValidationReport`, which is a structure that keeps all relevant information about comparing the tensors from two `Outputs`. Note that crucially, `validate()` should not do any reporting by itself. Rather, glue logic should be implemented by the user to turn `ValidationReport` into a relevant error message. - You can see this glue code for CK-Builder itself in `testing_utils.hpp`, its `MatchesReference()`. This functionality is relatively barebones right now, it will be expanded upon in a different PR to keep the scope of this one down. The comparison is done on the GPU (using an atomic for now), to keep tests relatively quick. Some notable items from this PR: - To help compare the tensors and with writing tests, I've written a generic function `tensor_foreach` which invokes a callback on every element of a tensor. - For that it was useful that the `TensorDescriptor` has a rank which is known at compile-time, so I've changed the implementation of `TensorDescriptor` for that. I felt like it was a better approach than keeping it dynamic, for multiple reasons: - This is C++ and we should use static typing where possible and useful. This way, we don't have to implement runtime assertions about the tensor rank. - We already know the rank of tensors statically, as it can be derived from the SIGNATURE. - It simplifies the implementation of `tensor_foreach` and other comparison code. 
- There are a lot of new tests for validating the validation implementation, validating validation validation tests (Only 3 recursive levels though...). For a few of those functions, I felt like it would be useful to expose them to the user. - Doc comments everywhere. [ROCm/composable_kernel commit: `e6e7dc2910`]	2026-01-05 04:57:34 -08:00
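The commit above (#3471) mentions a generic `tensor_foreach` that invokes a callback on every element of a tensor whose rank is known at compile time. The sketch below illustrates that pattern on the host only. The name `tensor_foreach_sketch`, its signature, and the row-major visiting order are assumptions, and the real CK-Builder function (which the message says runs its comparison on the GPU) may look quite different.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-only sketch (not the CK-Builder implementation): visits
// every multi-index of a tensor with compile-time rank, in row-major order
// (last dimension varies fastest).
template <std::size_t Rank, typename Callback>
void tensor_foreach_sketch(const std::array<std::size_t, Rank>& lengths,
                           Callback&& callback)
{
    // Total number of elements is the product of all dimension lengths.
    std::size_t total = 1;
    for (std::size_t len : lengths)
        total *= len;

    for (std::size_t flat = 0; flat < total; ++flat)
    {
        // Decompose the flat counter into one index per dimension.
        std::array<std::size_t, Rank> index{};
        std::size_t remainder = flat;
        for (std::size_t d = Rank; d-- > 0;)
        {
            index[d] = remainder % lengths[d];
            remainder /= lengths[d];
        }
        callback(index);
    }
}
```

Because `Rank` is a template parameter, the index array has a fixed size and no runtime rank check is needed, which is the benefit the commit message attributes to a statically ranked `TensorDescriptor`.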
assistant-librarian[bot]	85642a59c2	Merge commit 'cc75a1dc5f18613af29d8821375f79b0f3c6410b' into develop	2026-01-05 11:13:10 +00:00
Jeff Huang	4f3995a3e3	[FMHA] Batch Prefill Support Improvements: Change KV Cache Layout & Large Page Size Support (#3442 ) * add page_block_size parameter * add is_sglang_layout to parameters * add kv_offset_array_transform to batch async for page size 16 * add kv_last_page_lens to kernel * change kv layout to [num_total_pages, page_block_size, hdim] * format * - enable codegen of batch_prefill kernels - create new problem struct BlockFmhaBatchPrefillPipelineProblem for batch prefill kernels - generate different page sizes of batch prefill kernels (1, 16) * 1. fix wrong calculation of page id in kv_offset_array_transform in gfx950 2. support page size 1024 * fix python format * change kv cache layout to [num_blocks, num_kv_heads, head_size/x, block_size, x] and [num_blocks, num_kv_heads, block_size/X, head_size, X] * 1. Introduced `kVectorSize` in BlockFmhaBatchPrefillPipelineProblem instead of using hardcoded values 2. Makes batch prefill kernel traits structures inherit from fmha fwd traits 3. Add some static check for Page size, vector size, hdim, ..., etc. * [Refactor] Replace is_sglang_layout with Enums for KV cache configuration Refactored `fmha_batch_prefill` to use `BlockAttentionKVCacheMemoryLayoutEnum` (VECTORIZED/LINEAR) and `BlockAttentionKVCacheLookupTableEnum` (SGLANG_1D/VLLM_2D) instead of a single boolean. Changes: * Added Enum definitions in `block_attention_kvcache_layout_enum.hpp`. * Updated Kernel, Pipeline, and Traits to template on these Enums. * Implemented `kv_offset_array_transform` logic based on `kKVMemoryLayout`. * Refactored `PageBlockTableKargs` to adapt to `kKVLookupTable`. * Updated CodeGen scripts to support new parameters. This decouples memory layout from the paging mechanism, enabling flexible KV cache configurations. * 1. remove batch prefill pipeline with sk_pad=false 2. correct some comments 3. add static assert to make sure v offsets is in same page within a tile. 
* fix vgpr spill count * remove unnecessary t2s functions * add fp8 support for receipt 200 and 600 in fmha_bath_prefill.py * support linear kv cache layout * Remove block_table_ptr from fwd_batch_prefill_args. Instead, reuse kv_page_indices as a pointer of the lookup table. * 1. merge multiple transforms into single transform. 2. add static check to make sure vlayout is row-major. * move FmhaFwdCommonKargs::seqlen_k_ptr to VllmPageTableKargs. * update changelog --------- Co-authored-by: ltqin <letaoqin@amd.com> Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `cc75a1dc5f`]	2026-01-05 18:41:47 +08:00
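The commit above (#3442) refers to a KV cache laid out as `[num_total_pages, page_block_size, hdim]` and to `kv_page_indices` serving as the lookup table. As a rough illustration of how a paged lookup of that shape can work, the sketch below maps a logical token position to an element offset. The function `kv_element_offset` and its parameters are hypothetical, it covers only a simple linear layout, and it is not the kernel's `kv_offset_array_transform`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the CK kernel code): element offset of the first
// hdim value for a token in a cache laid out as
// [num_total_pages, page_block_size, hdim]. page_indices maps each logical
// page of the sequence to a physical page in the cache.
inline std::size_t kv_element_offset(const std::vector<int>& page_indices,
                                     std::size_t page_block_size,
                                     std::size_t hdim,
                                     std::size_t token)
{
    const std::size_t logical_page = token / page_block_size; // which page
    const std::size_t slot         = token % page_block_size; // row inside it
    const std::size_t physical_page =
        static_cast<std::size_t>(page_indices.at(logical_page));
    return (physical_page * page_block_size + slot) * hdim;
}
```

For example, with `page_indices = {7, 2}`, a page size of 16, and `hdim = 128`, token 17 falls in logical page 1, slot 1, which is physical page 2.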
assistant-librarian[bot]	c5abac7854	Merge commit 'e339101e9c9961fe1bc8305d5c316b39d1980d3e' into develop	2026-01-04 12:20:15 +00:00
Max Podkorytov	6cf89bbca9	[CK-Tile] move out memory operation from cshuffle epilogue class (#3359 ) * initial poc * factor out common parts in operator() * cv4 * rest of the universal gemm pipelines * fix test * remove boilerplate from tile engine * fix example * fix example * format * fix tests build for gemm * remove base pipeline codegen from gemm instance builder * unify v3 logic with the rest of universal gemm pipelines * fix build for multi abd test * fix test gemm multi d * fix build for weight preshuffle * fix grouped gemm test * fix grouped gemm multi d test * fix grouped gemm preshuffle * fix grouped gemm example except for quant * fix gemm preshuffle * fix splitk 2 stage example * fix batched gemm example * fix multid example * fix multiabd example * fix batched gemm test * fixup * fix examples build * fix grouped gemm test build * fix smoke builder * hacky poc * fix tile engine * kill the lambda * maybe fix test build * more fixes * clang-format * save temp * clang-format * mostly fix examples * clang-format * remove dead code * more cleanup * fix fmha bwd build (default epilogue set/add appears to be broken) * fix default epilogue tests but not correctness * clang-format * fix bquant * clang-format * cleanup dead code * rearrange make windows for readability * restore changes to IsSupportedArgument * fix smoke-builder * clang-format * fixup rename class * build fixes * clang-format * fix builder * fixup * remove set from builder tests * fix test * clang-format * re-refactor the kernels * clang-format * fix header license * remove memory operation from conv bwd test * clang-format * clang-format example,include * clang-format test * build fixes * clang-format * solve compilation error * fix the CI * solve compilation error * clang format * solve merge conflict * solve merge conflict * solve the gfx11 error * solve test error * moar build fixes * remove AtomicAddRequiresKBatchGreaterThanOne test since the property is removed from the kernel scope 
--------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> [ROCm/composable_kernel commit: `e339101e9c`]	2026-01-04 03:28:14 -08:00
assistant-librarian[bot]	d8734393c7	Merge commit 'ec23be0b9d45ff9ca4135090bcd0269184c953a7' into develop	2026-01-03 06:16:07 +00:00
John Afaganis	077d75cea0	Update unsigned long literals and format specifiers to work correctly in Windows (#3483 ) Previously, the code used unsigned long for literals and format specifiers to represent 64-bit unsigned values. While this worked on Linux, it caused compatibility issues on Windows. The C++ standard does not guarantee that long is 64 bits. On LP64 systems (e.g., Linux), long maps to 64-bit values, but on LLP64 systems (e.g., Windows), long maps to 32-bit values. This discrepancy led to incorrect behavior when assuming unsigned long was always 64-bit. This commit updates all relevant literals and format specifiers to explicitly use 64-bit unsigned types, ensuring consistent behavior across platforms. [ROCm/composable_kernel commit: `ec23be0b9d`]	2026-01-02 22:16:41 -07:00
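The commit above (#3483) explains that `unsigned long` is 64 bits on LP64 systems but 32 bits on LLP64 systems. One portable way to write 64-bit literals and format specifiers is sketched below using the standard `<cstdint>` and `<cinttypes>` facilities. The helper names `format_u64` and `four_gib` are hypothetical, and the commit itself may have used a different spelling for its 64-bit types.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit unsigned value portably. PRIu64 expands to the correct
// printf conversion for uint64_t on both LP64 (Linux) and LLP64 (Windows),
// whereas "%lu" would only match a 64-bit value on LP64.
inline std::string format_u64(std::uint64_t value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%" PRIu64, value);
    return std::string(buffer);
}

// UINT64_C produces a constant that is at least 64 bits wide. A `1UL << 32`
// shift would overflow where unsigned long is 32 bits.
inline std::uint64_t four_gib() { return UINT64_C(1) << 32; }
```

Calling `format_u64(four_gib())` yields the string `4294967296` regardless of the platform's width for `long`.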
assistant-librarian[bot]	0b05cd0351	Merge commit '4670df5ca606e6e3ee07a085ea61016489bf91ad' into develop	2026-01-03 01:41:33 +00:00
John Shumway	9e9cadefb5	[CK_BUILDER] Remove cmath include (#3508 ) Remove the dependency from device_tensor_generator.hpp and fix a typo from a previous force push. The changes replace standard library math functions with their ck::math equivalents and define PI as a local constant instead of computing it using std::acos. Key changes: * Removed #include <cmath> header dependency * Replaced std::acos(-1.0) with hardcoded PI constant 3.141592653f * Replaced std::sqrt, std::cos, and std::sin with ck::math equivalents [ROCm/composable_kernel commit: `4670df5ca6`]	2026-01-02 16:58:35 -08:00
assistant-librarian[bot]	e64da4f3d6	Merge commit '355ce9230d9c4f2e74776e879f2bee71a26bae4a' into develop	2026-01-02 23:12:46 +00:00
John Shumway	853f3c6776	Remove non-standard M_PI (#3507 ) Just use PI=acos(-1.0) as a local static constexpr. This has been causing build issues on Windows. [ROCm/composable_kernel commit: `355ce9230d`]	2026-01-02 14:21:46 -08:00
assistant-librarian[bot]	1b3eb980bf	Merge commit '1da340031c98bfde0f142bf34493d087490ec70d' into develop	2026-01-02 21:11:42 +00:00
John Shumway	86b1f5749b	Enable math defines for MSVC. (#3503 ) The symbol M_PI is breaking the build on Windows. The _USE_MATH_DEFINES macro enables M_PI and other math constants on Windows. (I'm guessing this is more idiomatic than the old trick of using PI=acos(-1.0).) https://learn.microsoft.com/en-us/cpp/c-runtime-library/math-constants?view=msvc-170 Co-authored-by: BradPepersAMD <Brad.Pepers@amd.com> [ROCm/composable_kernel commit: `1da340031c`]	2026-01-02 14:36:42 -05:00
Joseph Macaranas	506a19a7e7	Update TheRock CI SHA 20260102 (#3506 ) - TheRock CI compilation passed with the changes. [ROCm/composable_kernel commit: `cc1392a405`]	2026-01-02 14:23:43 -05:00
assistant-librarian[bot]	6b6bc88064	Merge commit '6e8c401e33676ccc21992c849e73640a383d288c' into develop	2026-01-01 00:43:10 +00:00
Ville Pietilä	ba9dbd433a	[CK_BUILDER] Instance traits for conv bwd weight algorithms (#3498 ) Added instance traits for the following bwd weight conv algorithms DeviceGroupedConvBwdWeight_Xdl_CShuffleV3 DeviceGroupedConvBwdWeight_Wmma_CShuffleV3 DeviceGroupedConvBwdWeight_Wmma_CShuffle DeviceGroupedConvBwdWeight_TwoStage_Xdl_CShuffle DeviceGroupedConvBwdWeight_TwoStage_Wmma_CShuffleV3 DeviceGroupedConvBwdWeight_DL DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle DeviceGroupedConvBwdWeightMultipleD_Wmma_CShuffleV3 Also added unit tests for instance traits of those bwd weight algorithms that are currently exposed by the narrow CK build for MIOpen. --------- Co-authored-by: Ville Pietilä <> [ROCm/composable_kernel commit: `6e8c401e33`]	2025-12-31 15:41:15 -08:00
assistant-librarian[bot]	21391d4406	Merge commit 'f3e4d46faa5f3ce4d81c86121782d8a9aea27c5e' into develop	2025-12-31 20:13:22 +00:00
DarylHawkinsAMD	67b61ccf5c	Temporarily disable kernel instances that won't build on gfx1101 on Windows (#3499 ) ## Proposed changes This source file won't build for gfx1101 on Windows. It builds successfully on other gfx110X architectures, and also builds successfully on gfx1101 on Linux. This is the compile error: ``` [composable_kernel] FAILED: library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear/CMakeFiles/device_grouped_conv3d_bwd_weight_bilinear_instance.dir/wmma/device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.obj [composable_kernel] ccache B:\build\core\clr\dist\lib\llvm\bin\clang++.exe -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_TILE_USE_WMMA=1 -DCK_TIME_KERNEL=1 -DCK_USE_WMMA -DCK_USE_XDL -DDPP_KERNELS -DLLVM_MAIN_REVISION=524190 -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -IC:/home/runner/_work/TheRock/TheRock/ml-libs/composable_kernel/library/include -IC:/home/runner/_work/TheRock/TheRock/ml-libs/composable_kernel/include -IB:/build/ml-libs/composable_kernel/build/include -IB:/build/base/half/stage/include -isystem B:/build/core/clr/dist/include -DWIN32 -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS -DNOMINMAX -fms-extensions -fms-compatibility -D_ENABLE_EXTENDED_ALIGNED_STORAGE -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class -Wno-ignored-attributes -Wno-unknown-attributes -Wno-duplicate-decl-specifier --hip-path=B:/build/core/clr/dist --hip-device-lib-path=B:/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -std=c++20 -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare 
-Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-error=deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-c++20-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -Wno-unique-object-duplication -Wno-nrvo -Werror -Weverything -fcolor-diagnostics -x hip --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1103 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1103 -MD -MT library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear/CMakeFiles/device_grouped_conv3d_bwd_weight_bilinear_instance.dir/wmma/device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.obj -MF library\src\tensor_operation_instance\gpu\grouped_conv3d_bwd_weight_bilinear\CMakeFiles\device_grouped_conv3d_bwd_weight_bilinear_instance.dir\wmma\device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.obj.d -o library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear/CMakeFiles/device_grouped_conv3d_bwd_weight_bilinear_instance.dir/wmma/device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.obj -c 
C:/home/runner/_work/TheRock/TheRock/ml-libs/composable_kernel/library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear/wmma/device_grouped_conv3d_bwd_weight_wmma_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp [composable_kernel] error: Illegal instruction detected: Operand has incorrect register class. [composable_kernel] V_CMP_NE_U32_e32 0, $src_private_base, implicit-def $vcc, implicit $exec [composable_kernel] 1 error generated when compiling for gfx1101. ``` This appears to be a compiler bug and we'll follow up to get a proper fix landed, but for the purposes of landing some work to enable gfx1151 support in TheRock we'd like to disable building of these kernels on this architecture temporarily. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [X] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [X] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [x] I have added inline documentation which enables the maintainers with understanding the motivation - [X] I have removed the stale documentation which is no longer relevant after this pull request - [X] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [X] I have run `clang-format` on all changed files - [X] Any dependent changes have been merged [ROCm/composable_kernel commit: `f3e4d46faa`]	2025-12-31 13:12:45 -07:00
assistant-librarian[bot]	14c7b15fc2	Merge commit 'f86bbb1aefdd047b2b0e886dda831417e790f622' into develop	2025-12-30 18:15:59 +00:00
kabrahamAMD	d7f7c1b6db	[CK_Builder] [testing] Integrate device random generators (#3427 ) Implemented device random number generators for ck tensors. Includes tests and integration to ck builder testing interface. [ROCm/composable_kernel commit: `f86bbb1aef`]	2025-12-30 10:03:05 -08:00
assistant-librarian[bot]	ee8ec5af8d	Merge commit '2b8302eb6d2217c0f537c28538265f4003ec416e' into develop	2025-12-30 16:14:01 +00:00
Bartłomiej Kocot	c5245882c3	Fix grouped conv wrw kernels names (#3494 ) [ROCm/composable_kernel commit: `2b8302eb6d`]	2025-12-30 16:45:39 +01:00

3949 Commits