composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-19 02:01:01 +00:00

Author	SHA1	Message	Date
Johannes Graner	3b8e9864c6	[CK_TILE] Add conv fwd + bias + clamp example (#3012 ) * Implement argument passing to element-wise functions for fwd convolution * Add files for fwd + bias + clamp example * Implement Bias * Implement Clamp * Elementwise function composition * Composition unit test * Implement fwd + bias + clamp example * Simplify argument passing and composition * elfunc -> bias_and_clamp * Rename function to specify example * Move element-wise function instantiation to kernel * Make bias a runtime tensor * No ugly namespace aliasing * Initialize element-wise function on host * Remove function initialization helper, simplify Compose initialization * Remove unintended LSP compatibility patch * Clean up includes and unused code * Switch names in cshuffle epilogue * Move CDElementwise to conv traits * Re-add required include * Initialize bias in same way as other tensors * Better type specification for ds pointer * Disable 1D convolution * Add warning for non-group-constant bias [ROCm/composable_kernel commit: `5c1974065e`]	2025-10-27 18:43:09 +01:00
arai713	cbf24c87c6	[CK_TILE] Stream-K operator() Reboot (#3064 ) * Persistent Stream-K Kernel Implementation This change implements an operator() function in the reboot::StreamKKernel class that is enabled when the Persistent flag is set to true. In this case, the data-parallel portion and the Stream-K portion of the kernel are fully persistent. The changes were made in the reboot namespace. A future PR will remove the old Stream-K kernel class and remove the reboot namespace. * Unit Tests for Persistent Stream-K Kernel This change contains the initial test suite for the Persistent Stream-K Kernel. The files contain "reboot" in the name; a future PR will remove tests for the old Stream-K Kernel and remove the "reboot" naming. A future commit will add tests for the non-persistent kernel. Also added estimate_num_wgs_per_tile to the StreamKTilePartitionerBase class. This allows us to estimate the number of accumulations done per macro tile in C to use during validation when computing relative and absolute tolerance. * Adding implementation for the Non-Persistent Stream-K kernel This code is adding the operator() function for the Non-Persistent Stream-K kernel. Persistency of the kernel is determined through a template argument. The Non-Persistent kernel will allocate additional workgroups for the data parallel section, leading to a different structure for processing the data parallel and Stream-K sections. There has been an addition to the TilePartitioner to get access to whether Persistent has been set to true or false in the StreamKKernel. * Adding in the tests for the Non-Persistent Stream-K kernel * Refactor Stream-K Reboot Unit Tests This commit makes the following changes: - Update test cases to determine M, N, and K based on the number of CUs. This ensures that each test case is one of Edge Case, SK Only, DP Only, or DP + 2 Tile SK regardless of the architecture. 
- Since the DP + 2 Tile SK test case takes long to run, this change moves this case into a separate .inc file and labels it as an extended test. - Since the extended test takes > 30 seconds to run, this test is added to the list of regression tests. * Fix spelling errors in comments for test cases Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Changes based on review Removed const volatile for typenames Set up alias for is_tuple_t Naming changes for clarity: GemmCommon -> BaseGemm Moved std::enable_if_t out of template parameters and changed to a return type for operator() Added constructor for StreamKKernelArgs to clarify UniversalGemm inheritance --------- Co-authored-by: Emily Martins <emily.martins@amd.com> Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> [ROCm/composable_kernel commit: `054fdb765c`]	2025-10-27 09:14:17 -07:00
John Shumway	facd83876e	Add .cline* files to .gitignore (#3101 ) Developers who use cline on the code base need to ignore .cline* directories like .cline_storage and .clinerules. Using a wildcard to ignore any other cline-related directories. [ROCm/composable_kernel commit: `0b68423015`]	2025-10-27 08:29:15 -07:00
Enrico Degregori	b0c0571809	Fix multi-abd tests bug (#3099 ) [ROCm/composable_kernel commit: `06973b1cf4`]	2025-10-27 08:09:02 -07:00
andrew clark	66310cc5bf	Jenkins Alerts Notifications (#3086 ) * Testing minimal pipeline * Update Jenkinsfile * Testing webhook * Testing webhook * Testing webhook * Testing build log output * Testing log retrieval * Testing * Testing pattern matching * Fixing regex * Testing error detection * Testing log formatting Including additional context around log failure. * Testing notification message format * Update Jenkinsfile * Notification formatting * Testing secure interpolation * Testing string interpolation * Notification format * Fixing markdown * Testing markdown * Testing markdown * Revert "Testing markdown" This reverts commit `adeb6d2d55`. * Testing different markdown format * Revert "Testing different markdown format" This reverts commit `bf5406a1cd`. * Testing markdown * Testing markdown * Testing markdown * Testing markdown * Testing markdown * Testing notification * Testing notification * Testing notification * Testing failure mode * Testing failure mode * Adding new patterns and tests * Commenting * Stage name fix * Moving to notification on failure only * Fixing notification format * Testing env vars * Testing build url redirect * Testing no log errors * Testing no errors case * Integrating into primary jenkinsfile * Updating notification message Removed emoji from message [ROCm/composable_kernel commit: `a1ce64374f`]	2025-10-27 08:24:36 -06:00
Thrupti Raj Lakshmana Gowda	20ef4380d7	Ck tile engine preshuffle (#2919 ) * Partial Progress : Preshuffle working code for datatype * Partial Progress : Preshuffle Cleanup * Working code for default config with min max step * Partial Progress : PermuteN implemented in validation * Partial Progress : PermuteN changes in Preshuffle * CK Tile Engine Preshuffle Complete * CK TILE ENGINE : Preshuffle Layout validation * CK Tile Engine Preshuffle Validation * Preshuffle Validation check * CK Tile Engine Preshuffle : Fixing Validation Cases * Addressing PR review Comments * Changes in config * Addressing Review Comments * Adding additional architecture in Jenkins * Partial Progress : Selective Datatype and layouts * Limited datatypes and layouts * Addressing CI errors * Datatype updates * Datatype updates * Datatype changes to Preshuffle * Addressing Review Comments * Addressing Review Comments * Datatype changes * Changes to Cmake * Update on Jenkins * Formatting with precommit * Ruff Formatting [ROCm/composable_kernel commit: `8b185e872e`]	2025-10-27 09:15:34 -05:00
John Shumway	a3261e87a3	[CK Builder] Add missing tf32 type to reflection. (#3090 ) We need to check all the architectures for build errors. This missing tf32 type came up as a build failure when I compiled for different instinct architectures. [ROCm/composable_kernel commit: `6d709dac41`]	2025-10-25 07:28:12 -07:00
Adam Osewski	75a0f41bb0	[CK_Builder] Add name member to unary elementwise ops & update builder traits. (#3093 ) * Add name member to unary elementwise ops. * Update elementwise_op_name to check for name attribute. * Require that the layout is derived from BaseTensorLayout struct. [ROCm/composable_kernel commit: `f53d857b25`]	2025-10-25 07:27:03 -07:00
kabrahamAMD	93a92cf2da	[CK_BUILDER] Add inline string diff for tests (#3067 ) Adds new testing functionality: an inline diff for string comparison. Example usage: EXPECT_THAT("Actual string", ck_tile::test::StringEqWithDiff("Expected string")); Failure message: Value of: "Actual string" Expected: "Expected string" Actual: "Actual string" (of type char [14]), Diff: "[Expe|A]ct[ed|ual] string" The inline-diff function uses the Wagner-Fischer algorithm to find the minimum edit distance and generate diff markers, which has O(N^2) complexity. It has optional color codes that are enabled with the matcher. [ROCm/composable_kernel commit: `e576992dca`]	2025-10-25 07:22:41 -07:00
Max Podkorytov	26c4304c84	[CK-Tile][Async gemm] add missing sync and f8 inputs test cases (#3000 ) * add missing sync and f8 test cases * reformat test cases * comment failing cases * bump * reintroduce compv4 shapes [ROCm/composable_kernel commit: `86d542f663`]	2025-10-24 12:16:01 -07:00
Khushbu Agarwal	eef9513fd3	[CK_TILE] Adding support for TiledPermuteN on preshuffle Block Scale Gemm (#3019 ) * Adding support for TiledPermuteN * Adding test * resolving remod.py --------- Co-authored-by: root <root@banff-cyxtera-s73-2.ctr.dcgpu> [ROCm/composable_kernel commit: `0584399571`]	2025-10-24 11:06:51 -07:00
Max Podkorytov	a1681b077e	[CK][host] limit the rotating count to prevent oom (#3089 ) * [CK][host] limit the rotating count to prevent oom * add numeric header for accumulate [ROCm/composable_kernel commit: `f39626fcf7`]	2025-10-24 08:55:54 -07:00
Max Podkorytov	77fc1e4c3f	limit the rotating count to prevent oom (#3087 ) [ROCm/composable_kernel commit: `fdcc1f75c3`]	2025-10-24 08:55:34 -07:00
andrew clark	c47b82b103	Fixing Run CI Check for Changed Files (#3072 ) * Fixing check for changed files * Testing CI skip behavior * Testing CI Trigger This should skip CI --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `775b96ea6a`]	2025-10-24 07:52:43 -07:00
kyle-256	b49f5d9de5	[CK_TILE] add tensorwise quant in grouped gemm (#3007 ) * add tensorwise quant in grouped gemm * fix example issue * update test cases * format codes * clang format * use GTEST_FAIL * fix a bug in test_grouped_gemm_util * skip test when use wmma on grouped_quant kernel * change cmake * change code based on comments --------- Co-authored-by: ThomasNing <thomas.ning@amd.com> [ROCm/composable_kernel commit: `3c12a02827`]	2025-10-24 07:41:54 -07:00
yinglu	480f05ffd9	conv:tf32:add missed instances (#3081 ) * conv:tf32:add missed instances [ROCm/composable_kernel commit: `6bbc05e1bd`]	2025-10-24 16:28:36 +08:00
Robin Voetter	88771b5f47	[CK_BUILDER] old ck build fixes (#3075 ) * Disable c++20-compat warnings when building old CK in C++20 mode Turns out that this creates some warnings for no good reason. * ck-builder: add missing layouts and element-wise op names For layouts, we can directly use the ::name attribute, which should cover all layouts. For element-wise ops, I just added the ones which are currently missing when compiling CK with -DMIOPEN_REQ_LIBS_ONLY. [ROCm/composable_kernel commit: `d0364641ed`]	2025-10-23 13:01:19 -07:00
Thrupti Raj Lakshmana Gowda	9a2f0f82b4	Excluding Tile engine from build (#3085 ) [ROCm/composable_kernel commit: `0fd7d1a607`]	2025-10-23 12:57:18 -07:00
Geo Min	2dc3dad0a0	adding commit hash (#3084 ) [ROCm/composable_kernel commit: `2546fc241e`]	2025-10-23 12:32:26 -07:00
Yi DING	048edb2776	Use filename but not path to filter compilation (#3083 ) * prologue * Use filename but not path to filter test compilation [ROCm/composable_kernel commit: `fe4eaeb2eb`]	2025-10-23 12:01:26 -07:00
Gino Lu	7e4c021e26	[CK_TILE] Add fp4 warp gemm 16x16x128 (#2738 ) * first commit * fix format error * fix vec size error * fix clang format * fix type error * add interface in warp_gemm_impl * fix interface * fix bug * fix bug --------- Co-authored-by: asleepzzz <hanwen.chang@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `bedade2572`]	2025-10-23 10:55:51 -07:00
Rostyslav Geyyer	e6dc79dcc6	Rearrange pointers to fix the reinterpret_cast issue (#3077 ) [ROCm/composable_kernel commit: `6df69abeef`]	2025-10-23 10:54:13 -07:00
Qianfeng	cf31de9211	[CK_TILE] Fix in set_slice_tile (#2232 ) Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> [ROCm/composable_kernel commit: `fbd101b1ac`]	2025-10-23 10:34:02 -07:00
Michal Kulikowski	31939e7b2b	[CK][Examples] Fixing stride issues in ck examples by workaround - Bypassing hostTensor validation. Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com> [ROCm/composable_kernel commit: `b9789a0742`]	2025-10-23 08:46:02 +02:00
Haocong WANG	2a01918313	[CKTILE] FMHA fwd trload lse fix (#3046 ) * enable storelse for fmha_fwd_trload kernel * fix lse in trload * fix the mask related bug [ROCm/composable_kernel commit: `0d3860dfdb`]	2025-10-23 09:33:33 +08:00
spolifroni-amd	93115d8e56	updated the changelog with 7.1 and beyond info [ROCm/composable_kernel commit: `1b95803431`]	2025-10-22 13:35:45 -06:00
lalala-sh	63e0a73bd3	[CK_TILE] Update flatmm related kernels (#3022 ) --------- Co-authored-by: Ding, Yi <yi.ding@amd.com> Co-authored-by: felix <felix.li@amd.com> [ROCm/composable_kernel commit: `211d64e18a`]	2025-10-22 22:36:11 +08:00
Johannes Graner	b8882aae95	[CK_TILE] Conv bwd splitN support (#3047 ) * Conv bwd splitN support * Adjust splitting calculations to lengths format * Prepare indexing for future splitK support [ROCm/composable_kernel commit: `cbd1279ae6`]	2025-10-22 13:34:06 +02:00
MHYangAMD	6d802e7ba4	Introduce tree reduction for BlockReduce2dCrossWarpSync (#2588 ) * Introduce tree reduction for BlockReduce2dCrossWarpSync * Rename original impl to BlockReduce2dLinearCrossWarpSync * Replace warp_size with get_warp_size() --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `5a27a97391`]	2025-10-22 14:41:35 +08:00
John Shumway	8f48205046	[CK_BUILDER] Add compile-time reflection for a convolution instance (#3065 ) * [CK_BUILDER] Add compile-time reflection for a convolution instance Introduce InstanceTraits template metaprogramming framework to enable runtime introspection of device kernel template parameters without requiring implementation knowledge. This reflection system extracts configuration details (block sizes, data types, layouts, tuning parameters) directly from kernel specializations through template pattern matching. In particular, the GetInstanceString method returns a string that uniquely identifies the kernel, by explicitly serializing all template parameter values. This provides critical functionality for MIOpen integration, since the existing GetTypeString method is ambiguous, and only captures some of the template parameters. The implementation uses a two-level design: a primary InstanceTraits template declaration in instance_traits.hpp serves as the interface, while kernel-specific specializations (e.g., for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3) provide the actual extraction logic. This separation allows the reflection system to scale to additional kernel types without modifying the core interface. 
Key architectural decisions: - Forward-declare device kernels in instance_traits.hpp to avoid circular dependencies, since device implementation headers will include the reflection headers - Use compile-time constants and type aliases to expose kernel parameters, enabling zero-overhead introspection - Provide a templated instance_string() function that generates human-readable kernel configuration strings by serializing all template parameters in order, useful for debugging and kernel identification - Guard reflection integration with preprocessor definition CK_EXPERIMENTAL_BUILDER to keep it opt-in until the API stabilizes - Add GetInstanceString() virtual method to BaseOperator, allowing runtime polymorphic access to compile-time kernel information This infrastructure also enables upcoming higher-level semantic reflection abstractions (like ConvTraits) to query kernel configurations programmatically. Includes unit tests validating both the trait extraction accuracy and the string generation format. [ROCm/composable_kernel commit: `37dff024c1`]	2025-10-21 21:10:19 -07:00
Bartłomiej Kocot	4f83a3d745	Gridwise gemm conv v3 force padded layout on gfx950 (#2961 ) * Gridwise gemm conv v3 force padded layout on gfx950 * fix bug in other gridwise * fix * Update gridwise_gemm_wmma_cshuffle_v3_common.hpp [ROCm/composable_kernel commit: `3a28632b20`]	2025-10-21 15:41:02 +02:00
Yashvardhan Agarwal	9072046e55	fix identity value of AbsMax (#3058 ) * fix identity value of AbsMax - Identity value of AbsMax should be 0 not numeric<T>::lowest() * Update include/ck_tile/core/utility/reduce_operator.hpp resolved comment Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com> --------- Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com> [ROCm/composable_kernel commit: `35754d2ec8`]	2025-10-21 14:42:08 +02:00
Johannes Graner	671f2686c0	Fix race conditions in ck_tile remod (#3061 ) [ROCm/composable_kernel commit: `4043401db1`]	2025-10-21 09:35:04 +02:00
Max Podkorytov	7d1d0565d9	refine [ROCm/composable_kernel commit: `ff6efa2fb1`]	2025-10-20 23:13:58 -04:00
Max Podkorytov	6200ea9dfc	update build instructions [ROCm/composable_kernel commit: `b9e966e574`]	2025-10-20 23:13:58 -04:00
Yi DING	698810c92f	[CK_TILE] Add fmt: skip to FMHA codegen scripts for readability (#3057 ) * fmt: skip for fmha_bwd.py * more fmt: skip * thank you, copilot * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> [ROCm/composable_kernel commit: `e20923f384`]	2025-10-21 10:15:04 +08:00
Max Podkorytov	1d7e4157c5	[CK_TILE] Fix transpose_vectors for 2x2 8-bit tiles (#3042 ) fix transpose_vectors logic for 2x2 8-bit tiles add a test which goes through this code path. factor out constexpr'd cases into smaller functions. add inline docs about the data movement impact: gemms with 8-bit non-rcr inputs on gfx942 [ROCm/composable_kernel commit: `2570462ecf`]	2025-10-20 13:40:44 -07:00
Thrupti Raj Lakshmana Gowda	61dbfdb27b	[CK TILE ENGINE] Code changes to finding GPU id from TARGET (#3055 ) * Reading gpuname from target for gemm in ck tile engine * Reading gpuname from target for gemm preshuffle in ck tile engine * Reading gpuname from target for gemm preshuffle in ck tile engine * Get GPU changes for GEMM Multi D in TILE ENGINE * Addressing errors for gpu name in cktileengine [ROCm/composable_kernel commit: `9f77061094`]	2025-10-20 09:02:18 -07:00
John Shumway	5891e2ae79	[CK_BUILDER] Add experimental builder directory and configuration for composable_kernel (#3043 ) Add experimental builder infrastructure for composable_kernel - Add experimental/builder directory with README documentation. - Create initial test infrastructure with CMakeLists.txt and placeholder test. - Update root CMakeLists.txt to support CK_EXPERIMENTAL_BUILDER option. - Update .gitignore to not treat `experimental/builder` as a CMake build directory. This establishes the directory structure for a high-level builder pattern that will provide a semantically-clear interface for constructing CK operations, with initial focus on convolution kernels for MIOpen integration. [ROCm/composable_kernel commit: `f18b79f328`]	2025-10-20 07:54:09 -07:00
Gino Lu	b7e5da5e83	[CK_TILE] Patch for pk_fp4 ref check and buffer load. (#3044 ) * Patch for pk_fp4_raw_t buffer load and ref check [ROCm/composable_kernel commit: `fb1d090f3c`]	2025-10-20 14:47:04 +08:00
BrianHarrisonAMD	0fb13588bb	Add dvc pull step (#3056 ) * Add dvc pull step * Remove CD * Add details about LOGNAME and fail if dvc isn't installed [ROCm/composable_kernel commit: `af3786fe08`]	2025-10-19 16:09:21 -07:00
Illia Silin	525ea9dd3f	disable aiter test gemm_a8w8_blockscale (#3049 ) [ROCm/composable_kernel commit: `d88ea05c84`]	2025-10-17 19:52:22 -07:00
AviralGoelAMD	48b0e60e14	docs: add inline comments about flush_cache and rotating buffer [ROCm/composable_kernel commit: `b03764ca5a`]	2025-10-17 12:56:47 -04:00
Yashvardhan Agarwal	c5eda13381	fix identity values in Max and AbsMax (#3048 ) - The identity value method returned the minimum positive number while we need the lowest number for Max and AbsMax operations [ROCm/composable_kernel commit: `889ffc0b1d`]	2025-10-17 09:49:21 -07:00
Emily Martins	6157673c39	Fix CK Tile Stream-K BF16 Validation Errors (#3039 ) Prior to this change, the number of accumulations passed into calculate_rtol_atol was 1. That said, in most cases, this is not correct when there are multiple workgroups contributing to the same macro tile in C. This change uses the function estimate_num_wgs_per_tile, which was extracted into a common file and generalized, to estimate the number of workgroups per macro tile. This estimate is passed into calculate_rtol_atol to ensure we get a better relative and absolute tolerance. [ROCm/composable_kernel commit: `352dee5225`]	2025-10-17 09:33:38 -07:00
Johannes Graner	f7ffb12123	Pre-commit in CI (#3029 ) * Pre-commit in CI * Specify python version, and install dos2unix for remod * Refactor remod hook to correctly install dependencies * Run pre-commit [ROCm/composable_kernel commit: `8a4cd32d86`]	2025-10-17 09:28:38 -07:00
Ville Pietilä	71ecd257a4	Fixed handling of split-K autodeduce argument for grouped convolution (#3024 ) * Fix handling of split-K autodeduce argument. * Fix clang formatting. * Test fix. * Fix clang formatting. [ROCm/composable_kernel commit: `7e44b845b5`]	2025-10-17 15:36:39 +03:00
Johannes Graner	8af66c65d0	Update pre-commit to fixed versions, run remod for ck_tile (#2895 ) * Fix ruff linter errors * Fix remod dos2unix command * Clang format * Ignore utility in remod * Run remod * Specify clang-format version in pre-commit * Specify ruff version * Include PoolKernelArgs in reference_pool * Add calculate_total_elements to reference batched contraction * Fix calculate_total_elements declaration * Refactor remod pre-commit hook * Fix Aquant tests --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `d40b50b9d5`]	2025-10-16 15:29:17 -07:00
Enrico Degregori	1d9320c8f3	Wave Tile Transfer supporting global load with transpose (#3027 ) * Initial implementation: - add new thread group transfer supporting transpose instruction - refactor AB transfer to switch between thread and wave tiles methods * Add some comments and remove explicit wave and lane calculations * Remove compiler option for performance * fp16 example: use tuned instance * Missing cleanup * Integrate wave transfer in existing gemm and batched gemm instances * Add fast instances * extend implementation for 8 bit datatypes packed types not supported * Address review comments * Optimize pipeline v1 and re-introduce compiler option * Disable wave tile approach for b scale gemm * Fix for clang20 * Avoid code duplication of amd_global_load_transpose_to_vgpr function [ROCm/composable_kernel commit: `440358c168`]	2025-10-16 11:33:56 -07:00
kabrahamAMD	06d76b160e	implement device batched gemm b scale for wmma (#2825 ) * rebased on top of develop * fixed missing shuffling and wrong indexing * added tests for batched_b_scale * added missing files * fixed wrong stride computation and removed k batching (for now) due to precision issues * reinstated k-batching with PRNG constrained to -1..1 * added specialization of GeneratorTensor_3 for int4 and fixed internal overflow * added k-batching to reference and increased tolerances for test * changed gemm_b_scale and gemm_universal tests to use correct parameters * addressed review comments * ported fixes back to non-batched version of b_scale * addressed review comments * run clang-format on older commits * add type-conversion to AccDataType and then to CDataType to exactly mimic GPU's behavior * added newline at end of file * reflected changes from multi-abd branch in batched b_scale * fixed gfx11 issue * changed range for pki4 to -1...1 (-0.5...0.5 never really made sense for i4 anyway and always should have caused compiler errors, but since there was no int4 specialization of GeneratorTensor3 until now, this passed * run clang format * set range of i4 generation to 0...1 for upstream tests to pass. This replicated previous behavior, which however means that it is NOT properly tested. * reduced range for pk_i4 even further to 0..0 * removed failing xld instances. Failure now uncovered now that tests were fixed * removed generation of int4 values entirely * divide B buffer by BPackedSize --------- Co-authored-by: Kevin Abraham <kevin.abraham@streamhpc.com> [ROCm/composable_kernel commit: `c4b2da9cbd`]	2025-10-16 11:00:42 -07:00
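
The inline string diff from #3067 is described as using the Wagner-Fischer algorithm to find the minimum edit distance and generate diff markers. Below is a minimal sketch of that idea, assuming plain byte strings and no color codes; `inline_diff` is a hypothetical name and this is not the `ck_tile::test::StringEqWithDiff` implementation. It fills the (m+1) x (n+1) distance table, walks back along one minimum-cost path, and wraps each run of non-matching characters as `[expected|actual]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not CK's code): renders the differences between
// `expected` and `actual` inline, wrapping each run of mismatched characters
// as "[<expected chars>|<actual chars>]".
std::string inline_diff(const std::string& expected, const std::string& actual)
{
    const std::size_t m = expected.size();
    const std::size_t n = actual.size();

    // Wagner-Fischer table: d[i][j] is the edit distance between the first i
    // characters of `expected` and the first j characters of `actual`.
    std::vector<std::vector<std::size_t>> d(m + 1, std::vector<std::size_t>(n + 1, 0));
    for(std::size_t i = 0; i <= m; ++i) d[i][0] = i;
    for(std::size_t j = 0; j <= n; ++j) d[0][j] = j;
    for(std::size_t i = 1; i <= m; ++i)
        for(std::size_t j = 1; j <= n; ++j)
        {
            const std::size_t cost = expected[i - 1] == actual[j - 1] ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // delete expected[i-1]
                                d[i][j - 1] + 1,          // insert actual[j-1]
                                d[i - 1][j - 1] + cost}); // match or substitute
        }

    // Walk back from d[m][n] along one optimal path; output is built reversed.
    std::string out;
    std::string exp_run, act_run; // pending mismatch run, also reversed
    std::size_t edits = 0;
    auto flush = [&] {
        if(exp_run.empty() && act_run.empty())
            return;
        // Reversed form of "[" + exp + "|" + act + "]".
        out += ']';
        out += act_run;
        out += '|';
        out += exp_run;
        out += '[';
        exp_run.clear();
        act_run.clear();
    };

    std::size_t i = m, j = n;
    while(i > 0 || j > 0)
    {
        if(i > 0 && j > 0 && expected[i - 1] == actual[j - 1])
        {   // Equal characters always satisfy d[i][j] == d[i-1][j-1].
            flush();
            out += expected[i - 1];
            --i; --j;
        }
        else if(i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1)
        {   // substitution
            exp_run += expected[i - 1];
            act_run += actual[j - 1];
            --i; --j; ++edits;
        }
        else if(i > 0 && d[i][j] == d[i - 1][j] + 1)
        {   // deletion
            exp_run += expected[i - 1];
            --i; ++edits;
        }
        else
        {   // insertion
            act_run += actual[j - 1];
            --j; ++edits;
        }
    }
    flush();
    assert(edits == d[m][n]); // the walked path has minimum total cost
    std::reverse(out.begin(), out.end());
    return out;
}
```

For the strings in the commit message, `inline_diff("Expected string", "Actual string")` yields `[Expe|A]ct[ed|ual] string`. Filling the table costs O(m·n) time and memory, which matches the O(N^2) complexity the commit notes.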

2525 Commits
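
Two of the commits above (#3048 and #3058) adjust the identity values of the Max and AbsMax reduce operators. For floating-point `T`, `std::numeric_limits<T>::min()` is the smallest positive normal value, not the most negative one, so it is the wrong starting point for Max; AbsMax only ever combines |x| >= 0, so it can start from 0. The sketch below illustrates that distinction with hypothetical `MaxOp`/`AbsMaxOp` functors and a hypothetical `reduce` helper; it assumes floating-point inputs and is not the code in `include/ck_tile/core/utility/reduce_operator.hpp`.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// For floating-point T, min() is the smallest positive normal value,
// while lowest() is the most negative finite value.
static_assert(std::numeric_limits<float>::min() > 0.0f, "min() is positive");
static_assert(std::numeric_limits<float>::lowest() < 0.0f, "lowest() is negative");

// Hypothetical Max reducer: its identity must not exceed any input,
// so lowest() is used rather than min().
struct MaxOp
{
    template <typename T>
    static constexpr T identity() { return std::numeric_limits<T>::lowest(); }

    template <typename T>
    T operator()(T acc, T x) const { return std::max(acc, x); }
};

// Hypothetical AbsMax reducer: it only combines |x| >= 0, so 0 is an
// identity and a reduction over no elements yields 0.
struct AbsMaxOp
{
    template <typename T>
    static constexpr T identity() { return T{0}; }

    template <typename T>
    T operator()(T acc, T x) const { return std::max(acc, static_cast<T>(std::abs(x))); }
};

// Left fold that starts from the operator's identity value.
template <typename Op, typename T>
T reduce(const std::vector<T>& xs, Op op = Op{})
{
    T acc = Op::template identity<T>();
    for(const T& x : xs)
        acc = op(acc, x);
    return acc;
}
```

With `min()` as the starting value, reducing `{-3.0f, -1.0f}` with Max would return the tiny positive `FLT_MIN` instead of `-1.0f`, which is the kind of error #3048 describes.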