Commit Graph

2606 Commits

Author SHA1 Message Date
Cong Ma
fec8b3228b [CK TILE GEMM] Refactor block_scale_gemm examples (#3181)
* [CK TILE GEMM] Refactor block_scale_gemm examples

- Split cpp file to reduce building time
- Support multiple GemmConfig

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update Readme

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Add support for rowcol and tensor GEMM operations

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update README

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.

[ROCm/composable_kernel commit: 6fd8ddabe7]
2025-11-12 23:43:40 -08:00
Thrupti Raj Lakshmana Gowda
5c19f34cb4 Ck tile engine commons (#3166)
* Moving Preshuffle to commons

* Fixing Common Validations

* Addressing Review Comments

* Partial Rebasing

* Partial Rebasing

* Partial Rebasing

* Rebasing Complete

[ROCm/composable_kernel commit: 9af30f04b6]
2025-11-13 00:56:18 -06:00
Aviral Goel
4c43e89a84 chore(copyright): update copyright header for test_data directory (#3194)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

[ROCm/composable_kernel commit: 797ddfa41e]
2025-11-12 16:07:28 -08:00
John Afaganis
97dba6f3c5 Add C++17 deprecation warning to CHANGELOG.md (#3203)
* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

[ROCm/composable_kernel commit: 9342365713]
2025-11-12 16:05:53 -08:00
Illia Silin
fbab772ad4 add permissions for /tmp folder (#3201)
[ROCm/composable_kernel commit: 3784c0e7c3]
2025-11-12 11:47:07 -08:00
Enrico Degregori
e00db44d0c Wmma support for gemm_reduce (#3145)
* Initial implementation GEMM+Reduce:

 - device struct
 - epilogue struct

* Fix tests, improve profiler and add initial instances

* Add instances

* Fix compilation error

* Address review comments

* Fix logging

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 7414a0f4d4]
2025-11-12 11:23:54 -08:00
Yashvardhan Agarwal
c8c5a7e1c6 [CK_Tile] Pooling example readme update (#3174)
* pooling example readme update

- The updated readme explains the transformations of the pooling kernel
using a mermaid diagram

* Update example/ck_tile/36_pooling/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* resolve comments

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: 299c9bca1b]
2025-11-12 07:30:20 -08:00
Po Yen Chen
97cb3abf33 [CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() (#2905)
* Allow sharing partition index across threads

* Fix typo PartitoinIndex -> PartitionIndex

* Remove C++20 'requires' usages

* Add missing template arguments

* Fix load_tile() overload ambiguity issue

* Use SFINAE to exclude invalid arguments

* Add additional offset parameter to the async_load_tile()

* Remove async_load_tile() default argument to avoid ambiguity

* Extract tile_window coordinate compute logic as method

* Use warp-shared LDS base address in tile_window::async_load()

* Add constraint to tile_window::load() templates

* Fix wrong type traits is_class_v<> usages

* Add missing constraint to async_load_tile()

* Add missing tile_window::load() overload

* Add more constraints to avoid load_tile() call ambiguity

* Rename ParitionIndex as ReplacementPartitionIndex

* Update pre_computed_warp_coords_ in move_extended()

* Fix inconsistency between template parameters and documentation

* Allow specifying pre-computed partition index

* Add type traits is_sequence<> & is_tile_distribution<>

* Add type traits is_tensor_view<>

* Add type constraints to make_tile_window() templates

* Allow passing partition_index to set_tile_if()

* Allow specifying partition_index to store_tile()

* Add missing template parameter of replace_bottom_tensor_view()

* Allow passing partition_index to Default2DEpilogue

* Make get_partition_index() public

* Add _with_offset() postfix to avoid resolution error

* Remove ReplacementPartitionIndex template param

* Add missing comments

* Add load_tile_transpose_with_offset() overload

[ROCm/composable_kernel commit: 40d2ed0f2a]
2025-11-12 10:26:14 +08:00
Bartłomiej Kocot
a2a69e7649 [CK_BUILDER] Add grouped conv fwd ck tile traits (#3183)
* [CK BUILDER] Add grouped conv fwd ck tile traits

* Update instance_traits_tile_grouped_convolution_forward.hpp

* Update grouped_convolution_forward_kernel.hpp


[ROCm/composable_kernel commit: 92c1f4981a]
2025-11-11 13:55:33 -08:00
Aviral Goel
f01853cf46 Add CK Tile Tutorials Folder with GEMM and COPY Kernel (#3038)
* feat: add tutorial folder with gemm tutorial

* chore: move copy kernel from examples folder to tutorial

* Update tutorial/ck_tile/01_naive_gemm/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tutorial/ck_tile/01_naive_gemm/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: remove handdrawn images

* docs: add write ups to explain the gemm kernel

* docs: add about block level pipeline and static distributed tensors

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: b145a5fe80]
2025-11-11 14:15:49 -06:00
Aviral Goel
a8d2ecc971 docs: update ckProfiler readme with selective building option (#3140)
* docs: update ckProfiler readme with selective building option

* docs: add list of operations for ckProfiler

[ROCm/composable_kernel commit: c54ecd905b]
2025-11-11 14:27:33 -05:00
Aviral Goel
9ec4b67288 chore(copyright): update copyright header for script directory (#3184)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

---------

Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>

[ROCm/composable_kernel commit: ab68c9d384]
2025-11-11 11:26:01 -08:00
linqunAMD
13cf0bd17f [CK_TILE] Fix gemm_quant (#3186)
[ROCm/composable_kernel commit: 1b1c46e508]
2025-11-11 08:23:57 -08:00
Aviral Goel
c1b5372db3 chore(copyright): update copyright header for tile_engine directory (#3180)
[ROCm/composable_kernel commit: 88e3212fcc]
2025-11-11 08:17:24 -08:00
Scott Todd
2c7d1aba58 Bump commit ref for TheRock in workflows (#3189)
* Bump commit ref for TheRock in workflows

* Update to more recent commit (could also `rm` the patch)

* Revert "Update to more recent commit (could also `rm` the patch)"

This reverts commit 4b9f4952ea.

* Rm patch that no longer applies

* Fix post_build_upload flag name

* Fix artifact_group plumbing for setup test env

[ROCm/composable_kernel commit: aa1fb29aa1]
2025-11-11 07:44:38 -08:00
Khushbu Agarwal
ae4444dfba formatting (#3182)
[ROCm/composable_kernel commit: 06c651b100]
2025-11-11 07:42:26 -08:00
Enrico Degregori
8e23284922 Extend support for ak1 / bk1 WMMA (#3073)
* Extend AK1 / BK1 support:

 - Add support for AK1 != BK1
 - Add support for AK1, BK1 > 8
 - Introduce KInner template parameter for pipelines when loading multiple tiles with one instruction

* fix clang format

[ROCm/composable_kernel commit: 1c544abf57]
2025-11-11 07:38:15 -08:00
Thomas Ning
b40859d461 fix input range (#3188)
[ROCm/composable_kernel commit: 9f33b7cfd3]
2025-11-10 11:08:41 -08:00
linqunAMD
ddb0078fec [ck] Enable missing op for gfx11 and gfx12 (#3187)
[ROCm/composable_kernel commit: 7b6ba8d5c2]
2025-11-10 10:58:20 -08:00
linqunAMD
93b4c77e06 [ck] correct memory size in grouped_gemm_multi_abd_xdl_fixed_nk_bias_bf16_i8 (#3168)
b1 and b0 use the same layout, so the size of b1_tensors_device should be the same as that of b0_tensors_device.

[ROCm/composable_kernel commit: e593a14ae1]
2025-11-10 10:58:08 -08:00
Manish Kumar
5f9d5566e5 [CK-Tile] Add gtests for compiler CI for faster testing (#3123)
* Add gtests for compiler CI for faster testing

* Add changes to have a custom target

* Add a gtest suite for gemm kernel for running CI tests with compiler mode

* Fix Clang error (EOL)

* Removed compiler subfolder from CMake

* Add gtest suite for gemm kernel

* Disable failed tests

* Fix build errors

* Resolved PR comments

* Update shape for persistent gemm kernel test

* Separated types by H/W archs

* Made changes to persistent types

* Fix persistent build failure issue

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: d5746dd120]
2025-11-10 10:42:23 -08:00
Gino Lu
0344170dac fix MX bpreshuffle gemm B grid descriptor dimension error. (#3170)
[ROCm/composable_kernel commit: e31a7a4f29]
2025-11-06 19:42:39 -08:00
Xudong Yuan
6e40562dff Ck moe mxfp4 blockm32 (#3098)
* block_m = 32

* ck block_m = 32

* aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format

* mxfp4_moe v1 pipe

* update format

---------

Co-authored-by: zhimding <zhimding@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: felix <felix.li@amd.com>

[ROCm/composable_kernel commit: d04eba4ae3]
2025-11-07 08:45:41 +08:00
JH-Leon-KIM-AMD
e8afef1e8b [CK_BUILDER] ckb add remaining fwd conv device ops (#3155)
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.

* Add predicates for all device op instances.

* Use the device op signature for validation.

* Fix ckb CMakeLists.txt file for tests.

* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.

* Fix clang-formatting.

* add device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk

* Add full DL configurability with Option A implementation

- Added 5 DL descriptor structs (39 configurable parameters)
- Added 10 C++20 concepts for type-safe validation
- Updated factory to read all parameters from descriptors
- Updated test helper to populate all descriptors
- All tests passing (13/13 including 3 new DL tests)

* Add factory and test support for DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor

- Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265)
- Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51)
- Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases
- Update CMakeLists.txt to include new test file
- Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor
- Map all 42 template parameters identical to regular XDL CShuffle
- All 15 builder tests passing including 2 new Large_Tensor tests

Completes Task 350: All 4 forward convolution device operations now supported in CK Builder.

* Update copyright headers to new format

- Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates.
- Reorder headers: Copyright first, then SPDX-License-Identifier
- Updated files:
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp
  * experimental/builder/include/ck_tile/builder/device_op_types.hpp

* fix c++ 18 format

* Fix clang-format-18 error in device_op_types.hpp

---------

Co-authored-by: Ville Pietilä <ville.pietila@amd.com>
Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>

[ROCm/composable_kernel commit: 5f3cae3e28]
2025-11-06 16:29:48 -08:00
Johannes Graner
085690955f Add .clangd and CMakeUserPresets.json to .gitignore (#3171)
[ROCm/composable_kernel commit: 76c4c12f59]
2025-11-06 15:07:39 -08:00
Adam Osewski
9fde8e559a [CK_BUILDER] Convolution description (#3163)
* Add DirectLoad tparam & clean up headers.

* Add convolution traits.

* Update inline documentation.

* Add more convolution specialization and gemm padding types.

* Add additional helper functions & more tests to conv traits.

* Fix tests cmake file.

* Add case insensitive string comparison

* Fix function name overlapping with variable name.

* Unify pipeline version and scheduler enums.

* Fix includes.

* Update test conv traits with unified enums.

* Update concepts etc. with updated unified enum

* Fix ckb conv fwd test - unified enum usage.

* Dump changes.

* Add ostream overloads for all enum classes.

* Update detailed() function in ConvDescription

* Fix handling union based conv direction.

* Add test & update conv description.

* Refine tree view.

* Update copyrights

* Fix merge artifacts

* Update detailed tree conv description

* Fix clang-format

[ROCm/composable_kernel commit: 18e083003f]
2025-11-06 15:46:26 +01:00
Bartłomiej Kocot
e89cb52306 [CK TILE] Convolution remove magic values (#3160)
* [CK TILE] Refactor Conv configs and Conv Elementwise

* fix

* [CK TILE] Convolution remove magic values

* fix partitioner

[ROCm/composable_kernel commit: 2234ff830b]
2025-11-06 11:26:30 +01:00
joyeamd
846b43f43b add gfx11's barrier following SPG's reference (#3159)
* add gfx11's barrier following SPG's reference

* re-format the code

* minor fix

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 12922120d2]
2025-11-05 22:29:03 -08:00
Illia Silin
b7d6555a88 Fix compilation errors with clang22. (#3164)
* resolve compilation issue with clang22

* add __extension__ for __COUNTER__ usage in ck_tile

[ROCm/composable_kernel commit: 4533aa6dba]
2025-11-05 15:42:22 -08:00
Adam Osewski
54409e7fb5 [CK_BUILDER] Convolution traits. (#3152)
Added:

1. Convolution traits & unit tests
2. Update builder enumerators to represent convolution kernel properties.
3. Unified builder pipeline version & scheduler enumerators

[ROCm/composable_kernel commit: b8527a9236]
2025-11-05 08:53:06 -08:00
andrew clark
a70d21d523 Collecting redis stats (#3149)
[ROCm/composable_kernel commit: 3b076b0b74]
2025-11-04 18:55:11 -08:00
Illia Silin
bb4b6e5961 Initialize new variable to prevent c++17 compiler error (#3156)
* initialize new variable to prevent c++17 compiler error

* build for gfx90a using -std=c++17 flag

[ROCm/composable_kernel commit: 930423ab3b]
2025-11-04 18:54:14 -08:00
Vidyasagar Ananthan
4d72320b51 Chunk Ctests so we don't run into large number of tests error (#3050)
* Chunk Ctests so we don't run into large number of tests error

* Addressing feedback from copilot

[ROCm/composable_kernel commit: 31c019f589]
2025-11-04 10:31:32 -08:00
Cong Ma
0343c4e1fe Introduces the new partitioner to implement the reduction StreamK kernel. (#3107)
* Introduces the new partitioner to implement the reduction StreamK kernel

* Add more doc text to functions

* Add persistent-dp option to streamk example

* Update example/ck_tile/40_streamk_gemm/README.md

[ROCm/composable_kernel commit: 5abe4109e0]
2025-11-04 10:32:17 -07:00
Thomas Ning
1a8f824938 fix the blockscale 2d case (#3148)
Co-authored-by: Aviral Goel <aviral.goel@amd.com>

[ROCm/composable_kernel commit: 13ba06f1e7]
2025-11-04 11:55:23 -05:00
John Shumway
a9d0980ad9 [CK_BUILDER] Update copyright messages. (#3150)
* Update copyright messages.

Copyright messages should no longer include a year. This PR updates all 38 source files to the new format.

* Switch to (C) from unicode copyright symbol.

The Unicode symbol in comments was causing compilation errors.

[ROCm/composable_kernel commit: 0be0288f58]
2025-11-04 15:35:16 +01:00
John Shumway
52204ff4e5 [CK_BUILDER] Add backward weight instance traits for xdl cshuffle. (#3143)
* Add backward weight instance traits for xdl cshuffle.

To keep instance test file sizes reasonable, we start a new test_bwd_weight_instances_traits.cpp test file.

* Fix copyright notices.

* Remove (c) symbol, replace with (C).

Having UTF-8 in source caused an error with code generation.

[ROCm/composable_kernel commit: 6dbee64886]
2025-11-04 15:34:00 +01:00
Bartłomiej Kocot
052c043d99 [CK TILE] Refactor Conv configs and Conv Elementwise (#3151)
* [CK TILE] Refactor Conv configs and Conv Elementwise

* fix

[ROCm/composable_kernel commit: 8681ced962]
2025-11-04 15:04:53 +01:00
Bartłomiej Kocot
a3a55b00d7 [CK TILE] Refactor grouped conv fwd large tensor (#3144)
[ROCm/composable_kernel commit: 99f38e4d9b]
2025-11-04 00:34:48 +01:00
Vidyasagar Ananthan
c9e7b735c0 Adding note on CMake convenience script (#3139)
* Adding note on convenience script

* Addressing feedback

* Update README.md

reword

---------

Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>

[ROCm/composable_kernel commit: c7ded76cc7]
2025-11-03 12:21:57 -08:00
Enrico Degregori
9575bcd099 Fix splitk preshuffle (#3137)
* Fix splitK multiply_multiply_wp

* Add tests for gemm_multiply_multiply_wp

* Add tests for gemm_universal_preshuffle (KBatch = 1)

* Add tests gemm_blockscale_wp

* Fix splitk gemm universal preshuffle

* Run new tests on arch supporting fp8

* Restore example

* Fix strides profiler

* Fix tests

* Fix clang format

* Finalize profiler preshuffle with tolerances

* Minor improvements to splitk related changes

* Address review comments: clang format and ckProfiler typo

* Remove b_k_split_offset from SplitKBatchOffset struct

[ROCm/composable_kernel commit: 507d81c3af]
2025-11-03 11:59:01 -08:00
Thomas Ning
bf0dc8ce56 fix the compv4 and async pipeline when tile handler is 1 (#3141)
[ROCm/composable_kernel commit: 057b7d43b4]
2025-11-03 09:37:35 -08:00
Emily Martins
b00303a831 Replace CK_TILE_PIPELINE macros with a common enum
This change replaces pipeline macros like CK_TILE_PIPELINE_COMPUTE_V3,
CK_TILE_PIPELINE_MEMORY, etc in the CK Tile examples with a common enum
called GemmPipeline to reduce code duplication.


[ROCm/composable_kernel commit: 2ec57a8e70]
2025-11-03 09:35:05 -07:00
Michael Mcminn
699f7daae3 Ud fix moe sorting gfx908 (#2720)
* Adding a ds permute fallback for the gfx908 and older for row_newbcast:7 instruction

* Better macro for selecting ROW_NEWBCAST

* clang-format the update

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: afe1ff618d]
2025-11-03 07:31:31 -08:00
msaffari-amd
7c8d79af33 Ck tile engine gemm unit tests expand test coverage (#3025)
* initial commit for testing datatypes, layouts and traits

* correct warp tile size for small datatype config to make a valid instance for fp16, bf16, fp8

* add tile size coverage test

* Cover more tests, parallel instance generation, documentation

* update cmakelist to run more tests

* initial codes to support add test params in json file

* add configurable problem sizes for different tests

* modify README.md

* clean test_gemm_simple code

* correct padding coverage test

* Add comprehensive and quick tile size config files

* remove fp64 from datatypes

* update documents. manage selecting tile_size config (quick or Comprehensive)

* correct padding test problem sizes

* update comprehensive test and correct documents

* Skip GEMM tests with unsupported arguments instead of failing

* use gen_single instead of gen_indivisual because of an issue. add splitk tests to tile_size_quick_config

* clean CMakeList, remod py file

* Refactor test configs: Rename tile_size to coverage, remove separate traits config, clean cmakefile, readme

* update fp32, fp8 to test all layouts, clean documents and comments

* limit fp32 test layouts to rcr because of compilation error on some gpus

* remove fp32 because it was removed from gemm_instance_builder, make quick test smaller, update comments

* Fix fp8/bf8 test failures on gfx950 by adding OCP FP8 format support

* Reduce quick_coverage test count from ~250 to ~144 for faster CI

[ROCm/composable_kernel commit: d405641f06]
2025-11-03 10:29:16 +01:00
Ville Pietilä
aeeed60666 [CK_BUILDER] Add conv factories for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle and DeviceGroupedConvFwdMultipleD_Wmma_CShuffle (#3138)
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.

* Add predicates for all device op instances.

* Use the device op signature for validation.

* Fix ckb CMakeLists.txt file for tests.

* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.

* Fix clang-formatting.

* Add factory for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle device op.

* Add conv factory for DeviceGroupedConvFwdMultipleD_Wmma_CShuffle

* Rename elements per wave per shuffle member in the epilogue concept.

* clang-format

* Add concepts and types for optional device op template parameters.

* Add optional compute, direct load, and loop scheduler arguments to conv factory.

* Add number of groups to merge template parameter.

* clang-format.

[ROCm/composable_kernel commit: 3ae3992c18]
2025-11-03 09:03:25 +02:00
Sami Remes
9f069d6e35 [CK_TILE] B matrix 2D block scale gemm (#3074)
* Refactor quant group size to be configurable for M/N/K, not just K

* add some asserts for configurations not implemented

* start setting of group size for N dimension

* enable 2d for reference quant gemm

* WIP: trying to figure out tile dstr and/or indexing for scale matrix

* WIP

* Fix handling of n dim blocks in tile windows etc

* remove commented code and enable all tests again

* fix formatting

* Add more specialized tile distributions

* Enable NWarps replication for bquant tile dstr

* fix formatting

* fix format

* Fix some issues from the merge

* fix formatting

* one more fix to tile dstr, and revert debug initialization

* Remove commented code

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* simplify conditions that are needed for tile distributions

* only enable the working group sizes in tests

* fix formatting

* Update tile distribution for 2D bquant

* add some documentation and 2d block scale example

* fix formatting

* Add to Changelog and restructure the quant 2d example

* fix CMake

* support the change for blockscale 2d

* fix the test file

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Cong Ma <congma13@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 16e85cf179]
2025-11-02 16:49:20 -08:00
Aviral Goel
f4b880d058 refactor: remove gemm preshuffle pipeline v1 by removing all references from codebase (#3132)
* test: temporarily disable flaky test_ck_tile_moe_sorting_2d_buf

* refactor: deprecate gemm preshuffle pipeline v1 by removing all references from codebase

* Revert "test: temporarily disable flaky test_ck_tile_moe_sorting_2d_buf"

This reverts commit 573c08a085.

[ROCm/composable_kernel commit: 73f637894d]
2025-11-02 00:06:28 -04:00
Aviral Goel
5be796d8a5 fix: fix bug in print tile window when printing bf8/fp8 tiles (#3120)
* fix: fix bug in print tile window when printing bf8/fp8 tiles

* test(print_tile_window_range): add unit tests to maintain function integrity

* fix: fp8 numerical mismatch error on gfx950 by adding DCK_TILE_USE_OCP_FP8

[ROCm/composable_kernel commit: 45be741586]
2025-11-01 15:28:07 -04:00
Bartłomiej Kocot
b2aa37f3f5 Add 2GB limitation for grouped conv bwd weight (#3054)
[ROCm/composable_kernel commit: ab1a8356b6]
2025-11-01 14:16:45 +01:00