Commit Graph

11 Commits

Author SHA1 Message Date
Aviral Goel
4237aedf9a [rocm-libraries] ROCm/rocm-libraries#4335 (commit 06976b3)
=?UTF-8?q?Increase=20tolerance=20for=20FP16=20GEMM=20test?=
 =?UTF-8?q?s=20to=20handle=20non-deterministic=20ro=E2=80=A6=20(#4335)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

…unding

Three tests were failing intermittently with small errors (0.01-1.5%)
due to non-deterministic FP16 accumulation order from GPU thread
scheduling:
- test_ck_tile_batched_gemm
- test_ck_tile_grouped_gemm_preshuffle
- test_ck_tile_grouped_gemm_multi_d

These tests use kbatch=1 (no split-K), so errors are from
order-dependent rounding, not atomics. Increased tolerances from 1e-3 to
2e-3 (0.2%) to account for FP16 precision limits while still catching
real bugs.

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2026-02-07 00:15:34 +00:00
Max Podkorytov
e339101e9c [CK-Tile] move out memory operation from cshuffle epilogue class (#3359)
* initial poc

* factor out common parts in operator()

* cv4

* rest of the universal gemm pipelines

* fix test

* remove boilerplate from tile engine

* fix example

* fix example

* format

* fix tests build for gemm

* remove base pipeline codegen from gemm instance builder

* unify v3 logic with the rest of universal gemm pipelines

* fix build for multi abd test

* fix test gemm multi d

* fix build for weight preshuffle

* fix grouped gemm test

* fix grouped gemm multi d test

* fix grouped gemm preshuffle

* fix grouped gemm example except for quant

* fix gemm preshuffle

* fix splitk 2 stage example

* fix batched gemm example

* fix multid example

* fix multiabd example

* fix batched gemm test

* fixup

* fix examples build

* fix grouped gemm test build

* fix smoke builder

* hacky poc

* fix tile engine

* kill the lambda

* maybe fix test build

* more fixes

* clang-format

* save temp

* clang-format

* mostly fix examples

* clang-format

* remove dead code

* more cleanup

* fix fmha bwd build (default epilogue set/add appears to be broken)

* fix default epilogue tests but not correctness

* clang-format

* fix bquant

* clang-format

* cleanup dead code

* rearrange make windows for readability

* restore changes to IsSupportedArgument

* fix smoke-builder

* clang-format

* fixup rename class

* build fixes

* clang-format

* fix builder

* fixup

* remove set from builder tests

* fix test

* clang-format

* re-refactor the kernels

* clang-format

* fix header license

* remove memory operation from conv bwd test

* clang-format

* clang-format example,include

* clang-format test

* build fixes

* clang-format

* solve compilation error

* fix the CI

* solve compilation error

* clang format

* solve merge conflict

* solve merge conflict

* solve the gfx11 error

* solve test error

* moar build fixes

* remove AtomicAddRequiresKBatchGreaterThanOne test since the property is removed from the kernel scope

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2026-01-04 03:28:14 -08:00
linqunAMD
6d7299ff78 [ck_tile] remove duplicate functions in ck_tile (#3311)
* [ck_tile] remove duplicated shuffle_b and shuffle_b_permuteN

* [ck_tile] move get_k_warp to gemm_shape

* resolve code rebase error
2025-12-15 07:13:00 -08:00
linqunAMD
fc7bf0ab1c [CK_TILE] Port hw independent changes from internal repo to develop branch (#3301)
* [CK_TILE] Port hw independent changes from internal repo to develop branch

It includes PR#96, #114, #120, #121.

* correct rebase error
2025-12-12 09:28:37 -08:00
Max Podkorytov
d184eed823 [CK-Tile] Refactor base pipeline usage (#3251)
* initial poc

* factor out common parts in operator()

* cv4

* rest of the universal gemm pipelines

* fix test

* remove boilerplate from tile engine

* fix example

* fix example

* format

* fix tests build for gemm

* remove base pipeline codegen from gemm instance builder

* unify v3 logic with the rest of universal gemm pipelines

* fix build for multi abd test

* fix test gemm multi d

* fix build for weight preshuffle

* fix grouped gemm test

* fix grouped gemm multi d test

* fix grouped gemm preshuffle

* fix grouped gemm example except for quant

* fix gemm preshuffle

* fix splitk 2 stage example

* fix batched gemm example

* fix multid example

* fix multiabd example

* fix batched gemm test

* fixup

* fix examples build

* fix grouped gemm test build

* fix smoke builder
2025-12-04 11:45:49 -08:00
Aviral Goel
004784ef98 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>
2025-11-28 13:49:54 -08:00
Aviral Goel
c8563f2101 chore(copyright): update copyright header for test directory (#3252)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for client_example directory

* chore(copyright): update copyright header for test directory
2025-11-20 20:36:57 -05:00
Aviral Goel
ac70206b2c feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle… (#3225)
* feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle kernel(s) along with unit test

* docs: Update CHANGELOG.MD
2025-11-18 09:32:27 -05:00
mkumar16-amd
a46b725992 Added Support for tile_grouped_gemm_preshuffle example (#2993)
* Added Support for tile_grouped_gemm_preshuffle example

* Resolved PR comments + Added unit tests for preshuffle with persistent

* Fixed CMake Build config error

* Fix clang error that caused CI to fail

* Fix clang formatting

* Fix clang issue

* Fix errors causing test cases to fail

* Fix grouped_gemm_preshuffle unit test failure

* Resolve PR comments

* Cleaned code + removed unnecassary changes

* Update test/ck_tile/grouped_gemm_preshuffle/test_grouped_gemm_preshuffle_util.hpp

Co-authored-by: Aviral Goel <aviral.goel@amd.com>

* Fix clang formatting

* Made changes to improve code readability

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
2025-10-27 11:31:19 -07:00
Aviral Goel
a44bea45b2 Integrate Multi D GEMMs into Grouped GEMMs along with unit tests (#2923)
* feat(grouped_gemm_multi_d): add new example that integrates grouped_gemm and multi_d_gemm feature

* feat: generalized grouped_gemm_kernel.hpp

* feat: generalized grouped_gemm_kernel.hpp even further by removing hardcoded 0

* refactor: grouped_gemm_multi_d relies on grouped_gemm_kernel

* tests(grouped_gemm): grouped_gemm test suite passes with minor adjustments

* fix: segfault fix by passing correct parameters for d tensors

* docs: add multi d info and trim down outdated content

* tests: add unit tests for grouped_gemm_multi_d and minor changes in grouped_gemm related test for compatibility

* style: clang format

* fix: incorrect validation method and Dtensor layout in test suite
2025-09-26 09:59:58 -07:00
Aviral Goel
48e08c6429 test(grouped_gemm): add gtests for the example/grouped_gemm_preshuffle to ensure its integrity (#2811)
* test(grouped_gemm): add gtests for the example to maintain its integrity

* test(grouped_gemm_preshuffle): add prefill variant to testbed to cover wider range

* fix: removed residue code to make b_shuffle() work again

* test(grouped_gemm_preshuffle): limit the test suite to gfx942 arch as it fails on gfx90a

* build: add gfx950 as build target for gtests

* test(grouped_gemm_preshuffle): temporarily disable fp8 prec tests due to numerical errors

* fix(grouped_gemm_preshuffle): resolved fp8 tests failure on gfx950 by adding correct compiler flag
2025-09-16 15:43:30 -07:00