Commit Graph

8 Commits

Author SHA1 Message Date
JP-Fernando
9a32f0ea19 [rocm-libraries] ROCm/rocm-libraries#4415 (commit b3b4af7)
[CK] Remove duplicated XDL/WMMA tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Motivation

When we started the RDNA4 support, the XDL instances were not supporting
WMMA instructions, so we duplicated some tests.

In this issue, we simplified most of the duplicated test files into
common test files.

## Technical Details

The following tests were unified:

- `batched_gemm`

- `batched_gemm_gemm`

- `gemm_add`

- `gemm_universal`

- `grouped_convnd_bwd_data`

The following tests were duplicated exactly, and copied into two files
with `_xdl` and `_wmma` suffixes. Now they are unified in one single
file without suffix:

- `gemm_multi_abd`

- `gemm_b_scale`

There is still an apparent duplication which is a special case, namely
`test_grouped_convnd_bwd_weight_interface_{suffix}` where `{suffix}` is
`xdl` or `wmma`.
However, the WMMA code relies on an old implementation, and is expected
to be removed in the future. In addition, it differs from the XDL
implementation significantly.
Therefore, it was decided to keep both files separate instead of
attempting any unification.

## Test Plan

`CMakeLists.txt` files were modified to support the new, unified tests.
In particular, testing was done for `gfx90a`, `gfx1201` and `gfx11`
architectures.

## Test Result

All tests passed successfully on all three tested architectures.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-25 23:23:02 +00:00
Aviral Goel
004784ef98 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>
2025-11-28 13:49:54 -08:00
Eisuke Kawashima
4e586ca958 chore: unset executable permission (#2303)
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2025-06-10 09:13:59 -07:00
Anton Gorenko
edd92fc546 DeviceGemm_Wmma_CShuffleV3 with BlockGemmPipelineVersion::v3 (#2096)
* Prepare files for DeviceGemm_Wmma_CShuffleV3

* Implement main part of CShuffleV3 with block pipeline v3 for WMMA

* Remove unused functions and template params for A/B descriptors

* Support both gfx11 and gfx12

* Enable SplitK for gfx12 and disable for gfx11

* Added RowColRow layout for DeviceGemmV2 fp16

* Added more instances for Row, Col, Row data layout

* Added instances for DeviceGemm_Wmma_CShuffleV3, Col, Row, Row data layout

* Added instances for DeviceGemm_Wmma_CShuffleV3, Col, Col, Row data layout

* Added more instances for DeviceGemm_Wmma_CShuffleV3, Row, Row, Row data layout

* Fix formatting

* Add documentation

Based on e5ad48a784

* Enable gemm_universal profiling for gfx11/12

* Add WMMA intrinsics for F8/BF8

* Support F8/BF8 DeviceGemm_Wmma_CShuffleV3, add basic instances

* Add BF16 instances and tests

* Fix test_gemm_universal_wmma_fp8 by adding CK_USE_WMMA_FP8

---------

Co-authored-by: Anca Hamuraru <anca@streamhpc.com>
2025-04-28 10:14:21 +05:00
Muhammed Emin Ozturk
74fda2e796 CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test Redo PR #2044 (#2070)
* fix and split gemm_universal test


* Update test_gemm_universal_streamk_ut_cases_fp8.inc
2025-04-11 10:17:29 -07:00
Illia Silin
29f7266216 Revert "CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (…" (#2054)
This reverts commit 7142d8003c.
2025-04-07 06:49:36 -07:00
Muhammed Emin Ozturk
7142d8003c CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (#2044)
* fix and split gemm_universal test

* clang

* Update test_gemm_universal_ut_cases_bf16.inc

* Update test_gemm_universal_xdl_bf16.cpp

* Update test_gemm_universal_ut_cases_fp16.inc
2025-04-03 14:22:43 -07:00
Haocong WANG
f83e9701e9 [GEMM] Gemm universal device operation (#1154)
* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk intances

* clang format + typo fix

* Fix a bug

* initial commit

* Add more instances to irregular splitk

* blkgemm pipeline v1~4 prototype

* Sanity Checked. Known issue:
1. Poor performance of splitk
2. Register spill on blkgemmpipeline v3

* Sanity and Performance fix:
1. fix a bug related to sanity in grouped b2c mapping
2. fix a bug related to sanity and performance in splitk offset

* Sanity and API update:
1. Remove prefetch stage
2. Fix valid check bug
3, Add first gemm_universal instance into ckProfiler

* Add NN instances for gemm universal

* 1. Add NT instances for gemm_universal
2. Fix a bug about Kpadding in gemm_universal

* Fix a bug regarding padding Odd K number

* remove kernel print

* Fix KPadding bug...

* Update safety check

* another try to fix kpadding..

* Sanity checked

* new instances..

* clang format+typo fix

* remove clang format script's change

* Add non-hotloop compile option

* 1. Add fp16xfp8 example
2. pull packed convert f8 from pr1150

* Some miscs.. opt and fix

* Add pipeline description docs

* Split universal gemm instance library to cut profiler compiling time

* uncomment cmakefile

* Fix a bug caused by blockwise_gemm_pipe_v2

* reduce default splitk to 1

* Add 224x256x64 tile size

* update, including:
1. Experiment pipeline 5~7
2. Optimization for pipeline 4
3. Organized instance library

* temp save

* temp save

* Permuted lds layout, sanity and function checked

* clang format

* Move OOB check from RunRead to RunWrite, for better software pipeline.
TODO: agpr spill when NN layout

* clangformat

* A/B splitpipe scheduler for v3

* Fix two bugs

* bug fix

* fix a bug in oob check

* Example for mixed fp16_fp8 gemm

* Clean experimental code blocks

* Add mixed precision gemm into profiler

* tempsave

* optimize m/n major lds layout

* Add RRR GEMM  mixed precision instances

* Optimize f8 matrix transpose

* Add test_gemm_universal

* A/B spilt schedule for blkpip v5

* Take ds_read2 into iglp scheduling scheme

* format

* fixed cmake

* Add llvm-option into CI cmake flag

---------

Co-authored-by: Jing Zhang <jizhan@amd.com>
2024-04-13 21:03:18 -05:00