Commit Graph

6 Commits

Author SHA1 Message Date
Eisuke Kawashima
4e586ca958 chore: unset executable permission (#2303)
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2025-06-10 09:13:59 -07:00
Anton Gorenko
edd92fc546 DeviceGemm_Wmma_CShuffleV3 with BlockGemmPipelineVersion::v3 (#2096)
* Prepare files for DeviceGemm_Wmma_CShuffleV3

* Implement main part of CShuffleV3 with block pipeline v3 for WMMA

* Remove unused functions and template params for A/B descriptors

* Support both gfx11 and gfx12

* Enable SplitK for gfx12 and disable for gfx11

* Added RowColRow layout for DeviceGemmV2 fp16

* Added more instances for Row, Col, Row data layout

* Added instances for DeviceGemm_Wmma_CShuffleV3, Col, Row, Row data layout

* Added instances for DeviceGemm_Wmma_CShuffleV3, Col, Col, Row data layout

* Added more instances for DeviceGemm_Wmma_CShuffleV3, Row, Row, Row data layout

* Fix formatting

* Add documentation

Based on e5ad48a784

* Enable gemm_universal profiling for gfx11/12

* Add WMMA intrinsics for F8/BF8

* Support F8/BF8 DeviceGemm_Wmma_CShuffleV3, add basic instances

* Add BF16 instances and tests

* Fix test_gemm_universal_wmma_fp8 by adding CK_USE_WMMA_FP8

---------

Co-authored-by: Anca Hamuraru <anca@streamhpc.com>
2025-04-28 10:14:21 +05:00
Muhammed Emin Ozturk
74fda2e796 CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test Redo PR #2044 (#2070)
* fix and split gemm_universal test


* Update test_gemm_universal_streamk_ut_cases_fp8.inc
2025-04-11 10:17:29 -07:00
Illia Silin
29f7266216 Revert "CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (…" (#2054)
This reverts commit 7142d8003c.
2025-04-07 06:49:36 -07:00
Muhammed Emin Ozturk
7142d8003c CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (#2044)
* fix and split gemm_universal test

* clang

* Update test_gemm_universal_ut_cases_bf16.inc

* Update test_gemm_universal_xdl_bf16.cpp

* Update test_gemm_universal_ut_cases_fp16.inc
2025-04-03 14:22:43 -07:00
Haocong WANG
f83e9701e9 [GEMM] Gemm universal device operation (#1154)
* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk intances

* clang format + typo fix

* Fix a bug

* initial commit

* Add more instances to irregular splitk

* blkgemm pipeline v1~4 prototype

* Sanity Checked. Known issue:
1. Poor performance of splitk
2. Register spill on blkgemmpipeline v3

* Sanity and Performance fix:
1. fix a bug related to sanity in grouped b2c mapping
2. fix a bug related to sanity and performance in splitk offset

* Sanity and API update:
1. Remove prefetch stage
2. Fix valid check bug
3, Add first gemm_universal instance into ckProfiler

* Add NN instances for gemm universal

* 1. Add NT instances for gemm_universal
2. Fix a bug about Kpadding in gemm_universal

* Fix a bug regarding padding Odd K number

* remove kernel print

* Fix KPadding bug...

* Update safety check

* another try to fix kpadding..

* Sanity checked

* new instances..

* clang format+typo fix

* remove clang format script's change

* Add non-hotloop compile option

* 1. Add fp16xfp8 example
2. pull packed convert f8 from pr1150

* Some miscs.. opt and fix

* Add pipeline description docs

* Split universal gemm instance library to cut profiler compiling time

* uncomment cmakefile

* Fix a bug caused by blockwise_gemm_pipe_v2

* reduce default splitk to 1

* Add 224x256x64 tile size

* update, including:
1. Experiment pipeline 5~7
2. Optimization for pipeline 4
3. Organized instance library

* temp save

* temp save

* Permuted lds layout, sanity and function checked

* clang format

* Move OOB check from RunRead to RunWrite, for better software pipeline.
TODO: agpr spill when NN layout

* clangformat

* A/B splitpipe scheduler for v3

* Fix two bugs

* bug fix

* fix a bug in oob check

* Example for mixed fp16_fp8 gemm

* Clean experimental code blocks

* Add mixed precision gemm into profiler

* tempsave

* optimize m/n major lds layout

* Add RRR GEMM  mixed precision instances

* Optimize f8 matrix transpose

* Add test_gemm_universal

* A/B spilt schedule for blkpip v5

* Take ds_read2 into iglp scheduling scheme

* format

* fixed cmake

* Add llvm-option into CI cmake flag

---------

Co-authored-by: Jing Zhang <jizhan@amd.com>
2024-04-13 21:03:18 -05:00