1835 Commits

Author SHA1 Message Date
Thomas Ning
0cca8fa28f GEMM Multiply Multiply Fix (#2102)
* fix the type convert and increase the BF16 conversion + the profile comment

* fix the CI
2025-04-22 01:13:22 -07:00
Thomas Ning
4bef60aa57 update code owner (#2113) 2025-04-21 13:53:03 -07:00
Muhammed Emin Ozturk
b092c18da7 MI308 fix for streamk 1-Tile floating point exception (#2101) 2025-04-21 11:44:07 -07:00
Thomas Ning
a738e43445 MFMA 16x16x32fp8 (#2103)
* add mfma_16x16x32_fp8

* clang format code

* Finished the fix for gemm basic

* clang format

* rebuild CI

* recover gemm.hpp

* add MFMA 16*16*32bf8

---------

Co-authored-by: solin <bingzhou@amd.com>
2025-04-21 10:21:35 -07:00
Illia Silin
ce61759538 fix daily gfx942 build (#2106) 2025-04-21 08:48:22 -07:00
Khushbu Agarwal
7cadf187e2 multi instance generation for CkTileEngine (#2080)
* Add support for multi-instance verification, print detail for each instance, documentation fix

* clang formatted

* Added Readme file

* updated readme

* Addressing review comments

* clang formatted

* Updated ReadMe and GPU reference code

* simplified dispatch kernel code

* indentation
2025-04-21 08:39:45 -07:00
solin
c318ec0778 fix CI build fail 2025-04-21 16:00:12 +08:00
lalala-sh
bcf5bb41be enable do top k weights in moe stage1 gemm (#2094)
* add switch for mul topk weights

* fix bf16/f16 bugs

* complete
2025-04-18 10:45:49 +08:00
Andriy Roshchenko
213b203a3c MX GEMM - Parameterized Test Template (#2088)
* Tests for MX FP8 GEMM

* Improve documentation
2025-04-16 19:56:00 -06:00
Andriy Roshchenko
da54464cce MX GEMM - Add MX BF8 example (#2071)
* Add MX GEMM example for MX BF8

* Verified MX FP8 with 16x16x128 scale builtin

* Verify MX BF8 GEMM with BF16 output
2025-04-16 15:25:02 -06:00
Illia Silin
3bb62f16cd Upgrade default docker to Ubuntu24.04 (#2090)
* upgrade docker to Ubuntu24.04

* add break-system-packages flag to pip install

* fix dockerfile
2025-04-16 12:10:15 -07:00
aledudek
7c32652e03 Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 (#2069)
* Part1

* Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16

* Add missing comma

* Add missing cpp instance files

* Fix 3d layout

* Add missing closing bracket

* Add missing comp x2 and part2 instances

* Fix typo in instance name

* fix

* Fix

---------

Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2025-04-16 11:00:55 +02:00
BingYuan.Zhou
eaf1f0bf3b [flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm

* fix CI build fail

---------

Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00
felix
c5975529bb add preshuffle gemm fp16 (#2036)
* add preshuffle gemm fp16

* clang format and test ok

* Update gemm_multiply_multiply_xdl_fp16_bpreshuffle.cpp

remove useless comments in example

* Update gemm_multiply_multiply_xdl_fp16_bpreshuffle.cpp

remove 2

---------

Co-authored-by: coderfeli <coderfeli@163.com>
2025-04-16 10:53:21 +08:00
joyeamd
94d47b1680 fmha hdim256 vectorize improve (#2086)
For hdim 256, buffer loads are not vectorized when seqlen % 256 != 0 and hdim % 256 == 0; this commit tries to address that case.
2025-04-16 09:21:04 +08:00
Andriy Roshchenko
7106976a72 MX GEMM - New GEMM pipeline for MX data types (#2059)
* Allow selection of mfma_scale instructions

* Read B tensor from LDS to VGPR in chunks of 16 in MFMA order

* Add constexpr and synchronize return type for `get_exponent_value`

* Pass scales by reference and add comments to `mfma_scale_f32_32x32x64`

* Add support for microscaling instructions in `XdlopsGemm`

* Fix `mfma_scale_f32_16x16x128f8f6f4` wrapper

* Remove software implementation of MX GEMM

* Make interface of `intrin_mfma_scale_f32_16x16x128f8f6f4<16, 16>` consistent with the other scale instruction

* Update README

* Updated CHANGELOG

* Remove unused static methods
2025-04-15 17:17:07 -06:00
Illia Silin
d55c9cb313 Upgrade default docker image to ROCm6.4 release. (#2082)
* upgrade to rocm6.4

* fix gfx10 generic target syntax

* use gfx1101 target for unit tests

* use gfx1201 target for unit tests

* do not use generic targets until 6.4.1 release

* update target list and dockerfile.compiler
2025-04-14 16:41:47 -07:00
Mingtao Gu
56378f810f CK pk_i4_t test failures fix (SWDEV-518629) (#2075)
* fix pk_i4_v3 test failures in Ubuntu env.

* fix pk_i4_t test failures on Ubuntu.

* some fixes.

---------

Co-authored-by: mtgu0705 <mtgu@amd.com>
2025-04-14 16:58:57 +08:00
Thomas Ning
269f4f6af5 Solve the Static Encoding Pattern compile error when the tile size is too small (#2079) 2025-04-13 20:09:30 -07:00
Illia Silin
0d4f145078 Fix build issues for multiple targets. (#2077)
* build for multiple targets on gfx942

* add missing ignore statements
2025-04-11 12:12:53 -07:00
Muhammed Emin Ozturk
74fda2e796 CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test Redo PR #2044 (#2070)
* fix and split gemm_universal test


* Update test_gemm_universal_streamk_ut_cases_fp8.inc
2025-04-11 10:17:29 -07:00
jakpiase
6c61f4d237 [CK_TILE] Add 2:4 structured sparsity support for fp16 gemm (#1957)
* add structured sparsity fp16 support for gemm

* added reviewer suggestions

* update changelog

* update changelog

* add reviewers suggestions

* Minor fix

* clang fix

* fix doxygen
2025-04-11 12:18:26 +02:00
slippedJim
5f885d2b7a add fmha fwd splitkv receipt for aiter c++ api (#2068)
* add s_randval for c++ api

* Fix bug of bias in splitkv

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>
2025-04-10 23:21:13 +08:00
Juan Manuel Martinez Caamaño
f14e648e7c Replace inline assembly with builtins in FMHA (#2067)
* Replace inline assembly with builtins in FMHA

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-04-10 09:48:37 +02:00
Illia Silin
3e6d21adeb enable gfx115x support (#2065) 2025-04-09 10:06:42 -07:00
MHYang-gh
03ce8729fd Make buffer coherence configurable in tensor view (#2041)
* Make buffer coherence configurable in tensor view

* Fix clang-format for tensor_view.hpp
2025-04-08 15:34:11 -07:00
valarLip
2c563fecf7 add passthrough for int32->float32 (#2062) 2025-04-08 15:16:30 -07:00
Khushbu Agarwal
263ff689e0 New instances for gemm_multiply_multiply_weightpreshuffle operator (#2061)
* Add new instances for weight_preshuffle for f8->bf16

* Add new instances for weight_preshuffle for f8->f16

* clang formatted

---------

Co-authored-by: Khushbu Agarwal <khuagar@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-04-08 15:14:53 -07:00
spolifroni-amd
2c8132126c fixed broken github link (#2063) 2025-04-08 10:20:31 -07:00
dependabot[bot]
b12cd6580b Bump rocm-docs-core from 1.18.1 to 1.18.2 in /docs/sphinx (#2047)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.1 to 1.18.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.1...v1.18.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.18.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-08 09:06:38 -07:00
Max Podkorytov
6ce0797dad simplify generate_tuple (#2043) 2025-04-08 09:00:51 -07:00
aledudek
80aae6119b [CK_TILE] Fix GEMM Memory Pipeline (#2034)
* [CK_TILE] Fix GEMM Memory Pipeline

* Fix transpose tile

* Add comments
2025-04-08 12:40:04 +02:00
Illia Silin
72c0261ef1 Fix a couple of CI issues. (#2050)
* fix jenkins jobs

* fix perf log name for gfx908

* only run gemm perf tests on gfx908
2025-04-07 12:48:34 -07:00
Illia Silin
1793228422 fix codegen issues (#2052) 2025-04-07 07:08:39 -07:00
Illia Silin
29f7266216 Revert "CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (…" (#2054)
This reverts commit 7142d8003c.
2025-04-07 06:49:36 -07:00
slippedJim
5a22b61de5 Add new receipt (#2055) 2025-04-07 14:18:01 +08:00
Khushbu Agarwal
3bda57c204 file clang formatted (#2053) 2025-04-03 16:55:49 -07:00
Khushbu Agarwal
b443056a26 Documentation for newly added struct (#2051) 2025-04-03 16:24:34 -07:00
Illia Silin
572cd820ce Split env.hpp header from the ck.hpp header. (#2049)
* split env.hpp out of main headers

* fix namespace logic
2025-04-03 15:30:21 -07:00
Muhammed Emin Ozturk
7142d8003c CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (#2044)
* fix and split gemm_universal test

* clang

* Update test_gemm_universal_ut_cases_bf16.inc

* Update test_gemm_universal_xdl_bf16.cpp

* Update test_gemm_universal_ut_cases_fp16.inc
2025-04-03 14:22:43 -07:00
Khushbu Agarwal
fed0709121 [New] Build up the feature of CK Tile GEMM CodeGen (#1994)
* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* code changes for single instance

* config file rename, added few more combinations in json file

* Fix cmake file

* Addressing review comments

* Reverting files changed by merge to develop

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-04-03 11:54:12 -07:00
Thomas Ning
50d1f8ff90 Add the MI355 support for CK TILE GEMM (#2046)
* Get the root cause of the ck tile gemm failing on mi355

* Fix the ck tile gemm on MI355

* delete the debug info
2025-04-03 11:48:54 -07:00
Rostyslav Geyyer
265af71a71 Add FP16/BF16<->FP8/BF8 conversions (#2035)
* Move conversion functions and add missing conversions

* Add tests

* Add missing conversions

* Add missing conversions

* Add bf8 tests

* Update clipping for vectors

* Add missing conversions

* Add bf16 fp8 tests

* Add bf16 bf8 tests

* Fix device conversion

* Fix conversions

* Fix vector use

* Minor fix

* Add a workaround flag

* Add a workaround flag for bf16 conversion

* Add another workaround

* Add a workaround for fp16 to bf8 conversion

* Update type alias

* Add docstrings and missing wrappers

* Fix if defined macros

* Fix more if defined macros

* Add comments

* Remove __host__ specifier

* Add a gfx950 guard

* Update function naming
2025-04-03 12:42:03 -05:00
aledudek
9329432f6c Post-merge changes for fully async args copy in ck grouped gemm (#1991)
* Post-merge changes for fully async args copy in ck grouped gemm

* Post-merge documentation and naming changes

* Build fix and updated changelog

* Revised comments
2025-04-03 13:35:43 +02:00
Bartłomiej Kocot
2ccf914888 Add support for GKCYX grouped conv weight (#2023)
* Grouped conv bwd weight GKCYX support

* fix and changelog

* fix

* fix

* fixes

* comments

* fix
2025-04-02 23:59:49 +02:00
Adam Osewski
e5ad48a784 Basic docs for universal gemm & ck-tile gemm. (#2014)
* Basic docs for universal gemm & ck-tile gemm.

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Reviewers suggestions.

* Align tparam names in doc with class tparams.

* More reviewers fine tuning ;)

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-04-02 11:03:40 +02:00
Bartłomiej Kocot
8c0ab61ece Grouped conv backward data GKCYX support (#2029)
* Grouped conv backward data GKCYX support

* profiler

* Converter

* split instances
2025-04-01 13:24:38 -07:00
Bartłomiej Kocot
ec742908bd Grouped conv fwd v3 fix for SplitN and G > 1 (#2038)
* Grouped conv fwd v3 fix for SplitN and G > 1

* Remove int8 large test

* Restore int8 test
2025-04-01 13:19:35 -07:00
Seunghoon Lee
df32020f93 Fix Windows build. (#2012)
* Remove duplicate using uint64_t.

* Cast before shift.
2025-04-01 12:22:10 -07:00
Max Podkorytov
c59a8bb206 add a fast compilation path for static for (0..N) (#2005)
* add a fast compilation path for static for (0..N)

* Update functional2.hpp

add comment and put range applier into detail namespace

* Update functional.hpp

ditto for ck-tile

* prettify

* prettify more

* add comment

* clang-format
2025-04-01 12:06:25 -07:00