Commit Graph

1808 Commits

Author SHA1 Message Date
Khushbu Agarwal
263ff689e0 New instances for gemm_multiply_multiply_weightpreshuffle operator (#2061)
* Add new instances for weight_preshuffle for f8->bf16

* Add new instances for weight_preshuffle for f8->f16

* clang formatted

---------

Co-authored-by: Khushbu Agarwal <khuagar@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-04-08 15:14:53 -07:00
spolifroni-amd
2c8132126c fixed broken github link (#2063) 2025-04-08 10:20:31 -07:00
dependabot[bot]
b12cd6580b Bump rocm-docs-core from 1.18.1 to 1.18.2 in /docs/sphinx (#2047)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.1 to 1.18.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.1...v1.18.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.18.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-08 09:06:38 -07:00
Max Podkorytov
6ce0797dad simplify generate_tuple (#2043) 2025-04-08 09:00:51 -07:00
aledudek
80aae6119b [CK_TILE] Fix GEMM Memory Pipeline (#2034)
* [CK_TILE] Fix GEMM Memory Pipeline

* Fix transpose tile

* Add comments
2025-04-08 12:40:04 +02:00
Illia Silin
72c0261ef1 Fix a couple of CI issues. (#2050)
* fix jenkins jobs

* fix perf log name for gfx908

* only run gemm perf tests on gfx908
2025-04-07 12:48:34 -07:00
Illia Silin
1793228422 fix codegen issues (#2052) 2025-04-07 07:08:39 -07:00
Illia Silin
29f7266216 Revert "CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (…" (#2054)
This reverts commit 7142d8003c.
2025-04-07 06:49:36 -07:00
slippedJim
5a22b61de5 Add new receipt (#2055) 2025-04-07 14:18:01 +08:00
Khushbu Agarwal
3bda57c204 file clang formatted (#2053) 2025-04-03 16:55:49 -07:00
Khushbu Agarwal
b443056a26 Documentation for newly added struct (#2051) 2025-04-03 16:24:34 -07:00
Illia Silin
572cd820ce Split env.hpp header from the ck.hpp header. (#2049)
* split env.hpp out of main headers

* fix namespace logic
2025-04-03 15:30:21 -07:00
Muhammed Emin Ozturk
7142d8003c CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (#2044)
* fix and split gemm_universal test

* clang

* Update test_gemm_universal_ut_cases_bf16.inc

* Update test_gemm_universal_xdl_bf16.cpp

* Update test_gemm_universal_ut_cases_fp16.inc
2025-04-03 14:22:43 -07:00
Khushbu Agarwal
fed0709121 [New] Build up the feature of CK Tile GEMM CodeGen (#1994)
* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* code changes for single instance

* config file rename, added few more combinations in json file

* Fix cmake file

* Addressing review comments

* Reverting files changed by merge to develop

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-04-03 11:54:12 -07:00
Thomas Ning
50d1f8ff90 Add the MI355 support for CK TILE GEMM (#2046)
* Get the root cause of the ck tile gemm failing on mi355

* Fix the ck tile gemm on MI355

* delete the debug info
2025-04-03 11:48:54 -07:00
Rostyslav Geyyer
265af71a71 Add FP16/BF16<->FP8/BF8 conversions (#2035)
* Move conversion functions and add missing conversions

* Add tests

* Add missing conversions

* Add missing conversions

* Add bf8 tests

* Update clipping for vectors

* Add missing conversions

* Add bf16 fp8 tests

* Add bf16 bf8 tests

* Fix device conversion

* Fix conversions

* Fix vector use

* Minor fix

* Add a workaround flag

* Add a workaround flag for bf16 conversion

* Add another workaround

* Add a workaround for fp16 to bf8 conversion

* Update type alias

* Add docstrings and missing wrappers

* Fix if defined macros

* Fix more if defined macros

* Add comments

* Remove __host__ specifier

* Add a gfx950 guard

* Update function naming
2025-04-03 12:42:03 -05:00
aledudek
9329432f6c Post-merge changes for fully async args copy in ck grouped gemm (#1991)
* Post-merge changes for fully async args copy in ck grouped gemm

* Post-merge documentation and naming changes

* Build fix and updated changelog

* Revised comments
2025-04-03 13:35:43 +02:00
Bartłomiej Kocot
2ccf914888 Add support for GKCYX grouped conv weight (#2023)
* Grouped conv bwd weight GKCYX support

* fix and changelog

* fix

* fix

* fixes

* comments

* fix
2025-04-02 23:59:49 +02:00
Adam Osewski
e5ad48a784 Basic docs for universal gemm & ck-tile gemm. (#2014)
* Basic docs for universal gemm & ck-tile gemm.

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Reviewers suggestions.

* Align tparam names in doc with class tparams.

* More reviewers fine tuning ;)

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-04-02 11:03:40 +02:00
Bartłomiej Kocot
8c0ab61ece Grouped conv backward data GKCYX support (#2029)
* Grouped conv backward data GKCYX support

* profiler

* Converter

* split instances
2025-04-01 13:24:38 -07:00
Bartłomiej Kocot
ec742908bd Grouped conv fwd v3 fix for SplitN an G > 1 (#2038)
* Grouped conv fwd v3 fix for SplitN an G > 1

* Remove int8 large test

* Retore int8 test
2025-04-01 13:19:35 -07:00
Seunghoon Lee
df32020f93 Fix Windows build. (#2012)
* Remove duplicate using uint64_t.

* Cast before shift.
2025-04-01 12:22:10 -07:00
Max Podkorytov
c59a8bb206 add a fast compilation path for static for (0..N) (#2005)
* add a fast compilation path for static for (0..N)

* Update functional2.hpp

add comment and put range applier into detail namespace

* Update functional.hpp

ditto for ck-tile

* prettify

* prettify more

* add comment

* clang-format
2025-04-01 12:06:25 -07:00
Bartłomiej Kocot
6355ee7ca5 Improve compilation time for grouped conv fwd (#2039)
* Improve compilation time for grouped conv fwd

* Fix
2025-04-01 07:11:42 -07:00
Muhammed Emin Ozturk
dd4c12b155 f8/bf16 GEMM Stream-K (#1879) 2025-03-31 20:30:17 -06:00
jefyang1
16b15e336a Fix gemm universal and grouped_conv_fwd test failures on gfx950 (#2031) 2025-03-31 09:20:52 -07:00
Adel Johar
fc073b483e Docs: Add precision support reference page (#1973)
* Docs: Add precision support reference page

* edit of the precision type content

* added more description on scalars

---------

Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
2025-03-28 08:12:27 -06:00
rocking
8a20b62e91 Reduce redundant space in bias tensor (#2024)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-03-28 21:58:06 +08:00
felix
a82f338fb9 hotfix fix sorting int64 (#2025)
* fix sorting int64

* clang format

* fix example issue

* update WA issue #

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2025-03-28 11:31:52 +08:00
Illia Silin
d142e15f5e add gfx950 to default targets for rocm6.4+ (#2032) 2025-03-27 18:48:47 -07:00
spolifroni-amd
a426f67301 creation of install doc and refactor of doc in general (#1908)
* creation of install doc and refactor of doc in general

* updates based on review comments

* updated based on review comments

* updated readme and contributors markdown

* added extra note to not use -j on its own

* added note about smoke tests and regression tests

* made changes as per Illia's feedback

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>
2025-03-27 15:13:18 -06:00
felix
36d50de50e ckmoe: change cmake; use smaller shape for i4 (#2027)
* change cmake; use smaller shape for i4

* fix pki4 run

* fix typo

* fix runtime arch logic for moe_gemm2 example

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-03-27 09:04:31 -07:00
Rostyslav Geyyer
441343a23d Add MX FP4 device conversion tests (#1889)
* Add conversion tests

* Fix ctor

* Fix nan logic

* Fix conversion logic

* Permute packed f4_t values

* Fix conversion to float, repack vector elements

* Fix device tests

* Permute elements in a vector

* Add a repro test

* Add a conversion for a repro test

* Update test vectors

* Update conversion

* Fix the test

* Update test vector generator

* Fix vector sr conversion

* Permute conversion args

* Update conversion

* Test

* Fix packing

* Simplify conversion function

* Pack conversion in a loop

* Pack conversion in a loop

* Pack another conversion in a loop

* Pack one more conversion in a loop

* Pack the last conversion in a loop

* Clean up

* Add printf to fix intrinsic

* Add a sw-based workaround
2025-03-26 19:23:01 -05:00
Illia Silin
23a949706c Disable all pk_i4 tests for all targets except gfx942/950. (#2022)
* only build gemm_fp8_pk_i4 examples for gfx942/950

* fix cmake logic

* moved the architecture check to IsSupported function

* Revert "moved the architecture check to IsSupported function"

This reverts commit 056d2a08b3.

* disable all pk_i4 tests for targets other than gfx942/950

* fix cmake logic
2025-03-26 15:15:57 -07:00
Bartłomiej Kocot
54c81a1fcf Add support for GKCYX grouped conv fwd (#2015)
* Add support for GKCYX grouped conv fwd

* fixes

* fix

* changelog

* Fixes
2025-03-26 21:13:38 +01:00
Illia Silin
fd915b83f7 fix clang format (#2021) 2025-03-26 09:42:10 -07:00
Mirza Halilčević
21e0ca197d Add default arguments for prologue and epilogue. (#2020) 2025-03-26 09:28:40 -07:00
Illia Silin
99b2bbc1d6 Make sure gemm_fp8_pk_i4 examples only build and run on gfx942/950. (#2010)
* only build gemm_fp8_pk_i4 examples for gfx942/950

* fix cmake logic

* moved the architecture check to IsSupported function

* Revert "moved the architecture check to IsSupported function"

This reverts commit 056d2a08b3.
2025-03-25 14:43:38 -07:00
Andriy Roshchenko
72d888821c MX GEMM examples with FP8, FP16, and E8M0 scales (#2016)
* Add `scalar_type` specification for E8M0 exponent

* Specialize `nnvb_data_t_selector` for E8M0 exponent

* Remove partial specializations for `scalar_type` of `non_native_vector_base` template

* Reword command line helper string

* Create MX GEMM examples for different scales
2025-03-25 15:33:03 -06:00
Illia Silin
44c093ba0c Enable ClangBuildAnalizer when doing ninja build traces. (#2009)
* enable ClangBuildAnalizer when doing ninja traces

* add branch and date to clang build log name

* fix jenkins syntax

* fix jenkins syntax once more

* fix jenkins syntax once more

* simplify the clang_build log name

* simplify the clang_build log name further
2025-03-25 12:27:04 -07:00
Max Podkorytov
1a58522f01 use fast path for sequence generation in old CK (#1993) 2025-03-25 11:28:44 -07:00
ruanjm
d49abdaa87 [CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance (#1861)
* 50ms -> 28ms

* Fix bug in non fuse_add_store cases

* Fine tuned setting for 2 pass pipeline

* adjust workload

* remove unnecessary change

* add layernorm

* Adding output quant and unquant results at the same time.

* fix test

* fix format

* tune for cases 128x640 and 128x1024

* bug ifx
2025-03-25 20:09:45 +08:00
Illia Silin
d2eab23958 Split up data_type header. (#1996)
* split fp64 vector data type

* add missing header

* move e8m0 structs

* split off numeric_utils header

* fix typo

* split off numeric limits header

* update data_type header

* fix clang format

* split off vector type header

* fix clang format

* fix typo for binary_inf
2025-03-24 15:08:54 -07:00
Andriy Roshchenko
6660dc6b8e Introduce MX GEMM for FP8 data type (#2000) 2025-03-24 15:41:07 -06:00
MHYang-gh
c027637a8f Fix A/B lds transform (#2007) 2025-03-22 23:13:50 -07:00
Bartłomiej Kocot
5b0873c31a Fix split N for large images in groupd conv fwd (#2004)
* Fix split N for large images in groupd conv fwd

* Fix comments
2025-03-22 23:19:49 +01:00
carlushuang
6c08c5c46d add mask support in hdim=192/128 (#1999) 2025-03-21 18:28:43 +08:00
BingYuan.Zhou
5a0d693b86 fix ck_tile/basic_gemm build error (#1988) 2025-03-20 22:01:14 -07:00
felix
902dbe89ad change cmake (#2006)
Co-authored-by: coderfeli <coderfeli@163.com>
2025-03-20 19:25:11 -07:00
Attila T. Áfra
c79bf11148 Fix compile errors on Windows and Linux (#2002)
* Fix compile error on Windows (call to 'amd_wave_read_first_lane' is ambiguous)

* Fix compile error (no matching function for call to 'cast_to_f32_from_f8')
2025-03-20 12:37:25 -07:00