Illia Silin
6d90b2eb50
Split env.hpp header from the ck.hpp header. ( #2049 )
...
* split env.hpp out of main headers
* fix namespace logic
[ROCm/composable_kernel commit: 572cd820ce ]
2025-04-03 15:30:21 -07:00
Muhammed Emin Ozturk
d451d52350
CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test ( #2044 )
...
* fix and split gemm_universal test
* clang
* Update test_gemm_universal_ut_cases_bf16.inc
* Update test_gemm_universal_xdl_bf16.cpp
* Update test_gemm_universal_ut_cases_fp16.inc
[ROCm/composable_kernel commit: 7142d8003c ]
2025-04-03 14:22:43 -07:00
Khushbu Agarwal
b85b103194
[New] Build up the feature of CK Tile GEMM CodeGen ( #1994 )
...
* New branch for codegen changes
* Fix verify function for int4
* pk_int4 codegen
* Update to review comments
* Remove codegen directory and rename filenames
* Remove extra files; clean up CMake file
* code changes for single instance
* config file rename, added few more combinations in json file
* Fix cmake file
* Addressing review comments
* Reverting files changed by merge to develop
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com >
[ROCm/composable_kernel commit: fed0709121 ]
2025-04-03 11:54:12 -07:00
Thomas Ning
05e13817ba
Add the MI355 support for CK TILE GEMM ( #2046 )
...
* Get the root cause of the ck tile gemm failing on mi355
* Fix the ck tile gemm on MI355
* delete the debug info
[ROCm/composable_kernel commit: 50d1f8ff90 ]
2025-04-03 11:48:54 -07:00
Rostyslav Geyyer
1716380358
Add FP16/BF16<->FP8/BF8 conversions ( #2035 )
...
* Move conversion functions and add missing conversions
* Add tests
* Add missing conversions
* Add missing conversions
* Add bf8 tests
* Update clipping for vectors
* Add missing conversions
* Add bf16 fp8 tests
* Add bf16 bf8 tests
* Fix device conversion
* Fix conversions
* Fix vector use
* Minor fix
* Add a workaround flag
* Add a workaround flag for bf16 conversion
* Add another workaround
* Add a workaround for fp16 to bf8 conversion
* Update type alias
* Add docstrings and missing wrappers
* Fix if defined macros
* Fix more if defined macros
* Add comments
* Remove __host__ specifier
* Add a gfx950 guard
* Update function naming
[ROCm/composable_kernel commit: 265af71a71 ]
2025-04-03 12:42:03 -05:00
aledudek
7a78bc823a
Post-merge changes for fully async args copy in ck grouped gemm ( #1991 )
...
* Post-merge changes for fully async args copy in ck grouped gemm
* Post-merge documentation and naming changes
* Build fix and updated changelog
* Revised comments
[ROCm/composable_kernel commit: 9329432f6c ]
2025-04-03 13:35:43 +02:00
Bartłomiej Kocot
49565538fe
Add support for GKCYX grouped conv weight ( #2023 )
...
* Grouped conv bwd weight GKCYX support
* fix and changelog
* fix
* fix
* fixes
* comments
* fix
[ROCm/composable_kernel commit: 2ccf914888 ]
2025-04-02 23:59:49 +02:00
Adam Osewski
0a607256cf
Basic docs for universal gemm & ck-tile gemm. ( #2014 )
...
* Basic docs for universal gemm & ck-tile gemm.
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Reviewers suggestions.
* Align tparam names in doc with class tparams.
* More reviewers fine tuning ;)
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
[ROCm/composable_kernel commit: e5ad48a784 ]
2025-04-02 11:03:40 +02:00
Bartłomiej Kocot
50fb390d6f
Grouped conv backward data GKCYX support ( #2029 )
...
* Grouped conv backward data GKCYX support
* profiler
* Converter
* split instances
[ROCm/composable_kernel commit: 8c0ab61ece ]
2025-04-01 13:24:38 -07:00
Bartłomiej Kocot
b417137276
Grouped conv fwd v3 fix for SplitN and G > 1 ( #2038 )
...
* Grouped conv fwd v3 fix for SplitN and G > 1
* Remove int8 large test
* Restore int8 test
[ROCm/composable_kernel commit: ec742908bd ]
2025-04-01 13:19:35 -07:00
Seunghoon Lee
0ac4c37028
Fix Windows build. ( #2012 )
...
* Remove duplicate using uint64_t.
* Cast before shift.
[ROCm/composable_kernel commit: df32020f93 ]
2025-04-01 12:22:10 -07:00
Max Podkorytov
70ad8571c2
add a fast compilation path for static for (0..N) ( #2005 )
...
* add a fast compilation path for static for (0..N)
* Update functional2.hpp
add comment and put range applier into detail namespace
* Update functional.hpp
ditto for ck-tile
* prettify
* prettify more
* add comment
* clang-format
[ROCm/composable_kernel commit: c59a8bb206 ]
2025-04-01 12:06:25 -07:00
Bartłomiej Kocot
a9ec282dba
Improve compilation time for grouped conv fwd ( #2039 )
...
* Improve compilation time for grouped conv fwd
* Fix
[ROCm/composable_kernel commit: 6355ee7ca5 ]
2025-04-01 07:11:42 -07:00
Muhammed Emin Ozturk
30e5c8cb49
f8/bf16 GEMM Stream-K ( #1879 )
...
[ROCm/composable_kernel commit: dd4c12b155 ]
2025-03-31 20:30:17 -06:00
jefyang1
92f6e02b96
Fix gemm universal and grouped_conv_fwd test failures on gfx950 ( #2031 )
...
[ROCm/composable_kernel commit: 16b15e336a ]
2025-03-31 09:20:52 -07:00
Adel Johar
de769ac359
Docs: Add precision support reference page ( #1973 )
...
* Docs: Add precision support reference page
* edit of the precision type content
* added more description on scalars
---------
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com >
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
[ROCm/composable_kernel commit: fc073b483e ]
2025-03-28 08:12:27 -06:00
rocking
2b657d9a2c
Reduce redundant space in bias tensor ( #2024 )
...
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 8a20b62e91 ]
2025-03-28 21:58:06 +08:00
felix
817752cdb4
hotfix fix sorting int64 ( #2025 )
...
* fix sorting int64
* clang format
* fix example issue
* update WA issue #
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: a82f338fb9 ]
2025-03-28 11:31:52 +08:00
Illia Silin
5bfbcb38ac
add gfx950 to default targets for rocm6.4+ ( #2032 )
...
[ROCm/composable_kernel commit: d142e15f5e ]
2025-03-27 18:48:47 -07:00
spolifroni-amd
98cc377db2
creation of install doc and refactor of doc in general ( #1908 )
...
* creation of install doc and refactor of doc in general
* updates based on review comments
* updated based on review comments
* updated readme and contributors markdown
* added extra note to not use -j on its own
* added note about smoke tests and regression tests
* made changes as per Illia's feedback
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
[ROCm/composable_kernel commit: a426f67301 ]
2025-03-27 15:13:18 -06:00
felix
00342e69a2
ckmoe: change cmake; use smaller shape for i4 ( #2027 )
...
* change cmake; use smaller shape for i4
* fix pki4 run
* fix typo
* fix runtime arch logic for moe_gemm2 example
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 36d50de50e ]
2025-03-27 09:04:31 -07:00
Rostyslav Geyyer
48fa126a9e
Add MX FP4 device conversion tests ( #1889 )
...
* Add conversion tests
* Fix ctor
* Fix nan logic
* Fix conversion logic
* Permute packed f4_t values
* Fix conversion to float, repack vector elements
* Fix device tests
* Permute elements in a vector
* Add a repro test
* Add a conversion for a repro test
* Update test vectors
* Update conversion
* Fix the test
* Update test vector generator
* Fix vector sr conversion
* Permute conversion args
* Update conversion
* Test
* Fix packing
* Simplify conversion function
* Pack conversion in a loop
* Pack conversion in a loop
* Pack another conversion in a loop
* Pack one more conversion in a loop
* Pack the last conversion in a loop
* Clean up
* Add printf to fix intrinsic
* Add a sw-based workaround
[ROCm/composable_kernel commit: 441343a23d ]
2025-03-26 19:23:01 -05:00
Illia Silin
d424bbe440
Disable all pk_i4 tests for all targets except gfx942/950. ( #2022 )
...
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3 .
* disable all pk_i4 tests for targets other than gfx942/950
* fix cmake logic
[ROCm/composable_kernel commit: 23a949706c ]
2025-03-26 15:15:57 -07:00
Bartłomiej Kocot
f967fd7296
Add support for GKCYX grouped conv fwd ( #2015 )
...
* Add support for GKCYX grouped conv fwd
* fixes
* fix
* changelog
* Fixes
[ROCm/composable_kernel commit: 54c81a1fcf ]
2025-03-26 21:13:38 +01:00
Illia Silin
ba16351a03
fix clang format ( #2021 )
...
[ROCm/composable_kernel commit: fd915b83f7 ]
2025-03-26 09:42:10 -07:00
Mirza Halilčević
97b32a1f18
Add default arguments for prologue and epilogue. ( #2020 )
...
[ROCm/composable_kernel commit: 21e0ca197d ]
2025-03-26 09:28:40 -07:00
Illia Silin
9a88442eba
Make sure gemm_fp8_pk_i4 examples only build and run on gfx942/950. ( #2010 )
...
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3 .
[ROCm/composable_kernel commit: 99b2bbc1d6 ]
2025-03-25 14:43:38 -07:00
Andriy Roshchenko
3f06d019ba
MX GEMM examples with FP8, FP16, and E8M0 scales ( #2016 )
...
* Add `scalar_type` specification for E8M0 exponent
* Specialize `nnvb_data_t_selector` for E8M0 exponent
* Remove partial specializations for `scalar_type` of `non_native_vector_base` template
* Reword command line helper string
* Create MX GEMM examples for different scales
[ROCm/composable_kernel commit: 72d888821c ]
2025-03-25 15:33:03 -06:00
Illia Silin
aaf6a0343d
Enable ClangBuildAnalyzer when doing ninja build traces. ( #2009 )
...
* enable ClangBuildAnalyzer when doing ninja traces
* add branch and date to clang build log name
* fix jenkins syntax
* fix jenkins syntax once more
* fix jenkins syntax once more
* simplify the clang_build log name
* simplify the clang_build log name further
[ROCm/composable_kernel commit: 44c093ba0c ]
2025-03-25 12:27:04 -07:00
Max Podkorytov
a70c642a78
use fast path for sequence generation in old CK ( #1993 )
...
[ROCm/composable_kernel commit: 1a58522f01 ]
2025-03-25 11:28:44 -07:00
ruanjm
fcbf9630fe
[CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance ( #1861 )
...
* 50ms -> 28ms
* Fix bug in non fuse_add_store cases
* Fine tuned setting for 2 pass pipeline
* adjust workload
* remove unnecessary change
* add layernorm
* Adding output quant and unquant results at the same time.
* fix test
* fix format
* tune for cases 128x640 and 128x1024
* bug fix
[ROCm/composable_kernel commit: d49abdaa87 ]
2025-03-25 20:09:45 +08:00
Illia Silin
9dcfd79dcb
Split up data_type header. ( #1996 )
...
* split fp64 vector data type
* add missing header
* move e8m0 structs
* split off numeric_utils header
* fix typo
* split off numeric limits header
* update data_type header
* fix clang format
* split off vector type header
* fix clang format
* fix typo for binary_inf
[ROCm/composable_kernel commit: d2eab23958 ]
2025-03-24 15:08:54 -07:00
Andriy Roshchenko
6b2b228eb4
Introduce MX GEMM for FP8 data type ( #2000 )
...
[ROCm/composable_kernel commit: 6660dc6b8e ]
2025-03-24 15:41:07 -06:00
MHYang-gh
1998fcda26
Fix A/B lds transform ( #2007 )
...
[ROCm/composable_kernel commit: c027637a8f ]
2025-03-22 23:13:50 -07:00
Bartłomiej Kocot
61b5e8318a
Fix split N for large images in grouped conv fwd ( #2004 )
...
* Fix split N for large images in grouped conv fwd
* Fix comments
[ROCm/composable_kernel commit: 5b0873c31a ]
2025-03-22 23:19:49 +01:00
carlushuang
20f331e7fe
add mask support in hdim=192/128 ( #1999 )
...
[ROCm/composable_kernel commit: 6c08c5c46d ]
2025-03-21 18:28:43 +08:00
BingYuan.Zhou
764d9468b2
fix ck_tile/basic_gemm build error ( #1988 )
...
[ROCm/composable_kernel commit: 5a0d693b86 ]
2025-03-20 22:01:14 -07:00
felix
6be18153f8
change cmake ( #2006 )
...
Co-authored-by: coderfeli <coderfeli@163.com >
[ROCm/composable_kernel commit: 902dbe89ad ]
2025-03-20 19:25:11 -07:00
Attila T. Áfra
8cf49a44a9
Fix compile errors on Windows and Linux ( #2002 )
...
* Fix compile error on Windows (call to 'amd_wave_read_first_lane' is ambiguous)
* Fix compile error (no matching function for call to 'cast_to_f32_from_f8')
[ROCm/composable_kernel commit: c79bf11148 ]
2025-03-20 12:37:25 -07:00
carlushuang
f588e4b08e
[CK_TILE] return value with macro in ck_tile::kernel_launch API ( #1982 )
...
* return value with macro and revert the return value
* [CK-TILE] no-macro launch api solution (#1992 )
* no-macro solution
* address -Wcomma
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com >
[ROCm/composable_kernel commit: e3c9886cdf ]
2025-03-20 11:00:29 -07:00
jakpiase
9bf1c41338
[CK_TILE] Switch to universal gemm for batched and grouped gemms ( #1919 )
...
* switch to universal gemm for batched and grouped gemms
* added reviewer comments
* fixed grouped gemm tests
[ROCm/composable_kernel commit: 0e91d32c61 ]
2025-03-20 11:17:04 +01:00
rocking
8be61cfc9d
Sync the kname with instance name ( #1989 )
...
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: b819c217e4 ]
2025-03-20 00:06:45 +08:00
felix
7e4c8a56ed
Ck moe hot fix ( #1979 )
...
* fix useless code and remove useless oob
* clang format
* fix coredump in e2e test
* fix2
* fix clang format
* fix output oob
* clang format
* rm useless comments
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 7eaedeb36c ]
2025-03-19 22:58:27 +08:00
Bartłomiej Kocot
71c6106947
Add grouped conv bwd wei merged grouped instance for larger filter ( #1984 )
...
* Add grouped conv bwd wei merged grouped instance for larger filter
* Update readme
[ROCm/composable_kernel commit: fdaff5603e ]
2025-03-18 16:16:24 +01:00
Illia Silin
8df9260b3e
Add a daily CI build on gfx908. ( #1987 )
...
* add one daily ci build on gfx908
* add redis invocation tag for gfx908
* make ci build for gfx908 conditional
* fix groovy logic
* add option to run perf tests for gfx908
* disable a few tests on mi100
[ROCm/composable_kernel commit: 1342ecf7fb ]
2025-03-17 18:08:53 -07:00
Illia Silin
56ad0cdd62
disable ck_tile basic gemm ( #1986 )
...
[ROCm/composable_kernel commit: 07f25186b2 ]
2025-03-17 15:26:43 -07:00
aledudek
94a88e2ecc
Async grouped gemm v3 ( #1940 )
...
* Fully async grouped gemm
* Remove commented code
* Remove maybe_unused
* host kernel args
* Checkpoint segfault debugging...
* Working part1
* Working part2
* Remove comments...
* Use void ptr for gemm kernel host args
* Fix device_grouped_gemm_multiple_d_dl build issue
* Fix device_grouped_gemm_xdl build issue
[ROCm/composable_kernel commit: 5095906975 ]
2025-03-17 16:42:43 +01:00
Bartłomiej Kocot
36f9cc5fb0
Grouped conv bwd data NGCHW ( #1967 )
...
* Grouped conv bwd data NGCHW
* fixes
* fix
* Improvements
* Fix
* Fix
* add client example
[ROCm/composable_kernel commit: c2e4898b4b ]
2025-03-17 13:32:00 +01:00
valarLip
3d06952a2b
hotfix fmoe build issue ( #1976 )
...
[ROCm/composable_kernel commit: 52b1cd7780 ]
2025-03-13 15:11:59 +08:00
dependabot[bot]
4643738f4d
Bump rocm-docs-core from 1.17.1 to 1.18.1 in /docs/sphinx ( #1977 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.17.1 to 1.18.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.1...v1.18.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: de7a745ca6 ]
2025-03-12 23:36:36 -07:00