felix
4c44fa374e
add preshuffle gemm fp16 ( #2036 )
...
* add preshuffle gemm fp16
* clang format and test ok
* Update gemm_multiply_multiply_xdl_fp16_bpreshuffle.cpp
remove useless comments in example
* Update gemm_multiply_multiply_xdl_fp16_bpreshuffle.cpp
remove 2
---------
Co-authored-by: coderfeli <coderfeli@163.com >
[ROCm/composable_kernel commit: c5975529bb ]
2025-04-16 10:53:21 +08:00
joyeamd
19a9980cd5
fmha hdim256 vectorize improve ( #2086 )
...
For hdim 256, buffer loads are not vectorized when seqlen % 256 != 0 and hdim % 256 == 0; this commit addresses that case.
[ROCm/composable_kernel commit: 94d47b1680 ]
2025-04-16 09:21:04 +08:00
Andriy Roshchenko
d9c9f17c3d
MX GEMM - New GEMM pipeline for MX data types ( #2059 )
...
* Allow selection of mfma_scale instructions
* Read B tensor from LDS to VGPR in chunks of 16 in MFMA order
* Add constexpr and synchronize return type for `get_exponent_value`
* Pass scales by reference and add comments to `mfma_scale_f32_32x32x64`
* Add support for microscaling instructions in `XdlopsGemm`
* Fix `mfma_scale_f32_16x16x128f8f6f4` wrapper
* Remove software implementation of MX GEMM
* Make interface of `intrin_mfma_scale_f32_16x16x128f8f6f4<16, 16>` consistent with the other scale instruction
* Update README
* Updated CHANGELOG
* Remove unused static methods
[ROCm/composable_kernel commit: 7106976a72 ]
2025-04-15 17:17:07 -06:00
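As background for the MX (microscaling) data types this pipeline targets: in the OCP Microscaling format, a block of elements shares one E8M0 scale, which is an 8-bit biased power-of-two exponent. The sketch below is a host-side Python illustration of that decoding only. It is not the CK implementation; the function names are hypothetical, and the block size of 32 and the bias of 127 are assumptions taken from the MX specification rather than from this commit.

```python
import math

E8M0_BIAS = 127   # exponent bias of the shared scale (assumed, per the OCP MX spec)
E8M0_NAN = 0xFF   # the all-ones encoding is reserved for NaN
MX_BLOCK = 32     # elements sharing one scale (assumed, per the OCP MX spec)


def e8m0_to_float(scale_byte: int) -> float:
    """Decode an E8M0 scale byte into the power of two it represents."""
    if not 0 <= scale_byte <= 0xFF:
        raise ValueError("scale must fit in one byte")
    if scale_byte == E8M0_NAN:
        return math.nan
    return math.ldexp(1.0, scale_byte - E8M0_BIAS)


def dequantize_block(elements, scale_byte):
    """Apply one shared scale to a block of already-decoded element values."""
    if len(elements) != MX_BLOCK:
        raise ValueError(f"expected {MX_BLOCK} elements per block")
    scale = e8m0_to_float(scale_byte)
    return [x * scale for x in elements]
```

For example, a scale byte of 129 multiplies every element of its block by 2^(129 - 127) = 4.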
Illia Silin
49e4ad3278
Upgrade default docker image to ROCm6.4 release. ( #2082 )
...
* upgrade to rocm6.4
* fix gfx10 generic target syntax
* use gfx1101 target for unit tests
* use gfx1201 target for unit tests
* do not use generic targets until 6.4.1 release
* update target list and dockerfile.compiler
[ROCm/composable_kernel commit: d55c9cb313 ]
2025-04-14 16:41:47 -07:00
Mingtao Gu
e8db9f0220
CK pk_i4_t test failures fix (SWDEV-518629) ( #2075 )
...
* fix pk_i4_v3 test failures in Ubuntu env.
* fix pk_i4_t test failures on Ubuntu.
* some fixes.
---------
Co-authored-by: mtgu0705 <mtgu@amd.com >
[ROCm/composable_kernel commit: 56378f810f ]
2025-04-14 16:58:57 +08:00
Thomas Ning
1b61d3a0ed
Solve the Static Encoding Pattern compile error when the tile size is too small ( #2079 )
...
[ROCm/composable_kernel commit: 269f4f6af5 ]
2025-04-13 20:09:30 -07:00
Illia Silin
90612d0e37
Fix build issues for multiple targets. ( #2077 )
...
* build for multiple targets on gfx942
* add missing ignore statements
[ROCm/composable_kernel commit: 0d4f145078 ]
2025-04-11 12:12:53 -07:00
Muhammed Emin Ozturk
a7a4456753
CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test Redo PR #2044 ( #2070 )
...
* fix and split gemm_universal test
* Update test_gemm_universal_streamk_ut_cases_fp8.inc
[ROCm/composable_kernel commit: 74fda2e796 ]
2025-04-11 10:17:29 -07:00
jakpiase
d76ebf9795
[CK_TILE] Add 2:4 structured sparsity support for fp16 gemm ( #1957 )
...
* add structured sparsity fp16 support for gemm
* added reviewer suggestions
* update changelog
* update changelog
* add reviewers suggestions
* Minor fix
* clang fix
* fix doxygen
[ROCm/composable_kernel commit: 6c61f4d237 ]
2025-04-11 12:18:26 +02:00
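For readers unfamiliar with the term, 2:4 structured sparsity means that at most two values in every group of four consecutive elements are non-zero, so a row can be stored as half as many values plus a small position index per kept value. The following is a minimal Python sketch of that idea, assuming magnitude-based pruning; `prune_2_4` and `expand_2_4` are hypothetical helper names, and the CK_TILE kernels and their metadata encoding are not reproduced here.

```python
def prune_2_4(row):
    """Keep the two largest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest magnitudes, restored to ascending order.
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        values.extend(group[i] for i in keep)
        indices.extend(keep)
    return values, indices


def expand_2_4(values, indices):
    """Rebuild the dense row, writing zeros at the pruned positions."""
    dense = [0.0] * (len(values) * 2)
    for n, (v, i) in enumerate(zip(values, indices)):
        dense[(n // 2) * 4 + i] = v
    return dense
```

Each index fits in two bits, which is why this layout halves the value storage at a small metadata cost.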
slippedJim
959225947a
add fmha fwd splitkv receipt for aiter c++ api ( #2068 )
...
* add s_randval for c++ api
* Fix bug of bias in splitkv
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com >
[ROCm/composable_kernel commit: 5f885d2b7a ]
2025-04-10 23:21:13 +08:00
Juan Manuel Martinez Caamaño
7a42b06988
Replace inline assembly with builtins in FMHA ( #2067 )
...
* Replace inline assembly with builtins in FMHA
---------
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: f14e648e7c ]
2025-04-10 09:48:37 +02:00
Illia Silin
7546e4bafe
enable gfx115x support ( #2065 )
...
[ROCm/composable_kernel commit: 3e6d21adeb ]
2025-04-09 10:06:42 -07:00
MHYang-gh
62ce5b906b
Make buffer coherence configurable in tensor view ( #2041 )
...
* Make buffer coherence configurable in tensor view
* Fix clang-format for tensor_view.hpp
[ROCm/composable_kernel commit: 03ce8729fd ]
2025-04-08 15:34:11 -07:00
valarLip
c1d067be5c
add passthrough for int32->float32 ( #2062 )
...
[ROCm/composable_kernel commit: 2c563fecf7 ]
2025-04-08 15:16:30 -07:00
Khushbu Agarwal
3c0b739f8e
New instances for gemm_multiply_multiply_weightpreshuffle operator ( #2061 )
...
* Add new instances for weight_preshuffle for f8->bf16
* Add new instances for weight_preshuffle for f8->f16
* clang formatted
---------
Co-authored-by: Khushbu Agarwal <khuagar@amd.com >
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
[ROCm/composable_kernel commit: 263ff689e0 ]
2025-04-08 15:14:53 -07:00
spolifroni-amd
fb80a5fdb3
fixed broken github link ( #2063 )
...
[ROCm/composable_kernel commit: 2c8132126c ]
2025-04-08 10:20:31 -07:00
dependabot[bot]
0078f5e0bd
Bump rocm-docs-core from 1.18.1 to 1.18.2 in /docs/sphinx ( #2047 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.18.1 to 1.18.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.1...v1.18.2 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-version: 1.18.2
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: b12cd6580b ]
2025-04-08 09:06:38 -07:00
Max Podkorytov
26724086f3
simplify generate_tuple ( #2043 )
...
[ROCm/composable_kernel commit: 6ce0797dad ]
2025-04-08 09:00:51 -07:00
aledudek
6dbaeb5fe8
[CK_TILE] Fix GEMM Memory Pipeline ( #2034 )
...
* [CK_TILE] Fix GEMM Memory Pipeline
* Fix transpose tile
* Add comments
[ROCm/composable_kernel commit: 80aae6119b ]
2025-04-08 12:40:04 +02:00
Illia Silin
cba2a5d45d
Fix a couple of CI issues. ( #2050 )
...
* fix jenkins jobs
* fix perf log name for gfx908
* only run gemm perf tests on gfx908
[ROCm/composable_kernel commit: 72c0261ef1 ]
2025-04-07 12:48:34 -07:00
Illia Silin
32879114dc
fix codegen issues ( #2052 )
...
[ROCm/composable_kernel commit: 1793228422 ]
2025-04-07 07:08:39 -07:00
Illia Silin
22e7aec515
Revert "CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (…" ( #2054 )
...
This reverts commit 0b4a5b6b99.
[ROCm/composable_kernel commit: 29f7266216 ]
2025-04-07 06:49:36 -07:00
slippedJim
753a84d9d5
Add new receipt ( #2055 )
...
[ROCm/composable_kernel commit: 5a22b61de5 ]
2025-04-07 14:18:01 +08:00
Khushbu Agarwal
50c53c7252
file clang formatted ( #2053 )
...
[ROCm/composable_kernel commit: 3bda57c204 ]
2025-04-03 16:55:49 -07:00
Khushbu Agarwal
9b9f33d37e
Documentation for newly added struct ( #2051 )
...
[ROCm/composable_kernel commit: b443056a26 ]
2025-04-03 16:24:34 -07:00
Illia Silin
ada1b5f341
Split env.hpp header from the ck.hpp header. ( #2049 )
...
* split env.hpp out of main headers
* fix namespace logic
[ROCm/composable_kernel commit: 572cd820ce ]
2025-04-03 15:30:21 -07:00
Muhammed Emin Ozturk
0b4a5b6b99
CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test ( #2044 )
...
* fix and split gemm_universal test
* clang
* Update test_gemm_universal_ut_cases_bf16.inc
* Update test_gemm_universal_xdl_bf16.cpp
* Update test_gemm_universal_ut_cases_fp16.inc
[ROCm/composable_kernel commit: 7142d8003c ]
2025-04-03 14:22:43 -07:00
Khushbu Agarwal
eee09ecdb3
[New] Build up the feature of CK Tile GEMM CodeGen ( #1994 )
...
* New branch for codegen changes
* Fix verify function for int4
* pk_int4 codegen
* Update to review comments
* Remove codegen directory and rename filenames
* Remove extra files; clean up CMake file
* code changes for single instance
* config file rename, added few more combinations in json file
* Fix cmake file
* Addressing review comments
* Reverting files changed by merge to develop
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com >
[ROCm/composable_kernel commit: fed0709121 ]
2025-04-03 11:54:12 -07:00
Thomas Ning
03b4c5322d
Add the MI355 support for CK TILE GEMM ( #2046 )
...
* Get the root cause of the ck tile gemm failing on mi355
* Fix the ck tile gemm on MI355
* delete the debug info
[ROCm/composable_kernel commit: 50d1f8ff90 ]
2025-04-03 11:48:54 -07:00
Rostyslav Geyyer
7fbc128e83
Add FP16/BF16<->FP8/BF8 conversions ( #2035 )
...
* Move conversion functions and add missing conversions
* Add tests
* Add missing conversions
* Add missing conversions
* Add bf8 tests
* Update clipping for vectors
* Add missing conversions
* Add bf16 fp8 tests
* Add bf16 bf8 tests
* Fix device conversion
* Fix conversions
* Fix vector use
* Minor fix
* Add a workaround flag
* Add a workaround flag for bf16 conversion
* Add another workaround
* Add a workaround for fp16 to bf8 conversion
* Update type alias
* Add docstrings and missing wrappers
* Fix if defined macros
* Fix more if defined macros
* Add comments
* Remove __host__ specifier
* Add a gfx950 guard
* Update function naming
[ROCm/composable_kernel commit: 265af71a71 ]
2025-04-03 12:42:03 -05:00
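To make the bit layouts behind these conversions concrete, here is a small Python sketch that decodes an FP8 byte and a BF16 half-word to ordinary floats. It assumes the OCP E4M3 encoding (4 exponent bits with bias 7, 3 mantissa bits, no infinities, NaN at exponent and mantissa all ones); other FP8 variants differ in bias and special values, and the function names here are hypothetical rather than CK's.

```python
import math
import struct


def decode_fp8_e4m3(byte: int) -> float:
    """Decode one FP8 byte, assuming the OCP E4M3 layout (bias 7)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan                               # the only NaN pattern
    if exponent == 0:
        return sign * math.ldexp(mantissa, -9)        # subnormal: (m / 8) * 2^-6
    # Normal: (1 + m / 8) * 2^(e - 7), written as (8 + m) * 2^(e - 10).
    return sign * math.ldexp(8 + mantissa, exponent - 10)


def decode_bf16(bits: int) -> float:
    """BF16 is the upper 16 bits of an IEEE float32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]
```

Under these assumptions the largest finite E4M3 value is `decode_fp8_e4m3(0x7E)`, which is 448.0, so a conversion from FP16 or BF16 has to clip or saturate anything larger.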
aledudek
b7359bcfac
Post-merge changes for fully async args copy in ck grouped gemm ( #1991 )
...
* Post-merge changes for fully async args copy in ck grouped gemm
* Post-merge documentation and naming changes
* Build fix and updated changelog
* Revised comments
[ROCm/composable_kernel commit: 9329432f6c ]
2025-04-03 13:35:43 +02:00
Bartłomiej Kocot
169e3cb4f8
Add support for GKCYX grouped conv weight ( #2023 )
...
* Grouped conv bwd weight GKCYX support
* fix and changelog
* fix
* fix
* fixes
* comments
* fix
[ROCm/composable_kernel commit: 2ccf914888 ]
2025-04-02 23:59:49 +02:00
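The layout name lists the weight dimensions from slowest to fastest varying: G (groups), K (output channels), C (input channels), Y and X (filter height and width). A minimal Python sketch of the resulting linear offsets, compared with the channels-last GKYXC ordering, is given below. It assumes densely packed tensors with no padding or custom strides, and the function names are hypothetical.

```python
def offset_gkcyx(g, k, c, y, x, K, C, Y, X):
    """Linear offset into a packed weight tensor laid out as [G][K][C][Y][X]."""
    return (((g * K + k) * C + c) * Y + y) * X + x


def offset_gkyxc(g, k, c, y, x, K, C, Y, X):
    """Linear offset into a packed weight tensor laid out as [G][K][Y][X][C]."""
    return (((g * K + k) * Y + y) * X + x) * C + c
```

In GKCYX, stepping `x` by one moves one element in memory, whereas in GKYXC it is `c` that is contiguous, which is the property that determines which dimension can be loaded with vector accesses.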
Adam Osewski
5585c3121e
Basic docs for universal gemm & ck-tile gemm. ( #2014 )
...
* Basic docs for universal gemm & ck-tile gemm.
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Reviewers suggestions.
* Align tparam names in doc with class tparams.
* More reviewers fine tuning ;)
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
[ROCm/composable_kernel commit: e5ad48a784 ]
2025-04-02 11:03:40 +02:00
Bartłomiej Kocot
ca7ae808d4
Grouped conv backward data GKCYX support ( #2029 )
...
* Grouped conv backward data GKCYX support
* profiler
* Converter
* split instances
[ROCm/composable_kernel commit: 8c0ab61ece ]
2025-04-01 13:24:38 -07:00
Bartłomiej Kocot
67c3bcfce1
Grouped conv fwd v3 fix for SplitN and G > 1 ( #2038 )
...
* Grouped conv fwd v3 fix for SplitN and G > 1
* Remove int8 large test
* Restore int8 test
[ROCm/composable_kernel commit: ec742908bd ]
2025-04-01 13:19:35 -07:00
Seunghoon Lee
345ab65612
Fix Windows build. ( #2012 )
...
* Remove duplicate using uint64_t.
* Cast before shift.
[ROCm/composable_kernel commit: df32020f93 ]
2025-04-01 12:22:10 -07:00
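The "cast before shift" item refers to a common portability hazard: on Windows, `long` is 32 bits wide, so a shift evaluated in a 32-bit type drops high bits before any widening cast is applied. The commit message does not show the affected expression, so the Python model below is only an illustration of the general problem, with hypothetical function names and 32-bit and 64-bit arithmetic emulated with masks.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shift_then_widen(value: int, shift: int) -> int:
    """Model `(uint64_t)(v32 << shift)`: the shift is evaluated in 32 bits."""
    if not 0 <= shift < 32:
        raise ValueError("shift must be in [0, 32)")
    return ((value & MASK32) << shift) & MASK32


def widen_then_shift(value: int, shift: int) -> int:
    """Model `(uint64_t)v32 << shift`: the shift is evaluated in 64 bits."""
    if not 0 <= shift < 32:
        raise ValueError("shift must be in [0, 32)")
    return ((value & MASK32) << shift) & MASK64
```

With `value = 0x80000001` and `shift = 4`, the first form yields `0x10` while the second yields `0x800000010`.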
Max Podkorytov
cf08db04a6
add a fast compilation path for static for (0..N) ( #2005 )
...
* add a fast compilation path for static for (0..N)
* Update functional2.hpp
add comment and put range applier into detail namespace
* Update functional.hpp
ditto for ck-tile
* prettify
* prettify more
* add comment
* clang-format
[ROCm/composable_kernel commit: c59a8bb206 ]
2025-04-01 12:06:25 -07:00
Bartłomiej Kocot
c7e63f89d6
Improve compilation time for grouped conv fwd ( #2039 )
...
* Improve compilation time for grouped conv fwd
* Fix
[ROCm/composable_kernel commit: 6355ee7ca5 ]
2025-04-01 07:11:42 -07:00
Muhammed Emin Ozturk
532127f25d
f8/bf16 GEMM Stream-K ( #1879 )
...
[ROCm/composable_kernel commit: dd4c12b155 ]
2025-03-31 20:30:17 -06:00
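As context for the term, Stream-K balances a GEMM by splitting the total number of K-loop iterations across workgroups, rather than assigning whole output tiles to workgroups. The Python sketch below shows only that basic partitioning step under simplifying assumptions (an even split, every tile having the same iteration count). The function name is hypothetical, and the scheduling actually used by the CK kernels, including the reduction of partial tiles, is not shown.

```python
def streamk_partition(num_tiles, iters_per_tile, num_workgroups):
    """Split all K-loop iterations evenly across workgroups.

    Returns, per workgroup, a list of (tile, k_begin, k_end) segments.
    """
    if num_workgroups <= 0:
        raise ValueError("need at least one workgroup")
    total = num_tiles * iters_per_tile
    base, extra = divmod(total, num_workgroups)
    work, start = [], 0
    for wg in range(num_workgroups):
        end = start + base + (1 if wg < extra else 0)
        segments, it = [], start
        while it < end:
            tile, k_begin = divmod(it, iters_per_tile)
            # Stop at the tile boundary or at the end of this workgroup's share.
            k_end = min(iters_per_tile, k_begin + (end - it))
            segments.append((tile, k_begin, k_end))
            it += k_end - k_begin
        work.append(segments)
        start = end
    return work
```

With 3 tiles of 4 iterations each and 2 workgroups, each workgroup receives 6 iterations and the middle tile is shared, so its two partial results must be combined afterwards.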
jefyang1
0d3e64941f
Fix gemm universal and grouped_conv_fwd test failures on gfx950 ( #2031 )
...
[ROCm/composable_kernel commit: 16b15e336a ]
2025-03-31 09:20:52 -07:00
Adel Johar
f495cab23b
Docs: Add precision support reference page ( #1973 )
...
* Docs: Add precision support reference page
* edit of the precision type content
* added more description on scalars
---------
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com >
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
[ROCm/composable_kernel commit: fc073b483e ]
2025-03-28 08:12:27 -06:00
rocking
01ea8aa249
Reduce redundant space in bias tensor ( #2024 )
...
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 8a20b62e91 ]
2025-03-28 21:58:06 +08:00
felix
20ffa0f474
hotfix fix sorting int64 ( #2025 )
...
* fix sorting int64
* clang format
* fix example issue
* update WA issue #
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: a82f338fb9 ]
2025-03-28 11:31:52 +08:00
Illia Silin
895ba2b497
add gfx950 to default targets for rocm6.4+ ( #2032 )
...
[ROCm/composable_kernel commit: d142e15f5e ]
2025-03-27 18:48:47 -07:00
spolifroni-amd
408c8b8125
creation of install doc and refactor of doc in general ( #1908 )
...
* creation of install doc and refactor of doc in general
* updates based on review comments
* updated based on review comments
* updated readme and contributors markdown
* added extra note to not use -j on its own
* added note about smoke tests and regression tests
* made changes as per Illia's feedback
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
[ROCm/composable_kernel commit: a426f67301 ]
2025-03-27 15:13:18 -06:00
felix
900acdc2db
ckmoe: change cmake; use smaller shape for i4 ( #2027 )
...
* change cmake; use smaller shape for i4
* fix pki4 run
* fix typo
* fix runtime arch logic for moe_gemm2 example
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 36d50de50e ]
2025-03-27 09:04:31 -07:00
Rostyslav Geyyer
23ad59e1fd
Add MX FP4 device conversion tests ( #1889 )
...
* Add conversion tests
* Fix ctor
* Fix nan logic
* Fix conversion logic
* Permute packed f4_t values
* Fix conversion to float, repack vector elements
* Fix device tests
* Permute elements in a vector
* Add a repro test
* Add a conversion for a repro test
* Update test vectors
* Update conversion
* Fix the test
* Update test vector generator
* Fix vector sr conversion
* Permute conversion args
* Update conversion
* Test
* Fix packing
* Simplify conversion function
* Pack conversion in a loop
* Pack conversion in a loop
* Pack another conversion in a loop
* Pack one more conversion in a loop
* Pack the last conversion in a loop
* Clean up
* Add printf to fix intrinsic
* Add a sw-based workaround
[ROCm/composable_kernel commit: 441343a23d ]
2025-03-26 19:23:01 -05:00
Illia Silin
73a5a3c463
Disable all pk_i4 tests for all targets except gfx942/950. ( #2022 )
...
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3.
* disable all pk_i4 tests for targets other than gfx942/950
* fix cmake logic
[ROCm/composable_kernel commit: 23a949706c ]
2025-03-26 15:15:57 -07:00
Bartłomiej Kocot
6ccfb817e4
Add support for GKCYX grouped conv fwd ( #2015 )
...
* Add support for GKCYX grouped conv fwd
* fixes
* fix
* changelog
* Fixes
[ROCm/composable_kernel commit: 54c81a1fcf ]
2025-03-26 21:13:38 +01:00
Illia Silin
fd8983d063
fix clang format ( #2021 )
...
[ROCm/composable_kernel commit: fd915b83f7 ]
2025-03-26 09:42:10 -07:00