aledudek
6dbaeb5fe8
[CK_TILE] Fix GEMM Memory Pipeline ( #2034 )
...
* [CK_TILE] Fix GEMM Memory Pipeline
* Fix transpose tile
* Add comments
[ROCm/composable_kernel commit: 80aae6119b ]
2025-04-08 12:40:04 +02:00
Illia Silin
cba2a5d45d
Fix a couple of CI issues. ( #2050 )
...
* fix jenkins jobs
* fix perf log name for gfx908
* only run gemm perf tests on gfx908
[ROCm/composable_kernel commit: 72c0261ef1 ]
2025-04-07 12:48:34 -07:00
Illia Silin
32879114dc
fix codegen issues ( #2052 )
...
[ROCm/composable_kernel commit: 1793228422 ]
2025-04-07 07:08:39 -07:00
Illia Silin
22e7aec515
Revert "CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (…" ( #2054 )
...
This reverts commit 0b4a5b6b99 .
[ROCm/composable_kernel commit: 29f7266216 ]
2025-04-07 06:49:36 -07:00
slippedJim
753a84d9d5
Add new receipt ( #2055 )
...
[ROCm/composable_kernel commit: 5a22b61de5 ]
2025-04-07 14:18:01 +08:00
Khushbu Agarwal
50c53c7252
file clang formatted ( #2053 )
...
[ROCm/composable_kernel commit: 3bda57c204 ]
2025-04-03 16:55:49 -07:00
Khushbu Agarwal
9b9f33d37e
Documentation for newly added struct ( #2051 )
...
[ROCm/composable_kernel commit: b443056a26 ]
2025-04-03 16:24:34 -07:00
Illia Silin
ada1b5f341
Split env.hpp header from the ck.hpp header. ( #2049 )
...
* split env.hpp out of main headers
* fix namespace logic
[ROCm/composable_kernel commit: 572cd820ce ]
2025-04-03 15:30:21 -07:00
Muhammed Emin Ozturk
0b4a5b6b99
CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test ( #2044 )
...
* fix and split gemm_universal test
* clang
* Update test_gemm_universal_ut_cases_bf16.inc
* Update test_gemm_universal_xdl_bf16.cpp
* Update test_gemm_universal_ut_cases_fp16.inc
[ROCm/composable_kernel commit: 7142d8003c ]
2025-04-03 14:22:43 -07:00
Khushbu Agarwal
eee09ecdb3
[New] Build up the feature of CK Tile GEMM CodeGen ( #1994 )
...
* New branch for codegen changes
* Fix verify function for int4
* pk_int4 codegen
* Update to review comments
* Remove codegen directory and rename filenames
* Remove extra files; clean up CMake file
* code changes for single instance
* config file rename, added few more combinations in json file
* Fix cmake file
* Addressing review comments
* Reverting files changed by merge to develop
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com >
[ROCm/composable_kernel commit: fed0709121 ]
2025-04-03 11:54:12 -07:00
Thomas Ning
03b4c5322d
Add the MI355 support for CK TILE GEMM ( #2046 )
...
* Get the root cause of the ck tile gemm failing on mi355
* Fix the ck tile gemm on MI355
* delete the debug info
[ROCm/composable_kernel commit: 50d1f8ff90 ]
2025-04-03 11:48:54 -07:00
Rostyslav Geyyer
7fbc128e83
Add FP16/BF16<->FP8/BF8 conversions ( #2035 )
...
* Move conversion functions and add missing conversions
* Add tests
* Add missing conversions
* Add missing conversions
* Add bf8 tests
* Update clipping for vectors
* Add missing conversions
* Add bf16 fp8 tests
* Add bf16 bf8 tests
* Fix device conversion
* Fix conversions
* Fix vector use
* Minor fix
* Add a workaround flag
* Add a workaround flag for bf16 conversion
* Add another workaround
* Add a workaround for fp16 to bf8 conversion
* Update type alias
* Add docstrings and missing wrappers
* Fix if defined macros
* Fix more if defined macros
* Add comments
* Remove __host__ specifier
* Add a gfx950 guard
* Update function naming
[ROCm/composable_kernel commit: 265af71a71 ]
2025-04-03 12:42:03 -05:00
aledudek
b7359bcfac
Post-merge changes for fully async args copy in ck grouped gemm ( #1991 )
...
* Post-merge changes for fully async args copy in ck grouped gemm
* Post-merge documentation and naming changes
* Build fix and updated changelog
* Revised comments
[ROCm/composable_kernel commit: 9329432f6c ]
2025-04-03 13:35:43 +02:00
Bartłomiej Kocot
169e3cb4f8
Add support for GKCYX grouped conv weight ( #2023 )
...
* Grouped conv bwd weight GKCYX support
* fix and changelog
* fix
* fix
* fixes
* comments
* fix
[ROCm/composable_kernel commit: 2ccf914888 ]
2025-04-02 23:59:49 +02:00
Adam Osewski
5585c3121e
Basic docs for universal gemm & ck-tile gemm. ( #2014 )
...
* Basic docs for universal gemm & ck-tile gemm.
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Reviewers suggestions.
* Align tparam names in doc with class tparams.
* More reviewers fine tuning ;)
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
[ROCm/composable_kernel commit: e5ad48a784 ]
2025-04-02 11:03:40 +02:00
Bartłomiej Kocot
ca7ae808d4
Grouped conv backward data GKCYX support ( #2029 )
...
* Grouped conv backward data GKCYX support
* profiler
* Converter
* split instances
[ROCm/composable_kernel commit: 8c0ab61ece ]
2025-04-01 13:24:38 -07:00
Bartłomiej Kocot
67c3bcfce1
Grouped conv fwd v3 fix for SplitN and G > 1 ( #2038 )
...
* Grouped conv fwd v3 fix for SplitN and G > 1
* Remove int8 large test
* Restore int8 test
[ROCm/composable_kernel commit: ec742908bd ]
2025-04-01 13:19:35 -07:00
Seunghoon Lee
345ab65612
Fix Windows build. ( #2012 )
...
* Remove duplicate using uint64_t.
* Cast before shift.
[ROCm/composable_kernel commit: df32020f93 ]
2025-04-01 12:22:10 -07:00
Max Podkorytov
cf08db04a6
add a fast compilation path for static for (0..N) ( #2005 )
...
* add a fast compilation path for static for (0..N)
* Update functional2.hpp
add comment and put range applier into detail namespace
* Update functional.hpp
ditto for ck-tile
* prettify
* prettify more
* add comment
* clang-format
[ROCm/composable_kernel commit: c59a8bb206 ]
2025-04-01 12:06:25 -07:00
Bartłomiej Kocot
c7e63f89d6
Improve compilation time for grouped conv fwd ( #2039 )
...
* Improve compilation time for grouped conv fwd
* Fix
[ROCm/composable_kernel commit: 6355ee7ca5 ]
2025-04-01 07:11:42 -07:00
Muhammed Emin Ozturk
532127f25d
f8/bf16 GEMM Stream-K ( #1879 )
...
[ROCm/composable_kernel commit: dd4c12b155 ]
2025-03-31 20:30:17 -06:00
jefyang1
0d3e64941f
Fix gemm universal and grouped_conv_fwd test failures on gfx950 ( #2031 )
...
[ROCm/composable_kernel commit: 16b15e336a ]
2025-03-31 09:20:52 -07:00
Adel Johar
f495cab23b
Docs: Add precision support reference page ( #1973 )
...
* Docs: Add precision support reference page
* edit of the precision type content
* added more description on scalars
---------
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com >
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
[ROCm/composable_kernel commit: fc073b483e ]
2025-03-28 08:12:27 -06:00
rocking
01ea8aa249
Reduce redundant space in bias tensor ( #2024 )
...
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 8a20b62e91 ]
2025-03-28 21:58:06 +08:00
felix
20ffa0f474
hotfix fix sorting int64 ( #2025 )
...
* fix sorting int64
* clang format
* fix example issue
* update WA issue #
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: a82f338fb9 ]
2025-03-28 11:31:52 +08:00
Illia Silin
895ba2b497
add gfx950 to default targets for rocm6.4+ ( #2032 )
...
[ROCm/composable_kernel commit: d142e15f5e ]
2025-03-27 18:48:47 -07:00
spolifroni-amd
408c8b8125
creation of install doc and refactor of doc in general ( #1908 )
...
* creation of install doc and refactor of doc in general
* updates based on review comments
* updated based on review comments
* updated readme and contributors markdown
* added extra note to not use -j on its own
* added note about smoke tests and regression tests
* made changes as per Illia's feedback
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
[ROCm/composable_kernel commit: a426f67301 ]
2025-03-27 15:13:18 -06:00
felix
900acdc2db
ckmoe: change cmake; use smaller shape for i4 ( #2027 )
...
* change cmake; use smaller shape for i4
* fix pki4 run
* fix typo
* fix runtime arch logic for moe_gemm2 example
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 36d50de50e ]
2025-03-27 09:04:31 -07:00
Rostyslav Geyyer
23ad59e1fd
Add MX FP4 device conversion tests ( #1889 )
...
* Add conversion tests
* Fix ctor
* Fix nan logic
* Fix conversion logic
* Permute packed f4_t values
* Fix conversion to float, repack vector elements
* Fix device tests
* Permute elements in a vector
* Add a repro test
* Add a conversion for a repro test
* Update test vectors
* Update conversion
* Fix the test
* Update test vector generator
* Fix vector sr conversion
* Permute conversion args
* Update conversion
* Test
* Fix packing
* Simplify conversion function
* Pack conversion in a loop
* Pack conversion in a loop
* Pack another conversion in a loop
* Pack one more conversion in a loop
* Pack the last conversion in a loop
* Clean up
* Add printf to fix intrinsic
* Add a sw-based workaround
[ROCm/composable_kernel commit: 441343a23d ]
2025-03-26 19:23:01 -05:00
Illia Silin
73a5a3c463
Disable all pk_i4 tests for all targets except gfx942/950. ( #2022 )
...
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3 .
* disable all pk_i4 tests for targets other than gfx942/950
* fix cmake logic
[ROCm/composable_kernel commit: 23a949706c ]
2025-03-26 15:15:57 -07:00
Bartłomiej Kocot
6ccfb817e4
Add support for GKCYX grouped conv fwd ( #2015 )
...
* Add support for GKCYX grouped conv fwd
* fixes
* fix
* changelog
* Fixes
[ROCm/composable_kernel commit: 54c81a1fcf ]
2025-03-26 21:13:38 +01:00
Illia Silin
fd8983d063
fix clang format ( #2021 )
...
[ROCm/composable_kernel commit: fd915b83f7 ]
2025-03-26 09:42:10 -07:00
Mirza Halilčević
614e3fee5e
Add default arguments for prologue and epilogue. ( #2020 )
...
[ROCm/composable_kernel commit: 21e0ca197d ]
2025-03-26 09:28:40 -07:00
Illia Silin
27de1d431a
Make sure gemm_fp8_pk_i4 examples only build and run on gfx942/950. ( #2010 )
...
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3 .
[ROCm/composable_kernel commit: 99b2bbc1d6 ]
2025-03-25 14:43:38 -07:00
Andriy Roshchenko
75ef4c83bf
MX GEMM examples with FP8, FP16, and E8M0 scales ( #2016 )
...
* Add `scalar_type` specification for E8M0 exponent
* Specialize `nnvb_data_t_selector` for E8M0 exponent
* Remove partial specializations for `scalar_type` of `non_native_vector_base` template
* Reword command line helper string
* Create MX GEMM examples for different scales
[ROCm/composable_kernel commit: 72d888821c ]
2025-03-25 15:33:03 -06:00
Illia Silin
21af4139ad
Enable ClangBuildAnalyzer when doing ninja build traces. ( #2009 )
...
* enable ClangBuildAnalyzer when doing ninja traces
* add branch and date to clang build log name
* fix jenkins syntax
* fix jenkins syntax once more
* fix jenkins syntax once more
* simplify the clang_build log name
* simplify the clang_build log name further
[ROCm/composable_kernel commit: 44c093ba0c ]
2025-03-25 12:27:04 -07:00
Max Podkorytov
58789d03d3
use fast path for sequence generation in old CK ( #1993 )
...
[ROCm/composable_kernel commit: 1a58522f01 ]
2025-03-25 11:28:44 -07:00
ruanjm
ce1d20c2c6
[CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance ( #1861 )
...
* 50ms -> 28ms
* Fix bug in non fuse_add_store cases
* Fine tuned setting for 2 pass pipeline
* adjust workload
* remove unnecessary change
* add layernorm
* Adding output quant and unquant results at the same time.
* fix test
* fix format
* tune for cases 128x640 and 128x1024
* bug fix
[ROCm/composable_kernel commit: d49abdaa87 ]
2025-03-25 20:09:45 +08:00
Illia Silin
b9e0e7d93e
Split up data_type header. ( #1996 )
...
* split fp64 vector data type
* add missing header
* move e8m0 structs
* split off numeric_utils header
* fix typo
* split off numeric limits header
* update data_type header
* fix clang format
* split off vector type header
* fix clang format
* fix typo for binary_inf
[ROCm/composable_kernel commit: d2eab23958 ]
2025-03-24 15:08:54 -07:00
Andriy Roshchenko
bbdd7f6d57
Introduce MX GEMM for FP8 data type ( #2000 )
...
[ROCm/composable_kernel commit: 6660dc6b8e ]
2025-03-24 15:41:07 -06:00
MHYang-gh
fd151c05d9
Fix A/B lds transform ( #2007 )
...
[ROCm/composable_kernel commit: c027637a8f ]
2025-03-22 23:13:50 -07:00
Bartłomiej Kocot
ceb078163f
Fix split N for large images in grouped conv fwd ( #2004 )
...
* Fix split N for large images in grouped conv fwd
* Fix comments
[ROCm/composable_kernel commit: 5b0873c31a ]
2025-03-22 23:19:49 +01:00
carlushuang
e1122c5c27
add mask support in hdim=192/128 ( #1999 )
...
[ROCm/composable_kernel commit: 6c08c5c46d ]
2025-03-21 18:28:43 +08:00
BingYuan.Zhou
c245d569d5
fix ck_tile/basic_gemm build error ( #1988 )
...
[ROCm/composable_kernel commit: 5a0d693b86 ]
2025-03-20 22:01:14 -07:00
felix
bd00da1848
change cmake ( #2006 )
...
Co-authored-by: coderfeli <coderfeli@163.com >
[ROCm/composable_kernel commit: 902dbe89ad ]
2025-03-20 19:25:11 -07:00
Attila T. Áfra
081e3c7880
Fix compile errors on Windows and Linux ( #2002 )
...
* Fix compile error on Windows (call to 'amd_wave_read_first_lane' is ambiguous)
* Fix compile error (no matching function for call to 'cast_to_f32_from_f8')
[ROCm/composable_kernel commit: c79bf11148 ]
2025-03-20 12:37:25 -07:00
carlushuang
23340c5dd5
[CK_TILE] return value with macro in ck_tile::kernel_launch API ( #1982 )
...
* return value with macro and revert the return value
* [CK-TILE] no-macro launch api solution (#1992 )
* no-macro solution
* address -Wcomma
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com >
[ROCm/composable_kernel commit: e3c9886cdf ]
2025-03-20 11:00:29 -07:00
jakpiase
f1262b783a
[CK_TILE] Switch to universal gemm for batched and grouped gemms ( #1919 )
...
* switch to universal gemm for batched and grouped gemms
* added reviewer comments
* fixed grouped gemm tests
[ROCm/composable_kernel commit: 0e91d32c61 ]
2025-03-20 11:17:04 +01:00
rocking
b0f323c4ec
Sync the kname with instance name ( #1989 )
...
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: b819c217e4 ]
2025-03-20 00:06:45 +08:00
felix
c2948a0634
Ck moe hot fix ( #1979 )
...
* fix useless code and remove useless oob
* clang format
* fix coredump in e2e test
* fix2
* fix clang format
* fix output oob
* clang format
* rm useless comments
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 7eaedeb36c ]
2025-03-19 22:58:27 +08:00