Khushbu Agarwal
3bda57c204
file clang formatted (#2053)
2025-04-03 16:55:49 -07:00
Khushbu Agarwal
b443056a26
Documentation for newly added struct (#2051)
2025-04-03 16:24:34 -07:00
Illia Silin
572cd820ce
Split env.hpp header from the ck.hpp header. (#2049)
* split env.hpp out of main headers
* fix namespace logic
2025-04-03 15:30:21 -07:00
Muhammed Emin Ozturk
7142d8003c
CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (#2044)
* fix and split gemm_universal test
* clang
* Update test_gemm_universal_ut_cases_bf16.inc
* Update test_gemm_universal_xdl_bf16.cpp
* Update test_gemm_universal_ut_cases_fp16.inc
2025-04-03 14:22:43 -07:00
Khushbu Agarwal
fed0709121
[New] Build up the feature of CK Tile GEMM CodeGen (#1994)
* New branch for codegen changes
* Fix verify function for int4
* pk_int4 codegen
* Update to review comments
* Remove codegen directory and rename filenames
* Remove extra files; clean up CMake file
* New branch for codegen changes
* Fix verify function for int4
* pk_int4 codegen
* Update to review comments
* Remove codegen directory and rename filenames
* Remove extra files; clean up CMake file
* code changes for single instance
* config file rename, added few more combinations in json file
* Fix cmake file
* Addressing review comments
* Reverting files changed by merge to develop
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-04-03 11:54:12 -07:00
Thomas Ning
50d1f8ff90
Add the MI355 support for CK TILE GEMM (#2046)
* Get the root cause of the ck tile gemm failing on mi355
* Fix the ck tile gemm on MI355
* delete the debug info
2025-04-03 11:48:54 -07:00
Rostyslav Geyyer
265af71a71
Add FP16/BF16<->FP8/BF8 conversions (#2035)
* Move conversion functions and add missing conversions
* Add tests
* Add missing conversions
* Add missing conversions
* Add bf8 tests
* Update clipping for vectors
* Add missing conversions
* Add bf16 fp8 tests
* Add bf16 bf8 tests
* Fix device conversion
* Fix conversions
* Fix vector use
* Minor fix
* Add a workaround flag
* Add a workaround flag for bf16 conversion
* Add another workaround
* Add a workaround for fp16 to bf8 conversion
* Update type alias
* Add docstrings and missing wrappers
* Fix if defined macros
* Fix more if defined macros
* Add comments
* Remove __host__ specifier
* Add a gfx950 guard
* Update function naming
2025-04-03 12:42:03 -05:00
aledudek
9329432f6c
Post-merge changes for fully async args copy in ck grouped gemm (#1991)
* Post-merge changes for fully async args copy in ck grouped gemm
* Post-merge documentation and naming changes
* Build fix and updated changelog
* Revised comments
2025-04-03 13:35:43 +02:00
Bartłomiej Kocot
2ccf914888
Add support for GKCYX grouped conv weight (#2023)
* Grouped conv bwd weight GKCYX support
* fix and changelog
* fix
* fix
* fixes
* comments
* fix
2025-04-02 23:59:49 +02:00
Adam Osewski
e5ad48a784
Basic docs for universal gemm & ck-tile gemm. (#2014)
* Basic docs for universal gemm & ck-tile gemm.
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Reviewers suggestions.
* Align tparam names in doc with class tparams.
* More reviewers fine tuning ;)
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-04-02 11:03:40 +02:00
Bartłomiej Kocot
8c0ab61ece
Grouped conv backward data GKCYX support (#2029)
* Grouped conv backward data GKCYX support
* profiler
* Converter
* split instances
2025-04-01 13:24:38 -07:00
Bartłomiej Kocot
ec742908bd
Grouped conv fwd v3 fix for SplitN and G > 1 (#2038)
* Grouped conv fwd v3 fix for SplitN and G > 1
* Remove int8 large test
* Restore int8 test
2025-04-01 13:19:35 -07:00
Seunghoon Lee
df32020f93
Fix Windows build. (#2012)
* Remove duplicate using uint64_t.
* Cast before shift.
2025-04-01 12:22:10 -07:00
Max Podkorytov
c59a8bb206
add a fast compilation path for static for (0..N) (#2005)
* add a fast compilation path for static for (0..N)
* Update functional2.hpp
add comment and put range applier into detail namespace
* Update functional.hpp
ditto for ck-tile
* prettify
* prettify more
* add comment
* clang-format
2025-04-01 12:06:25 -07:00
Bartłomiej Kocot
6355ee7ca5
Improve compilation time for grouped conv fwd (#2039)
* Improve compilation time for grouped conv fwd
* Fix
2025-04-01 07:11:42 -07:00
Muhammed Emin Ozturk
dd4c12b155
f8/bf16 GEMM Stream-K (#1879)
2025-03-31 20:30:17 -06:00
jefyang1
16b15e336a
Fix gemm universal and grouped_conv_fwd test failures on gfx950 (#2031)
2025-03-31 09:20:52 -07:00
Adel Johar
fc073b483e
Docs: Add precision support reference page (#1973)
* Docs: Add precision support reference page
* edit of the precision type content
* added more description on scalars
---------
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
2025-03-28 08:12:27 -06:00
rocking
8a20b62e91
Reduce redundant space in bias tensor (#2024)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-03-28 21:58:06 +08:00
felix
a82f338fb9
hotfix fix sorting int64 (#2025)
* fix sorting int64
* clang format
* fix example issue
* update WA issue #
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2025-03-28 11:31:52 +08:00
Illia Silin
d142e15f5e
add gfx950 to default targets for rocm6.4+ (#2032)
2025-03-27 18:48:47 -07:00
spolifroni-amd
a426f67301
creation of install doc and refactor of doc in general (#1908)
* creation of install doc and refactor of doc in general
* updates based on review comments
* updated based on review comments
* updated readme and contributors markdown
* added extra note to not use -j on its own
* added note about smoke tests and regression tests
* made changes as per Illia's feedback
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
2025-03-27 15:13:18 -06:00
felix
36d50de50e
ckmoe: change cmake; use smaller shape for i4 (#2027)
* change cmake; use smaller shape for i4
* fix pki4 run
* fix typo
* fix runtime arch logic for moe_gemm2 example
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-03-27 09:04:31 -07:00
Rostyslav Geyyer
441343a23d
Add MX FP4 device conversion tests (#1889)
* Add conversion tests
* Fix ctor
* Fix nan logic
* Fix conversion logic
* Permute packed f4_t values
* Fix conversion to float, repack vector elements
* Fix device tests
* Permute elements in a vector
* Add a repro test
* Add a conversion for a repro test
* Update test vectors
* Update conversion
* Fix the test
* Update test vector generator
* Fix vector sr conversion
* Permute conversion args
* Update conversion
* Test
* Fix packing
* Simplify conversion function
* Pack conversion in a loop
* Pack conversion in a loop
* Pack another conversion in a loop
* Pack one more conversion in a loop
* Pack the last conversion in a loop
* Clean up
* Add printf to fix intrinsic
* Add a sw-based workaround
2025-03-26 19:23:01 -05:00
Illia Silin
23a949706c
Disable all pk_i4 tests for all targets except gfx942/950. (#2022)
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3.
* disable all pk_i4 tests for targets other than gfx942/950
* fix cmake logic
2025-03-26 15:15:57 -07:00
Bartłomiej Kocot
54c81a1fcf
Add support for GKCYX grouped conv fwd (#2015)
* Add support for GKCYX grouped conv fwd
* fixes
* fix
* changelog
* Fixes
2025-03-26 21:13:38 +01:00
Illia Silin
fd915b83f7
fix clang format (#2021)
2025-03-26 09:42:10 -07:00
Mirza Halilčević
21e0ca197d
Add default arguments for prologue and epilogue. (#2020)
2025-03-26 09:28:40 -07:00
Illia Silin
99b2bbc1d6
Make sure gemm_fp8_pk_i4 examples only build and run on gfx942/950. (#2010)
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3.
2025-03-25 14:43:38 -07:00
Andriy Roshchenko
72d888821c
MX GEMM examples with FP8, FP16, and E8M0 scales (#2016)
* Add `scalar_type` specification for E8M0 exponent
* Specialize `nnvb_data_t_selector` for E8M0 exponent
* Remove partial specializations for `scalar_type` of `non_native_vector_base` template
* Reword command line helper string
* Create MX GEMM examples for different scales
2025-03-25 15:33:03 -06:00
Illia Silin
44c093ba0c
Enable ClangBuildAnalyzer when doing ninja build traces. (#2009)
* enable ClangBuildAnalyzer when doing ninja traces
* add branch and date to clang build log name
* fix jenkins syntax
* fix jenkins syntax once more
* fix jenkins syntax once more
* simplify the clang_build log name
* simplify the clang_build log name further
2025-03-25 12:27:04 -07:00
Max Podkorytov
1a58522f01
use fast path for sequence generation in old CK (#1993)
2025-03-25 11:28:44 -07:00
ruanjm
d49abdaa87
[CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance (#1861)
* 50ms -> 28ms
* Fix bug in non fuse_add_store cases
* Fine tuned setting for 2 pass pipeline
* adjust workload
* remove unnecessary change
* add layernorm
* Adding output quant and unquant results at the same time.
* fix test
* fix format
* tune for cases 128x640 and 128x1024
* bug fix
2025-03-25 20:09:45 +08:00
Illia Silin
d2eab23958
Split up data_type header. (#1996)
* split fp64 vector data type
* add missing header
* move e8m0 structs
* split off numeric_utils header
* fix typo
* split off numeric limits header
* update data_type header
* fix clang format
* split off vector type header
* fix clang format
* fix typo for binary_inf
2025-03-24 15:08:54 -07:00
Andriy Roshchenko
6660dc6b8e
Introduce MX GEMM for FP8 data type (#2000)
2025-03-24 15:41:07 -06:00
MHYang-gh
c027637a8f
Fix A/B lds transform (#2007)
2025-03-22 23:13:50 -07:00
Bartłomiej Kocot
5b0873c31a
Fix split N for large images in grouped conv fwd (#2004)
* Fix split N for large images in grouped conv fwd
* Fix comments
2025-03-22 23:19:49 +01:00
carlushuang
6c08c5c46d
add mask support in hdim=192/128 (#1999)
2025-03-21 18:28:43 +08:00
BingYuan.Zhou
5a0d693b86
fix ck_tile/basic_gemm build error (#1988)
2025-03-20 22:01:14 -07:00
felix
902dbe89ad
change cmake (#2006)
Co-authored-by: coderfeli <coderfeli@163.com>
2025-03-20 19:25:11 -07:00
Attila T. Áfra
c79bf11148
Fix compile errors on Windows and Linux (#2002)
* Fix compile error on Windows (call to 'amd_wave_read_first_lane' is ambiguous)
* Fix compile error (no matching function for call to 'cast_to_f32_from_f8')
2025-03-20 12:37:25 -07:00
carlushuang
e3c9886cdf
[CK_TILE] return value with macro in ck_tile::kernel_launch API (#1982)
* return value with macro and revert the return value
* [CK-TILE] no-macro launch api solution (#1992)
* no-macro solution
* address -Wcomma
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
2025-03-20 11:00:29 -07:00
jakpiase
0e91d32c61
[CK_TILE] Switch to universal gemm for batched and grouped gemms (#1919)
* switch to universal gemm for batched and grouped gemms
* added reviewer comments
* fixed grouped gemm tests
2025-03-20 11:17:04 +01:00
rocking
b819c217e4
Sync the kname with instance name (#1989)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-03-20 00:06:45 +08:00
felix
7eaedeb36c
Ck moe hot fix (#1979)
* fix useless code and remove useless oob
* clang format
* fix coredump in e2e test
* fix2
* fix clang format
* fix output oob
* clang format
* rm useless comments
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-03-19 22:58:27 +08:00
Bartłomiej Kocot
fdaff5603e
Add grouped conv bwd wei merged grouped instance for larger filter (#1984)
* Add grouped conv bwd wei merged grouped instance for larger filter
* Update readme
2025-03-18 16:16:24 +01:00
Illia Silin
1342ecf7fb
Add a daily CI build on gfx908. (#1987)
* add one daily ci build on gfx908
* add redis invocation tag for gfx908
* make ci build for gfx908 conditional
* fix groovy logic
* add option to run perf tests for gfx908
* disable a few tests on mi100
2025-03-17 18:08:53 -07:00
Illia Silin
07f25186b2
disable ck_tile basic gemm (#1986)
2025-03-17 15:26:43 -07:00
aledudek
5095906975
Async grouped gemm v3 (#1940)
* Fully async grouped gemm
* Remove commented code
* Remove maybe_unused
* host kernel args
* Checkpoint segfault debugging...
* Working part1
* Working part2
* Remove comments...
* Use void ptr for gemm kernel host args
* Fix device_grouped_gemm_multiple_d_dl build issue
* Fix device_grouped_gemm_xdl build issue
2025-03-17 16:42:43 +01:00
Bartłomiej Kocot
c2e4898b4b
Grouped conv bwd data NGCHW (#1967)
* Grouped conv bwd data NGCHW
* fixes
* fix
* Improvements
* Fix
* Fix
* add client example
2025-03-17 13:32:00 +01:00