Commit Graph

208 Commits

Author SHA1 Message Date
joye
b406b584d7 fix clang format check 2025-04-28 08:56:00 +08:00
joye
292f89c860 fix a transpose index issue 2025-04-27 17:39:52 +08:00
joye
bd091d9a88 fix a transpose index issue 2025-04-27 17:33:32 +08:00
joye
de9407ed93 fix 16x16 related dimension transpose 2025-04-27 02:44:49 -05:00
joye
57e8e34705 exchange the iteration order 2025-04-26 21:37:59 -05:00
joye
25ab38f913 fix transpose related codes 2025-04-27 09:55:29 +08:00
joye
a823e00dc0 delete unused variables 2025-04-25 14:47:49 +08:00
joye
a6564da629 update output tensor distribution 2025-04-25 14:45:45 +08:00
joye
119a8e0e16 hack for transpose 16x16 2025-04-25 00:56:59 -05:00
joye
e2f3c95d24 miss output tile distribution mapping 2025-04-24 21:55:03 -05:00
joye
6beb585dad update tile transpose 2025-04-24 20:36:09 -05:00
joye
efa7243ee5 transpose load enable 2025-04-24 19:19:44 -05:00
joye
df9769afba can pass; but no logic 2025-04-24 19:05:47 -05:00
joye
90a4501869 add some fix 2025-04-24 18:29:38 +08:00
joye
34040f43b6 update some codes 2025-04-24 17:53:19 +08:00
joye
8d75983536 fix a distribution issue 2025-04-24 02:24:31 -05:00
joye
6893165818 add some fixes 2025-04-24 01:20:35 -05:00
joye
afb1cec9c4 update transpose load logic 2025-04-24 11:14:07 +08:00
joye
3918a35870 Merge branch 'mi355_transpose_load_dev' of https://github.com/ROCm/composable_kernel into mi355_transpose_load_dev 2025-04-24 08:28:14 +08:00
joye
aaab4aacc5 Merge branch 'develop' of https://github.com/ROCm/composable_kernel into mi355_transpose_load_dev 2025-04-24 08:27:53 +08:00
joye
c862437f05 fix some issues 2025-04-23 19:27:29 -05:00
carlushuang
5487289fc4 [CK_TILE] support gfx950 matrix core in 01_fmha fwd (#2110)
* gfx950 01_fmha fwd

* fix comment

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-04-23 12:40:18 -07:00
joye
acce2df3bf fix some compile errors 2025-04-23 05:27:52 -05:00
joye
e14a16359f add transpose load; no real logic 2025-04-23 16:55:13 +08:00
Gino Lu
504f563f78 [CK-Tile] warp-gemm support for using V_MFMA_F32_16x16x32_BF16 (#2073)
* draft v_mfma_f32_16x16x32_bf16

* fix error config and add debug code.

* Solve the CShuffle Problem

* draft v_mfma_f32_16x16x32_bf16

* fix error config and add debug code.

* Solve the CShuffle Problem

* fix error while testing new command

* Finished the feature of new mfma 16*16*32

* Addressed the comment

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-04-22 15:52:36 -07:00
Thomas Ning
a738e43445 MFMA 16x16x32fp8 (#2103)
* add mfma_16x16x32_fp8

* clang format code

* Finished the fix for gemm basic

* clang format

* rebuild CI

* recover gemm.hpp

* add MFMA 16*16*32bf8

---------

Co-authored-by: solin <bingzhou@amd.com>
2025-04-21 10:21:35 -07:00
solin
c318ec0778 fix CI build fail 2025-04-21 16:00:12 +08:00
BingYuan.Zhou
eaf1f0bf3b [flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm

* fix CI build fail

---------

Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00
Thomas Ning
269f4f6af5 Solve the Static Encoding Pattern compile error when the tile size is too small (#2079) 2025-04-13 20:09:30 -07:00
jakpiase
6c61f4d237 [CK_TILE] Add 2:4 structured sparsity support for fp16 gemm (#1957)
* add structured sparsity fp16 support for gemm

* added reviewer suggestions

* update changelog

* update changelog

* add reviewers suggestions

* Minor fix

* clang fix

* fix doxygen
2025-04-11 12:18:26 +02:00
slippedJim
5f885d2b7a add fmha fwd splitkv receipt for aiter c++ api (#2068)
* add s_randval for c++ api

* Fix bug of bias in splitkv

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>
2025-04-10 23:21:13 +08:00
Juan Manuel Martinez Caamaño
f14e648e7c Replace inline assembly with builtins in FMHA (#2067)
* Replace inline assembly with builtins in FMHA

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-04-10 09:48:37 +02:00
Illia Silin
3e6d21adeb enable gfx115x support (#2065) 2025-04-09 10:06:42 -07:00
MHYang-gh
03ce8729fd Make buffer coherence configurable in tensor view (#2041)
* Make buffer coherence configurable in tensor view

* Fix clang-format for tensor_view.hpp
2025-04-08 15:34:11 -07:00
Max Podkorytov
6ce0797dad simplify generate_tuple (#2043) 2025-04-08 09:00:51 -07:00
aledudek
80aae6119b [CK_TILE] Fix GEMM Memory Pipeline (#2034)
* [CK_TILE] Fix GEMM Memory Pipeline

* Fix transpose tile

* Add comments
2025-04-08 12:40:04 +02:00
Illia Silin
572cd820ce Split env.hpp header from the ck.hpp header. (#2049)
* split env.hpp out of main headers

* fix namespace logic
2025-04-03 15:30:21 -07:00
Adam Osewski
e5ad48a784 Basic docs for universal gemm & ck-tile gemm. (#2014)
* Basic docs for universal gemm & ck-tile gemm.

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Reviewers suggestions.

* Align tparam names in doc with class tparams.

* More reviewers fine tuning ;)

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-04-02 11:03:40 +02:00
Seunghoon Lee
df32020f93 Fix Windows build. (#2012)
* Remove duplicate using uint64_t.

* Cast before shift.
2025-04-01 12:22:10 -07:00
Max Podkorytov
c59a8bb206 add a fast compilation path for static for (0..N) (#2005)
* add a fast compilation path for static for (0..N)

* Update functional2.hpp

add comment and put range applier into detail namespace

* Update functional.hpp

ditto for ck-tile

* prettify

* prettify more

* add comment

* clang-format
2025-04-01 12:06:25 -07:00
rocking
8a20b62e91 Reduce redundant space in bias tensor (#2024)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-03-28 21:58:06 +08:00
felix
a82f338fb9 hotfix fix sorting int64 (#2025)
* fix sorting int64

* clang format

* fix example issue

* update WA issue #

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2025-03-28 11:31:52 +08:00
ruanjm
d49abdaa87 [CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance (#1861)
* 50ms -> 28ms

* Fix bug in non fuse_add_store cases

* Fine tuned setting for 2 pass pipeline

* adjust workload

* remove unnecessary change

* add layernorm

* Adding output quant and unquant results at the same time.

* fix test

* fix format

* tune for cases 128x640 and 128x1024

* bug fix
2025-03-25 20:09:45 +08:00
MHYang-gh
c027637a8f Fix A/B lds transform (#2007) 2025-03-22 23:13:50 -07:00
BingYuan.Zhou
5a0d693b86 fix ck_tile/basic_gemm build error (#1988) 2025-03-20 22:01:14 -07:00
carlushuang
e3c9886cdf [CK_TILE] return value with macro in ck_tile::kernel_launch API (#1982)
* return value with macro and revert the return value

* [CK-TILE] no-macro launch api solution (#1992)

* no-macro solution

* address -Wcomma

---------

Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
2025-03-20 11:00:29 -07:00
jakpiase
0e91d32c61 [CK_TILE] Switch to universal gemm for batched and grouped gemms (#1919)
* switch to universal gemm for batched and grouped gemms

* added reviewer comments

* fixed grouped gemm tests
2025-03-20 11:17:04 +01:00
rocking
b819c217e4 Sync the kname with instance name (#1989)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-03-20 00:06:45 +08:00
carlushuang
3e81279d26 Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" … (#1971)
* Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969)

This reverts commit 8cbcd3e0d0.

* fix codegen problem

* Update config.hpp

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-03-13 11:41:39 +08:00
Illia Silin
4c97cc511e use old intrinsics with staging compiler (#1970) 2025-03-12 07:29:09 -07:00