471 Commits

Author SHA1 Message Date
PoYen, Chen
912a6cb2ea Remove in-consistent comment 2024-06-11 13:56:44 +00:00
PoYen, Chen
95be5c2b9d Remove no-longer used field 2024-06-11 13:46:13 +00:00
PoYen, Chen
893841d745 Undo vector size changes 2024-06-11 13:46:13 +00:00
PoYen, Chen
40c885f007 Fix wrong loop counter step logic 2024-06-11 13:46:13 +00:00
PoYen, Chen
c36cad2e6c Fix wrong LDS indexing logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
d74a1d6ed1 Fix split-kv combine kernel name 2024-06-11 13:46:13 +00:00
PoYen, Chen
f3e213c0c5 Reduce # of combine kernels 2024-06-11 13:46:13 +00:00
PoYen, Chen
180b726f97 Fix wrong kBlockSize used in policy 2024-06-11 13:46:13 +00:00
PoYen, Chen
ffd2768000 Format codes 2024-06-11 13:46:13 +00:00
PoYen, Chen
18a7223b96 Fix wrong layout of LSE/LSEacc/Oacc 2024-06-11 13:46:13 +00:00
PoYen, Chen
064afc69d9 Replace sentinel value before storing 2024-06-11 13:46:13 +00:00
PoYen, Chen
5a6b8d8606 Clean-up code 2024-06-11 13:46:13 +00:00
PoYen, Chen
9ac2654b55 Add SplitKV combine kernel codegen logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
cacce74f2c Add SplitKV kernel codegen logics 2024-06-11 13:46:13 +00:00
Po Yen Chen
abc7e7ed30 Merge branch 'develop' into ck_tile/fa_train 2024-06-04 16:03:01 +08:00
danyao12
327074c3f8 fix error in WarpGemm 2024-06-04 11:42:33 +08:00
danyao12
bdd4a87199 format 2024-06-04 08:26:53 +08:00
zjing14
6fb1f4e03f Post-merge fix of PR 1300 (#1313)
* add f8 gemm with multiD for both row/col wise

* change compute_type to fp8

* changed tuning parameters in the example

* add rcr example

* post-merge fix

* fix

* reduce init range
2024-05-31 22:46:41 -07:00
root
c70662a92e format 2024-06-01 01:42:45 +00:00
Jing Zhang
09e9f10f97 format 2024-05-31 13:59:47 +00:00
root
60b328d597 Merge branch 'ck_tile/fa_train' of github.com:ROCm/composable_kernel into ck_tile/fa_train 2024-05-31 13:51:37 +00:00
Jing Zhang
0d7f71779b format 2024-05-31 13:51:28 +00:00
Po Yen Chen
ff31c6a70c Merge branch 'develop' into ck_tile/fa_train 2024-05-31 15:52:47 +08:00
danyao12
87f73f30e8 Transpose -> transpose 2024-05-29 16:54:26 +08:00
danyao12
58f61716b5 CK_TILE_HOST_DEVICE in philox 2024-05-29 16:20:34 +08:00
zjing14
80db62f08d add f8 gemm multiD with both row/col wise scale (#1300)
* add f8 gemm with multiD for both row/col wise

* change compute_type to fp8

* changed tuning parameters in the example

* add rcr example
2024-05-28 12:04:22 -05:00
danyao12
1c511b3e7d update bwd kernel launch 2024-05-28 23:14:18 +08:00
danyao12
ba6437868b Merge branch 'develop' into ck_tile/fa_train 2024-05-28 11:42:38 +08:00
carlushuang
5055b3bdcb [CK_TILE] support group from cmdline (#1295)
* support cmdline seqlen decode

* silent print

* update readme

* update kernel launch 3d

* update tile partitioner

* fix spill for bf16

* modify based on comment

* modify payload_t

* fix bug for alibi mode

* fix alibi test err

* refactor kernel launch, support select timer

* add missing file

* remove useless code

* add some comments
2024-05-28 11:13:21 +08:00
Bartłomiej Kocot
fd72380aeb Optimize grouped conv bwd weight for small M and N (#1303)
* Optimize grouped conv bwd weight for small M and N

* Fixes
2024-05-22 21:01:01 +02:00
Illia Silin
06b891c5c2 aggregate device macros in ck_tile config header (#1297) 2024-05-20 08:34:45 -07:00
Illia Silin
1274861a9d replace the ENV macro with CK_ENV (#1296) 2024-05-17 10:42:51 -07:00
rocking
aaa8dfdae9 Fix compile error (#1292)
error: no viable conversion from returned value of type '__half' to function return type 'fp16_hip_t' (aka '_Float16')

Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-05-17 17:19:17 +08:00
Illia Silin
c44137838e remove wrong use of nonexistent class members (#1290) 2024-05-15 08:08:17 -07:00
carlushuang
dd0dd13d4e remove operator-deref (#1291) 2024-05-15 08:06:50 -07:00
danyao12
826a894335 support bwd alibi 2024-05-15 21:55:02 +08:00
jakpiase
3e3471d5d2 Add unit tests for grouped gemm two stage (#1256)
* add unit tests for grouped gemm two stage

* add reviewers suggestions

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-05-15 10:03:39 +02:00
danyao12
a84009f83b bwd alibi 2024-05-13 10:39:44 +08:00
carlushuang
35f59c04e6 Merge remote-tracking branch 'origin/develop' into ck_tile/fa_train 2024-05-12 23:03:10 +00:00
carlushuang
bd9cd53885 now fwd/bwd can build 2024-05-12 22:33:22 +00:00
carlushuang
90700dbefa [CK_TILE] support alibi (#1269)
* add alibi support

* fix code

* update code based on comment

* Support more hdim

* fix fp8 bias

* support seqlen_k=0 case

* remove unused printf

* fix format

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-11 10:43:56 +00:00
Illia Silin
566b6480a2 Code clean-up (#1285)
* code clean-up

* remove the profiling output samples
2024-05-10 09:41:39 -07:00
Bartłomiej Kocot
8346af9c68 Change output gemm type to AccDataType in two stage conv bwd wei (#1283) 2024-05-10 10:57:42 +02:00
danyao12
15187df456 epilogue reuse 2024-05-10 10:57:53 +08:00
Adam Osewski
a0ae1c6133 Fix MakeArgument (#1284) 2024-05-09 09:42:41 -07:00
Adam Osewski
3c043cd10b Add vector instruction coherency bits for gfx94 targets. (#1268) 2024-05-09 07:30:17 -07:00
danyao12
e1a21655ae FA bwd 2024-05-09 17:08:08 +08:00
Illia Silin
fdbf8ccbd7 fix the output formatting (#1282) 2024-05-08 16:11:54 -07:00
Bartłomiej Kocot
0b6b5d1785 Add two stage grouped conv bwd weight kernel (#1280) 2024-05-08 09:53:24 +02:00
Illia Silin
bf42097646 Enable logging in CK with environment variable. (#1278)
* enable logging using environment variable

* update ck.hpp header

* fix typo

* fix clang format

* Update include/ck/utility/env.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-05-07 16:26:43 -07:00