490 Commits

Author SHA1 Message Date
PoYen, Chen
0a2132d758 Add constraint to kMaxSplits 2024-06-12 09:18:39 +00:00
PoYen, Chen
e00ff9d246 Simplify pipeline source code 2024-06-12 09:17:04 +00:00
PoYen, Chen
ff61463cab Use read descriptor to locate lds elements 2024-06-12 04:31:33 +00:00
PoYen, Chen
fcf5cd5e57 Undo removing necessary value-overwrite logic 2024-06-12 04:21:31 +00:00
PoYen, Chen
e1b4ac293e Support load_tile() for tile_window_with_static_lengths<> 2024-06-12 04:20:09 +00:00
PoYen, Chen
a3fad6aae5 Add transposed lds descriptor 2024-06-12 03:46:41 +00:00
PoYen, Chen
ba0bc1507c Remove necessary value-overwrite logic 2024-06-12 03:07:32 +00:00
PoYen, Chen
318b2d5c12 Remove hand-written store_tile() code 2024-06-12 02:54:32 +00:00
PoYen, Chen
a939ec5da4 Set invalid element value for LSEacc tensor view 2024-06-12 02:53:55 +00:00
PoYen, Chen
ff866f6bb6 Support providing invalid element for tensor view 2024-06-12 02:52:07 +00:00
PoYen, Chen
b994668714 Use tensor_descriptor to locate LSEacc elements 2024-06-12 02:32:33 +00:00
PoYen, Chen
ec82f3bbd6 Re-order pipeline call operator arguments 2024-06-11 19:54:30 +00:00
PoYen, Chen
9d1243e7fa Pass LSE/O strides in kernel argument 2024-06-11 19:45:21 +00:00
PoYen, Chen
df4fc8f26c Re-order split-kv pipeline call operator arguments 2024-06-11 19:23:19 +00:00
PoYen, Chen
6ee71c2bf6 Add stride kernel arguments for LSE/O acc workspace 2024-06-11 19:18:22 +00:00
PoYen, Chen
f968a7e442 Remove more debug code in combine pipeline 2024-06-11 18:36:23 +00:00
PoYen, Chen
1c531a0c13 Update license date 2024-06-11 14:29:49 +00:00
PoYen, Chen
16cc9eeef4 Fix unstable clang-format comment 2024-06-11 14:15:52 +00:00
PoYen, Chen
bb6804e315 Add constness to local variables 2024-06-11 14:10:35 +00:00
PoYen, Chen
912a6cb2ea Remove in-consistent comment 2024-06-11 13:56:44 +00:00
PoYen, Chen
95be5c2b9d Remove no-longer used field 2024-06-11 13:46:13 +00:00
PoYen, Chen
893841d745 Undo vector size changes 2024-06-11 13:46:13 +00:00
PoYen, Chen
40c885f007 Fix wrong loop counter step logic 2024-06-11 13:46:13 +00:00
PoYen, Chen
c36cad2e6c Fix wrong LDS indexing logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
d74a1d6ed1 Fix split-kv combine kernel name 2024-06-11 13:46:13 +00:00
PoYen, Chen
f3e213c0c5 Reduce # of combine kernels 2024-06-11 13:46:13 +00:00
PoYen, Chen
180b726f97 Fix wrong kBlockSize used in policy 2024-06-11 13:46:13 +00:00
PoYen, Chen
ffd2768000 Format codes 2024-06-11 13:46:13 +00:00
PoYen, Chen
18a7223b96 Fix wrong layout of LSE/LSEacc/Oacc 2024-06-11 13:46:13 +00:00
PoYen, Chen
064afc69d9 Replace sentinel value before storing 2024-06-11 13:46:13 +00:00
PoYen, Chen
5a6b8d8606 Clean-up code 2024-06-11 13:46:13 +00:00
PoYen, Chen
9ac2654b55 Add SplitKV combine kernel codegen logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
cacce74f2c Add SplitKV kernel codegen logics 2024-06-11 13:46:13 +00:00
Po Yen Chen
abc7e7ed30 Merge branch 'develop' into ck_tile/fa_train 2024-06-04 16:03:01 +08:00
danyao12
327074c3f8 fix error in WarpGemm 2024-06-04 11:42:33 +08:00
danyao12
bdd4a87199 format 2024-06-04 08:26:53 +08:00
zjing14
6fb1f4e03f Post-merge fix of PR 1300 (#1313)
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
* post-merge fix
* fix
* reduce init range
2024-05-31 22:46:41 -07:00
root
c70662a92e format 2024-06-01 01:42:45 +00:00
Jing Zhang
09e9f10f97 format 2024-05-31 13:59:47 +00:00
root
60b328d597 Merge branch 'ck_tile/fa_train' of github.com:ROCm/composable_kernel into ck_tile/fa_train 2024-05-31 13:51:37 +00:00
Jing Zhang
0d7f71779b format 2024-05-31 13:51:28 +00:00
Po Yen Chen
ff31c6a70c Merge branch 'develop' into ck_tile/fa_train 2024-05-31 15:52:47 +08:00
danyao12
87f73f30e8 Transpose -> transpose 2024-05-29 16:54:26 +08:00
danyao12
58f61716b5 CK_TILE_HOST_DEVICE in philox 2024-05-29 16:20:34 +08:00
zjing14
80db62f08d add f8 gemm multiD with both row/col wise scale (#1300)
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
2024-05-28 12:04:22 -05:00
danyao12
1c511b3e7d update bwd kernel launch 2024-05-28 23:14:18 +08:00
danyao12
ba6437868b Merge branch 'develop' into ck_tile/fa_train 2024-05-28 11:42:38 +08:00
carlushuang
5055b3bdcb [CK_TILE] support group from cmdline (#1295)
* support cmdline seqlen decode
* silent print
* update readme
* update kernel launch 3d
* update tile partitioner
* fix spill for bf16
* modify based on comment
* modify payload_t
* fix bug for alibi mode
* fix alibi test err
* refactor kernel launch, support select timer
* add missing file
* remove useless code
* add some comments
2024-05-28 11:13:21 +08:00
Bartłomiej Kocot
fd72380aeb Optimize grouped conv bwd weight for small M and N (#1303)
* Optimize grouped conv bwd weight for small M and N
* Fixes
2024-05-22 21:01:01 +02:00
Illia Silin
06b891c5c2 aggregate device macros in ck_tile config header (#1297) 2024-05-20 08:34:45 -07:00