PoYen, Chen
fcf5cd5e57
Undo removing necessary value-overwrite logic
2024-06-12 04:21:31 +00:00
PoYen, Chen
e1b4ac293e
Support load_tile() for tile_window_with_static_lengths<>
2024-06-12 04:20:09 +00:00
PoYen, Chen
a3fad6aae5
Add transposed lds descriptor
2024-06-12 03:46:41 +00:00
PoYen, Chen
ba0bc1507c
Remove necessary value-overwrite logic
2024-06-12 03:07:32 +00:00
PoYen, Chen
318b2d5c12
Remove hand-written store_tile() code
2024-06-12 02:54:32 +00:00
PoYen, Chen
a939ec5da4
Set invalid element value for LSEacc tensor view
2024-06-12 02:53:55 +00:00
PoYen, Chen
ff866f6bb6
Support providing invalid element for tensor view
2024-06-12 02:52:07 +00:00
PoYen, Chen
b994668714
Use tensor_descriptor to locate LSEacc elements
2024-06-12 02:32:33 +00:00
PoYen, Chen
ec82f3bbd6
Re-order pipeline call operator arguments
2024-06-11 19:54:30 +00:00
PoYen, Chen
9d1243e7fa
Pass LSE/O strides in kernel argument
2024-06-11 19:45:21 +00:00
PoYen, Chen
df4fc8f26c
Re-order split-kv pipeline call operator arguments
2024-06-11 19:23:19 +00:00
PoYen, Chen
6ee71c2bf6
Add stride kernel arguments for LSE/O acc workspace
2024-06-11 19:18:22 +00:00
PoYen, Chen
f968a7e442
Remove more debug code in combine pipeline
2024-06-11 18:36:23 +00:00
PoYen, Chen
1c531a0c13
Update license date
2024-06-11 14:29:49 +00:00
PoYen, Chen
16cc9eeef4
Fix unstable clang-format comment
2024-06-11 14:15:52 +00:00
PoYen, Chen
bb6804e315
Add constness to local variables
2024-06-11 14:10:35 +00:00
PoYen, Chen
912a6cb2ea
Remove inconsistent comment
2024-06-11 13:56:44 +00:00
PoYen, Chen
95be5c2b9d
Remove no-longer-used field
2024-06-11 13:46:13 +00:00
PoYen, Chen
893841d745
Undo vector size changes
2024-06-11 13:46:13 +00:00
PoYen, Chen
40c885f007
Fix wrong loop counter step logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
c36cad2e6c
Fix wrong LDS indexing logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
d74a1d6ed1
Fix split-kv combine kernel name
2024-06-11 13:46:13 +00:00
PoYen, Chen
f3e213c0c5
Reduce # of combine kernels
2024-06-11 13:46:13 +00:00
PoYen, Chen
180b726f97
Fix wrong kBlockSize used in policy
2024-06-11 13:46:13 +00:00
PoYen, Chen
ffd2768000
Format code
2024-06-11 13:46:13 +00:00
PoYen, Chen
18a7223b96
Fix wrong layout of LSE/LSEacc/Oacc
2024-06-11 13:46:13 +00:00
PoYen, Chen
064afc69d9
Replace sentinel value before storing
2024-06-11 13:46:13 +00:00
PoYen, Chen
5a6b8d8606
Clean-up code
2024-06-11 13:46:13 +00:00
PoYen, Chen
9ac2654b55
Add SplitKV combine kernel codegen logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
cacce74f2c
Add SplitKV kernel codegen logic
2024-06-11 13:46:13 +00:00
danyao12
327074c3f8
fix error in WarpGemm
2024-06-04 11:42:33 +08:00
danyao12
bdd4a87199
format
2024-06-04 08:26:53 +08:00
root
c70662a92e
format
2024-06-01 01:42:45 +00:00
Jing Zhang
09e9f10f97
format
2024-05-31 13:59:47 +00:00
Jing Zhang
0d7f71779b
format
2024-05-31 13:51:28 +00:00
danyao12
87f73f30e8
Transpose -> transpose
2024-05-29 16:54:26 +08:00
danyao12
58f61716b5
CK_TILE_HOST_DEVICE in philox
2024-05-29 16:20:34 +08:00
danyao12
1c511b3e7d
update bwd kernel launch
2024-05-28 23:14:18 +08:00
danyao12
ba6437868b
Merge branch 'develop' into ck_tile/fa_train
2024-05-28 11:42:38 +08:00
carlushuang
5055b3bdcb
[CK_TILE] support group from cmdline (#1295)
* support cmdline seqlen decode
* silent print
* update readme
* update kernel launch 3d
* update tile partitioner
* fix spill for bf16
* modify based on comment
* modify payload_t
* fix bug for alibi mode
* fix alibi test err
* refactor kernel launch, support select timer
* add missing file
* remove useless code
* add some comments
2024-05-28 11:13:21 +08:00
Illia Silin
06b891c5c2
aggregate device macros in ck_tile config header (#1297)
2024-05-20 08:34:45 -07:00
rocking
aaa8dfdae9
Fix compile error (#1292)
error: no viable conversion from returned value of type '__half' to function return type 'fp16_hip_t' (aka '_Float16')
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-05-17 17:19:17 +08:00
carlushuang
dd0dd13d4e
remove operator-deref (#1291)
2024-05-15 08:06:50 -07:00
danyao12
826a894335
support bwd alibi
2024-05-15 21:55:02 +08:00
danyao12
a84009f83b
bwd alibi
2024-05-13 10:39:44 +08:00
carlushuang
bd9cd53885
now fwd/bwd can build
2024-05-12 22:33:22 +00:00
carlushuang
90700dbefa
[CK_TILE] support alibi (#1269)
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-11 10:43:56 +00:00
danyao12
15187df456
epilogue reuse
2024-05-10 10:57:53 +08:00
danyao12
e1a21655ae
FA bwd
2024-05-09 17:08:08 +08:00
carlushuang
851c3ed157
[CK_TILE] support alibi (#1269)
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-07 22:32:54 +08:00