Commit Graph

1349 Commits

Author
SHA1 Message Date
PoYen, Chen
c7d9c80b77 Default use kMaxSplits=64 in generate.py 2024-06-12 09:20:26 +00:00
PoYen, Chen
0a2132d758 Add constraint to kMaxSplits 2024-06-12 09:18:39 +00:00
PoYen, Chen
e00ff9d246 Simplify pipeline source code 2024-06-12 09:17:04 +00:00
PoYen, Chen
ff61463cab Use read descriptor to locate lds elements 2024-06-12 04:31:33 +00:00
PoYen, Chen
fcf5cd5e57 Undo removing necessary value-overwrite logic 2024-06-12 04:21:31 +00:00
PoYen, Chen
e1b4ac293e Support load_tile() for tile_window_with_static_lengths<> 2024-06-12 04:20:09 +00:00
PoYen, Chen
a3fad6aae5 Add transposed lds descriptor 2024-06-12 03:46:41 +00:00
PoYen, Chen
ba0bc1507c Remove necessary value-overwrite logic 2024-06-12 03:07:32 +00:00
PoYen, Chen
318b2d5c12 Remove hand-written store_tile() code 2024-06-12 02:54:32 +00:00
PoYen, Chen
a939ec5da4 Set invalid element value for LSEacc tensor view 2024-06-12 02:53:55 +00:00
PoYen, Chen
ff866f6bb6 Support providing invalid element for tensor view 2024-06-12 02:52:07 +00:00
PoYen, Chen
b994668714 Use tensor_descriptor to locate LSEacc elements 2024-06-12 02:32:33 +00:00
PoYen, Chen
ec82f3bbd6 Re-order pipeline call operator arguments 2024-06-11 19:54:30 +00:00
PoYen, Chen
9d1243e7fa Pass LSE/O strides in kernel argument 2024-06-11 19:45:21 +00:00
PoYen, Chen
df4fc8f26c Re-order split-kv pipeline call operator arguments 2024-06-11 19:23:19 +00:00
PoYen, Chen
6ee71c2bf6 Add stride kernel arguments for LSE/O acc workspace 2024-06-11 19:18:22 +00:00
PoYen, Chen
f968a7e442 Remove more debug code in combine pipeline 2024-06-11 18:36:23 +00:00
PoYen, Chen
4f8cef36bc Fix example output format 2024-06-11 18:21:31 +00:00
PoYen, Chen
5c752a02b7 Fix wrong pipeline args for fp8 2024-06-11 14:55:45 +00:00
PoYen, Chen
eaca81945e Remove unnessary tile size for fp8 2024-06-11 14:42:32 +00:00
PoYen, Chen
8eb6e451f2 Undo disabling data types 2024-06-11 14:37:18 +00:00
PoYen, Chen
2532908699 Print num_splits conditionally 2024-06-11 14:34:45 +00:00
PoYen, Chen
1c531a0c13 Update license date 2024-06-11 14:29:49 +00:00
PoYen, Chen
9293f5448a Enable non-split-kv blobs 2024-06-11 14:23:42 +00:00
PoYen, Chen
0fd7f85504 Use shorter template parameter name 2024-06-11 14:20:03 +00:00
PoYen, Chen
138b75bf12 Remove unused include directive 2024-06-11 14:18:24 +00:00
PoYen, Chen
16cc9eeef4 Fix unstable clang-format comment 2024-06-11 14:15:52 +00:00
PoYen, Chen
c9bbb7b142 Clearn up generate.py 2024-06-11 14:15:07 +00:00
PoYen, Chen
bb6804e315 Add constness to local variables 2024-06-11 14:10:35 +00:00
PoYen, Chen
31505a2a04 Remove more debug statements 2024-06-11 14:08:39 +00:00
PoYen, Chen
5efb80347e Remove debug statements in example 2024-06-11 14:02:53 +00:00
PoYen, Chen
912a6cb2ea Remove in-consistent comment 2024-06-11 13:56:44 +00:00
PoYen, Chen
95be5c2b9d Remove no-longer used field 2024-06-11 13:46:13 +00:00
PoYen, Chen
893841d745 Undo vector size changes 2024-06-11 13:46:13 +00:00
PoYen, Chen
40c885f007 Fix wrong loop counter step logic 2024-06-11 13:46:13 +00:00
PoYen, Chen
c36cad2e6c Fix wrong LDS indexing logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
d74a1d6ed1 Fix split-kv combine kernel name 2024-06-11 13:46:13 +00:00
PoYen, Chen
f3e213c0c5 Reduce # of combine kernels 2024-06-11 13:46:13 +00:00
PoYen, Chen
180b726f97 Fix wrong kBlockSize used in policy 2024-06-11 13:46:13 +00:00
PoYen, Chen
238fde80a6 Fix o_acc memory error 2024-06-11 13:46:13 +00:00
PoYen, Chen
ffd2768000 Format codes 2024-06-11 13:46:13 +00:00
PoYen, Chen
18a7223b96 Fix wrong layout of LSE/LSEacc/Oacc 2024-06-11 13:46:13 +00:00
PoYen, Chen
064afc69d9 Replace sentinel value before storing 2024-06-11 13:46:13 +00:00
PoYen, Chen
5a6b8d8606 Clean-up code 2024-06-11 13:46:13 +00:00
Po-Yen, Chen
eac0f3cc47 Fix mismatched return type 2024-06-11 13:46:13 +00:00
PoYen, Chen
9ac2654b55 Add SplitKV combine kernel codegen logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
cacce74f2c Add SplitKV kernel codegen logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
78b64d11c4 Generate fmha_fwd_splitkv() 2024-06-11 13:46:13 +00:00
PoYen, Chen
c928fefaae Add num_splits option and dummy split-kv api method 2024-06-11 13:46:13 +00:00
Po Yen Chen
abc7e7ed30 Merge branch 'develop' into ck_tile/fa_train 2024-06-04 16:03:01 +08:00