PoYen, Chen
9d1243e7fa
Pass LSE/O strides in kernel argument
2024-06-11 19:45:21 +00:00
PoYen, Chen
6ee71c2bf6
Add stride kernel arguments for LSE/O acc workspace
2024-06-11 19:18:22 +00:00
PoYen, Chen
4f8cef36bc
Fix example output format
2024-06-11 18:21:31 +00:00
PoYen, Chen
5c752a02b7
Fix wrong pipeline args for fp8
2024-06-11 14:55:45 +00:00
PoYen, Chen
eaca81945e
Remove unnecessary tile size for fp8
2024-06-11 14:42:32 +00:00
PoYen, Chen
8eb6e451f2
Undo disabling data types
2024-06-11 14:37:18 +00:00
PoYen, Chen
2532908699
Print num_splits conditionally
2024-06-11 14:34:45 +00:00
PoYen, Chen
9293f5448a
Enable non-split-kv blobs
2024-06-11 14:23:42 +00:00
PoYen, Chen
0fd7f85504
Use shorter template parameter name
2024-06-11 14:20:03 +00:00
PoYen, Chen
138b75bf12
Remove unused include directive
2024-06-11 14:18:24 +00:00
PoYen, Chen
c9bbb7b142
Clean up generate.py
2024-06-11 14:15:07 +00:00
PoYen, Chen
31505a2a04
Remove more debug statements
2024-06-11 14:08:39 +00:00
PoYen, Chen
5efb80347e
Remove debug statements in example
2024-06-11 14:02:53 +00:00
PoYen, Chen
f3e213c0c5
Reduce # of combine kernels
2024-06-11 13:46:13 +00:00
PoYen, Chen
238fde80a6
Fix o_acc memory error
2024-06-11 13:46:13 +00:00
PoYen, Chen
18a7223b96
Fix wrong layout of LSE/LSEacc/Oacc
2024-06-11 13:46:13 +00:00
Po-Yen, Chen
eac0f3cc47
Fix mismatched return type
2024-06-11 13:46:13 +00:00
PoYen, Chen
9ac2654b55
Add SplitKV combine kernel codegen logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
cacce74f2c
Add SplitKV kernel codegen logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
78b64d11c4
Generate fmha_fwd_splitkv()
2024-06-11 13:46:13 +00:00
PoYen, Chen
c928fefaae
Add num_splits option and dummy split-kv api method
2024-06-11 13:46:13 +00:00
Po Yen Chen
abc7e7ed30
Merge branch 'develop' into ck_tile/fa_train
2024-06-04 16:03:01 +08:00
rocking
9ceff3a5c8
Generate the instance for FA required
2024-06-03 20:03:16 +00:00
zjing14
6fb1f4e03f
Post-merge fix of PR 1300 (#1313)
...
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
* post-merge fix
* fix
* reduce init range
2024-05-31 22:46:41 -07:00
Po Yen Chen
ff31c6a70c
Merge branch 'develop' into ck_tile/fa_train
2024-05-31 15:52:47 +08:00
danyao12
87f73f30e8
Transpose -> transpose
2024-05-29 16:54:26 +08:00
zjing14
80db62f08d
add f8 gemm multiD with both row/col wise scale (#1300)
...
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
2024-05-28 12:04:22 -05:00
danyao12
1c511b3e7d
update bwd kernel launch
2024-05-28 23:14:18 +08:00
danyao12
ba6437868b
Merge branch 'develop' into ck_tile/fa_train
2024-05-28 11:42:38 +08:00
carlushuang
5055b3bdcb
[CK_TILE] support group from cmdline (#1295)
...
* support cmdline seqlen decode
* silent print
* update readme
* update kernel launch 3d
* update tile partitioner
* fix spill for bf16
* modify based on comment
* modify payload_t
* fix bug for alibi mode
* fix alibi test err
* refactor kernel launch, support select timer
* add missing file
* remove useless code
* add some comments
2024-05-28 11:13:21 +08:00
danyao12
7ed2ca79ac
update generated filenames
2024-05-23 17:20:10 +08:00
danyao12
ff6f33d4f7
add bwd validation stream_config
2024-05-23 15:18:43 +08:00
Illia Silin
7b027d5643
Select appropriate GPU targets for instances, tests, and examples. (#1304)
...
* set individual gpu targets for instances, examples, tests
* fix path to hip compiler
* fix path to hip compiler once more
* aggregate device macros in ck_tile config header
* fix the cmake logic for instances
* fix clang format
* add gfx900 and gfx906 to default set of targets
2024-05-22 11:45:27 -07:00
danyao12
826a894335
support bwd alibi
2024-05-15 21:55:02 +08:00
danyao12
a84009f83b
bwd alibi
2024-05-13 10:39:44 +08:00
carlushuang
35f59c04e6
Merge remote-tracking branch 'origin/develop' into ck_tile/fa_train
2024-05-12 23:03:10 +00:00
carlushuang
bd9cd53885
now fwd/bwd can build
2024-05-12 22:33:22 +00:00
carlushuang
90700dbefa
[CK_TILE] support alibi (#1269)
...
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-11 10:43:56 +00:00
Illia Silin
7843a8a7fb
re-enable convnd_fwd_xdl_fp64 testing (#1289)
2024-05-10 22:48:28 -07:00
Illia Silin
566b6480a2
Code clean-up (#1285)
...
* code clean-up
* remove the profiling output samples
2024-05-10 09:41:39 -07:00
carlushuang
fcba889ef4
[CK_TILE] fix some rand number init (#1287)
...
* add random norm
* normalized default to 0/3
* change squant->auto
2024-05-10 09:03:39 -07:00
danyao12
c26c99e55f
CMakeLists update
2024-05-10 12:09:33 +08:00
danyao12
15187df456
epilogue reuse
2024-05-10 10:57:53 +08:00
Adam Osewski
a0ae1c6133
Fix MakeArgument (#1284)
2024-05-09 09:42:41 -07:00
danyao12
e1a21655ae
FA bwd
2024-05-09 17:08:08 +08:00
carlushuang
851c3ed157
[CK_TILE] support alibi (#1269)
...
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-07 22:32:54 +08:00
Adam Osewski
0f7e8ec485
Fix example CMakeLists.txt (#1267)
...
Add proper dependency target.
2024-04-30 08:28:19 -07:00
danyao12
bbd2e1eae3
FA fwd dropout
2024-04-29 14:13:00 +08:00
Haocong WANG
764164b488
[GEMM] UniversalGemm update (#1262)
...
* Add bf16 instances
* Add bf16 gemm universal example
* tempsave
* Add guard to navi compilation
* workaround for a specific mixed gemm instance (bring it back when the compiler fix is uploaded)
* fix formatting condition statement issue
* solve conflict
---------
Co-authored-by: Jun Liu <Liu.Jun@amd.com>
2024-04-26 12:56:07 -05:00
zjing14
0d0150db20
bf16A_Int8B with fastgelu/bias (#1264)
...
* changed the copy function to v7r2
* adding multi_abd
* in-progress
* add post-load oob check
* debugging
* adjust instances
* add run_lds
* add elementwise_op
* replace multi_abd_device with v3
* clean up
* clean
* clean
* Added LDSType
* profiling
* adjust oobcheck
* add missing file
* refactor
* clean
* add examples
2024-04-26 07:26:30 -05:00