350 Commits

Author SHA1 Message Date
PoYen, Chen
9d1243e7fa Pass LSE/O strides in kernel argument 2024-06-11 19:45:21 +00:00
PoYen, Chen
6ee71c2bf6 Add stride kernel arguments for LSE/O acc workspace 2024-06-11 19:18:22 +00:00
PoYen, Chen
4f8cef36bc Fix example output format 2024-06-11 18:21:31 +00:00
PoYen, Chen
5c752a02b7 Fix wrong pipeline args for fp8 2024-06-11 14:55:45 +00:00
PoYen, Chen
eaca81945e Remove unnecessary tile size for fp8 2024-06-11 14:42:32 +00:00
PoYen, Chen
8eb6e451f2 Undo disabling data types 2024-06-11 14:37:18 +00:00
PoYen, Chen
2532908699 Print num_splits conditionally 2024-06-11 14:34:45 +00:00
PoYen, Chen
9293f5448a Enable non-split-kv blobs 2024-06-11 14:23:42 +00:00
PoYen, Chen
0fd7f85504 Use shorter template parameter name 2024-06-11 14:20:03 +00:00
PoYen, Chen
138b75bf12 Remove unused include directive 2024-06-11 14:18:24 +00:00
PoYen, Chen
c9bbb7b142 Clean up generate.py 2024-06-11 14:15:07 +00:00
PoYen, Chen
31505a2a04 Remove more debug statements 2024-06-11 14:08:39 +00:00
PoYen, Chen
5efb80347e Remove debug statements in example 2024-06-11 14:02:53 +00:00
PoYen, Chen
f3e213c0c5 Reduce # of combine kernels 2024-06-11 13:46:13 +00:00
PoYen, Chen
238fde80a6 Fix o_acc memory error 2024-06-11 13:46:13 +00:00
PoYen, Chen
18a7223b96 Fix wrong layout of LSE/LSEacc/Oacc 2024-06-11 13:46:13 +00:00
Po-Yen, Chen
eac0f3cc47 Fix mismatched return type 2024-06-11 13:46:13 +00:00
PoYen, Chen
9ac2654b55 Add SplitKV combine kernel codegen logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
cacce74f2c Add SplitKV kernel codegen logics 2024-06-11 13:46:13 +00:00
PoYen, Chen
78b64d11c4 Generate fmha_fwd_splitkv() 2024-06-11 13:46:13 +00:00
PoYen, Chen
c928fefaae Add num_splits option and dummy split-kv api method 2024-06-11 13:46:13 +00:00
Po Yen Chen
abc7e7ed30 Merge branch 'develop' into ck_tile/fa_train 2024-06-04 16:03:01 +08:00
rocking
9ceff3a5c8 Generate the instance for FA required 2024-06-03 20:03:16 +00:00
zjing14
6fb1f4e03f Post-merge fix of PR 1300 (#1313)
* add f8 gemm with multiD for both row/col wise

* change compute_type to fp8

* changed tuning parameters in the example

* add rcr example

* post-merge fix

* fix

* reduce init range
2024-05-31 22:46:41 -07:00
Po Yen Chen
ff31c6a70c Merge branch 'develop' into ck_tile/fa_train 2024-05-31 15:52:47 +08:00
danyao12
87f73f30e8 Transpose -> transpose 2024-05-29 16:54:26 +08:00
zjing14
80db62f08d add f8 gemm multiD with both row/col wise scale (#1300)
* add f8 gemm with multiD for both row/col wise

* change compute_type to fp8

* changed tuning parameters in the example

* add rcr example
2024-05-28 12:04:22 -05:00
danyao12
1c511b3e7d update bwd kernel launch 2024-05-28 23:14:18 +08:00
danyao12
ba6437868b Merge branch 'develop' into ck_tile/fa_train 2024-05-28 11:42:38 +08:00
carlushuang
5055b3bdcb [CK_TILE] support group from cmdline (#1295)
* support cmdline seqlen decode

* silent print

* update readme

* update kernel launch 3d

* update tile partitioner

* fix spill for bf16

* modify based on comment

* modify payload_t

* fix bug for alibi mode

* fix alibi test err

* refactor kernel launch, support select timer

* add missing file

* remove useless code

* add some comments
2024-05-28 11:13:21 +08:00
danyao12
7ed2ca79ac update generated filenames 2024-05-23 17:20:10 +08:00
danyao12
ff6f33d4f7 add bwd validation stream_config 2024-05-23 15:18:43 +08:00
Illia Silin
7b027d5643 Select appropriate GPU targets for instances, tests, and examples. (#1304)
* set individual gpu targets for instances, examples, tests

* fix path to hip compiler

* fix path to hip compiler once more

* aggregate device macros in ck_tile config header

* fix the cmake logic for instances

* fix clang format

* add gfx900 and gfx906 to default set of targets
2024-05-22 11:45:27 -07:00
danyao12
826a894335 support bwd alibi 2024-05-15 21:55:02 +08:00
danyao12
a84009f83b bwd alibi 2024-05-13 10:39:44 +08:00
carlushuang
35f59c04e6 Merge remote-tracking branch 'origin/develop' into ck_tile/fa_train 2024-05-12 23:03:10 +00:00
carlushuang
bd9cd53885 now fwd/bwd can build 2024-05-12 22:33:22 +00:00
carlushuang
90700dbefa [CK_TILE] support alibi (#1269)
* add alibi support

* fix code

* update code based on comment

* Support more hdim

* fix fp8 bias

* support seqlen_k=0 case

* remove unused printf

* fix format

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-11 10:43:56 +00:00
Illia Silin
7843a8a7fb re-enable convnd_fwd_xdl_fp64 testing (#1289) 2024-05-10 22:48:28 -07:00
Illia Silin
566b6480a2 Code clean-up (#1285)
* code clean-up

* remove the profiling output samples
2024-05-10 09:41:39 -07:00
carlushuang
fcba889ef4 [CK_TILE] fix some rand number init (#1287)
* add random norm

* normalized default to 0/3

* change squant->auto
2024-05-10 09:03:39 -07:00
danyao12
c26c99e55f CMakeLists update 2024-05-10 12:09:33 +08:00
danyao12
15187df456 epilogue reuse 2024-05-10 10:57:53 +08:00
Adam Osewski
a0ae1c6133 Fix MakeArgument (#1284) 2024-05-09 09:42:41 -07:00
danyao12
e1a21655ae FA bwd 2024-05-09 17:08:08 +08:00
carlushuang
851c3ed157 [CK_TILE] support alibi (#1269)
* add alibi support

* fix code

* update code based on comment

* Support more hdim

* fix fp8 bias

* support seqlen_k=0 case

* remove unused printf

* fix format

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-07 22:32:54 +08:00
Adam Osewski
0f7e8ec485 Fix example CMakeLists.txt (#1267)
Add proper dependency target.
2024-04-30 08:28:19 -07:00
danyao12
bbd2e1eae3 FA fwd dropout 2024-04-29 14:13:00 +08:00
Haocong WANG
764164b488 [GEMM] UniversalGemm update (#1262)
* Add bf16 instances

* Add bf16 gemm universal example

* tempsave

* Add guard to navi compilation

* workaround for a specific mixed gemm instance (bring it back when the compiler fix is uploaded)

* fix formatting condition statement issue

* solve conflict

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
2024-04-26 12:56:07 -05:00
zjing14
0d0150db20 bf16A_Int8B with fastgelu/bias (#1264)
* changed the copy function to v7r2

* adding multi_abd

* in-progress

* add post-load oob check

* debugging

* adjust instances

* add run_lds

* add elemntwise_op

* replace multi_abd_device with v3

* clean up

* clean

* clean

* Added LDSType

* profiling

* adjust oobcheck

* add missing file

* refactor

* clean

* add examples
2024-04-26 07:26:30 -05:00