PoYen, Chen
|
43b8100b7f
|
Support cache_batch_idx in example
|
2024-08-16 16:27:56 +00:00 |
|
PoYen, Chen
|
9c904b0e4c
|
Pass cache_batch_idx to kernels
|
2024-08-16 15:32:24 +00:00 |
|
PoYen, Chen
|
2523c8e36c
|
Fix more format
|
2024-08-16 10:32:17 +00:00 |
|
PoYen, Chen
|
5805f5aa73
|
Remove group mode from appendkv kernel
|
2024-08-16 10:04:48 +00:00 |
|
PoYen, Chen
|
9de0f35ebc
|
Remove unused template parameter
|
2024-08-13 09:29:20 +00:00 |
|
PoYen, Chen
|
370babc996
|
Make tile window directly via PageBlockNavigator
|
2024-08-13 09:18:24 +00:00 |
|
PoYen, Chen
|
3dd6ef61ef
|
Re-order pipeline parameters
|
2024-08-13 07:38:41 +00:00 |
|
PoYen, Chen
|
c54de6416a
|
Rename TileWindowNavigator to PageBlockNavigator
|
2024-08-13 07:23:40 +00:00 |
|
PoYen, Chen
|
a0d2163045
|
Remove dropout code in splitkv kernel
|
2024-08-08 10:21:34 +00:00 |
|
PoYen, Chen
|
cef9da0a76
|
Remove debug macro usages
|
2024-08-07 15:26:43 +00:00 |
|
PoYen, Chen
|
f265742b63
|
Handle cross-page-block write
|
2024-08-07 09:33:41 +00:00 |
|
PoYen, Chen
|
26ed468ac6
|
Pass re-created tile window to pipeline
|
2024-08-07 06:00:17 +00:00 |
|
PoYen, Chen
|
78209c7326
|
Fix wrong tensor descriptor lengths
|
2024-08-07 05:59:26 +00:00 |
|
PoYen, Chen
|
7789b53e15
|
Add tile navigators to the appendkv kernel
|
2024-08-07 04:51:21 +00:00 |
|
PoYen, Chen
|
443a528adc
|
Add block_table kernel args for appendkv kernel
|
2024-08-07 04:27:15 +00:00 |
|
PoYen, Chen
|
15d0034a64
|
Add paged-kv codegen logic for appendkv kernels
|
2024-08-07 04:19:45 +00:00 |
|
PoYen, Chen
|
bd0d2f3975
|
Add batch_stride_k/batch_stride_v in group mode
|
2024-08-06 08:02:43 +00:00 |
|
PoYen, Chen
|
4fed268723
|
Move code after decide seqlen_q/seqlen_k
|
2024-08-06 01:39:49 +00:00 |
|
PoYen, Chen
|
77dac7775c
|
Move V tile through TileWindowNavigator
|
2024-08-05 22:36:52 +00:00 |
|
PoYen, Chen
|
bb78353264
|
Remove unnecessary data members
|
2024-08-05 21:52:59 +00:00 |
|
PoYen, Chen
|
8fea4139df
|
Fix tile window navigation bugs
|
2024-08-05 21:34:15 +00:00 |
|
PoYen, Chen
|
ecaaa6f136
|
Simplify TileWindowNavigator interfaces
|
2024-08-05 16:31:31 +00:00 |
|
PoYen, Chen
|
1c9d77b606
|
Introduce 'TileWindowNavigator' types
|
2024-08-05 15:58:41 +00:00 |
|
PoYen, Chen
|
baf4a612f0
|
Fix wrong kernel name
|
2024-08-02 10:26:47 +00:00 |
|
PoYen, Chen
|
e7969b9fd2
|
Add template argument 'kIsPagedKV' for splitkv kernels
|
2024-08-02 10:14:51 +00:00 |
|
PoYen, Chen
|
5c733dc568
|
Remove debug statements
|
2024-07-24 06:47:52 +00:00 |
|
PoYen, Chen
|
59e1d9b84f
|
Shift rotary_cos/rotary_sin by cache_seqlen_k
|
2024-07-24 05:06:47 +00:00 |
|
PoYen, Chen
|
3348131699
|
Fix wrong data type for Q rotary_cos/rotary_sin
|
2024-07-24 04:10:43 +00:00 |
|
PoYen, Chen
|
eb4ea3ac2a
|
Fix wrong rotary_cos/rotary_sin memory size for Q
|
2024-07-23 16:22:25 +00:00 |
|
PoYen, Chen
|
b11f92dc4c
|
Fix wrong shape of knew_host/vnew_host
|
2024-07-23 14:52:42 +00:00 |
|
PoYen, Chen
|
ca4b208b60
|
Fix wrong grid size
|
2024-07-23 14:20:52 +00:00 |
|
PoYen, Chen
|
52b47810bb
|
Rename more tile size constants
|
2024-07-23 09:30:05 +00:00 |
|
PoYen, Chen
|
99c1d463de
|
Align naming of some tile size constants
|
2024-07-23 09:24:38 +00:00 |
|
PoYen, Chen
|
ce5e0f1d67
|
Re-order parameters
|
2024-07-23 09:02:41 +00:00 |
|
PoYen, Chen
|
2192bbc68a
|
Rename RotaryEmbeddingEnum
|
2024-07-23 07:50:50 +00:00 |
|
PoYen, Chen
|
0925c0e941
|
Use better naming for tile indices
|
2024-07-23 06:40:53 +00:00 |
|
PoYen, Chen
|
bc7c7ee0c5
|
Fix wrong knew/vnew appending positions
|
2024-07-23 04:46:53 +00:00 |
|
PoYen, Chen
|
56df4d6397
|
Remove debug print code in kernel
|
2024-07-23 04:01:55 +00:00 |
|
PoYen, Chen
|
48c70720b5
|
Apply RoPE to q_tile
|
2024-07-23 03:54:11 +00:00 |
|
PoYen, Chen
|
1dbed18555
|
Remove constness from q_ptr
|
2024-07-23 03:11:31 +00:00 |
|
PoYen, Chen
|
fffd6799e6
|
Instantiate multiple kernels for RoPE approaches
|
2024-07-20 02:28:21 +00:00 |
|
PoYen, Chen
|
99f863e4cd
|
Fix rotary cos/sin tensor/tile size
|
2024-07-16 06:31:17 +00:00 |
|
PoYen, Chen
|
879710a495
|
Fix wrong seqlen_k for kvcache
|
2024-07-16 03:42:51 +00:00 |
|
PoYen, Chen
|
b0925bb7f6
|
Create Rotary Cos/Sin tile windows in kernel
|
2024-07-14 23:47:40 +00:00 |
|
PoYen, Chen
|
391210ed9e
|
Pass RoPE kernel args
|
2024-07-14 23:18:32 +00:00 |
|
PoYen, Chen
|
8c733fb3be
|
Fix compilation errors
|
2024-07-10 10:53:58 +00:00 |
|
PoYen, Chen
|
1c070380fa
|
Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
|
2024-07-08 07:13:34 +00:00 |
|
PoYen, Chen
|
8fb567c286
|
Fix vnew append error
|
2024-06-26 17:00:07 +00:00 |
|
Po Yen Chen
|
0cb2e06ddc
|
[CK_TILE] fmha forward split-kv + combine kernels (#1338)
* FA fwd dropout
* FA bwd
* epilogue reuse
* CMakeLists update
* [CK_TILE] support alibi (#1269)
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
* now fwd/bwd can build
* bwd alibi
* add bwd validation stream_config
* update generated filenames
* update bwd kernel launch
* CK_TILE_HOST_DEVICE in philox
* Transpose -> transpose
* format
* format
* format
* Generate the instance for FA required
* format
* fix error in WarpGemm
* Add num_splits option and dummy split-kv api method
* Generate fmha_fwd_splitkv()
* Add SplitKV kernel codegen logics
* Add SplitKV combine kernel codegen logics
* Fix mismatched return type
* Clean-up code
* Replace sentinel value before storing
* Fix wrong layout of LSE/LSEacc/Oacc
* Format codes
* Fix o_acc memory error
* Fix wrong kBlockSize used in policy
* Reduce # of combine kernels
* Fix split-kv combine kernel name
* Fix wrong LDS indexing logics
* Fix wrong loop counter step logic
* Undo vector size changes
* Remove no-longer used field
* Remove inconsistent comment
* Remove debug statements in example
* Remove more debug statements
* Add constness to local variables
* Clean up generate.py
* Fix unstable clang-format comment
* Remove unused include directive
* Use shorter template parameter name
* Enable non-split-kv blobs
* Update license date
* Print num_splits conditionally
* Undo disabling data types
* Remove unnecessary tile size for fp8
* Fix wrong pipeline args for fp8
* Fix example output format
* Remove more debug code in combine pipeline
* Add stride kernel arguments for LSE/O acc workspace
* Re-order split-kv pipeline call operator arguments
* Pass LSE/O strides in kernel argument
* Re-order pipeline call operator arguments
* Use tensor_descriptor to locate LSEacc elements
* Support providing invalid element for tensor view
* Set invalid element value for LSEacc tensor view
* Remove hand-written store_tile() code
* Remove necessary value-overwrite logic
* Add transposed lds descriptor
* Support load_tile() for tile_window_with_static_lengths<>
* Undo removing necessary value-overwrite logic
* Use read descriptor to locate lds elements
* Simplify pipeline source code
* Add constraint to kMaxSplits
* Default use kMaxSplits=64 in generate.py
* Revert "Add constraint to kMaxSplits"
This reverts commit 0a2132d758.
* Revert "Default use kMaxSplits=64 in generate.py"
This reverts commit c7d9c80b77.
* Decide alignment by the padding parameter
* Remove no-longer used utility functions
* Remove not-working code
* Add comment & remove no-longer used code
* Fix computation errors
* Add heuristic to override num_splits option
* Add constraint to kMaxSplits
* Fix compilation error
* Clean up pipeline code
* Wrap pointer access as lambda function
* Rename confusing methods
* Use kLogMaxSplits as template parameter
* Finish splitkv combine kernel codegen
* Update kMaxSplits limit
* Use smaller kM0 for splitkv combine kernel
* Ignore dropout flag in splitkv pipeline
* Unify flag usage
* Add back flag kStoreLSE
* Merge lambda calls in pipeline
* Fix compilation errors
* Avoid all empty splits
* Always check for empty loop in splitkv pipelines
* Re-order parameters
* Remove redundant p_drop option check
* Add traits/problem for fwd splitkv kernel
* Conditionally enable uneven split boundary checks
* Add comment for the splitkv traits field
* Change even split criteria
* Re-order statements
* Refine occupancy value for hdim=128&256
* Refine occupancy value for hdim=32&64
* Remove redundant kernel argument
* Separate fmha bwd codegen logics
* Separate fmha fwd codegen logics
* Remove redundant direction parameter in fwd&bwd codegen logics
* Support generate multiple APIs for an example
* Make 'api' an alias of the 'direction' option
* Remove choices for the 'direction' option
* Use dictionary to config all the functions
* Move fmha splitkv codegen logics to other file
* Add fwd_splitkv api for tile_example_fmha_fwd
---------
Co-authored-by: danyao12 <danyao12>
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
|
2024-06-26 17:41:15 +08:00 |
|
PoYen, Chen
|
4e6c28522c
|
Fix wrong K values after appending
|
2024-06-25 10:12:13 +00:00 |
|