PoYen, Chen
381f7e90e0
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-04 02:12:20 +00:00
PoYen, Chen
baf4a612f0
Fix wrong kernel name
2024-08-02 10:26:47 +00:00
PoYen, Chen
e7969b9fd2
Add template argument 'kIsPagedKV' for splitkv kernels
2024-08-02 10:14:51 +00:00
carlushuang
b3f86e79dd
workaround rocm-6.2 compiler issue (#1421)
2024-07-31 16:03:59 +08:00
PoYen, Chen
c1c50ee498
Enlarge KPerThread for rotary_interleaved=false
2024-07-26 07:09:53 +00:00
PoYen, Chen
bd28e96425
Remove no-longer used method in pipeline
2024-07-24 06:59:45 +00:00
PoYen, Chen
5c733dc568
Remove debug statements
2024-07-24 06:47:52 +00:00
PoYen, Chen
d84c915549
Disable host verification if API not exist
2024-07-24 06:02:41 +00:00
PoYen, Chen
59e1d9b84f
Shift rotary_cos/rotary_sin by cache_seqlen_k
2024-07-24 05:06:47 +00:00
PoYen, Chen
3348131699
Fix wrong data type for Q rotary_cos/rotary_sin
2024-07-24 04:10:43 +00:00
PoYen, Chen
5ea60715ea
Update host/device specifiers
2024-07-24 03:45:19 +00:00
PoYen, Chen
6f95239229
Use different rotary_cos/rotary_sin distr for Q/Knew
2024-07-24 03:40:29 +00:00
PoYen, Chen
47a74f282d
Extract Q/Knew vector size to helper methods
2024-07-24 03:23:18 +00:00
PoYen, Chen
eb4ea3ac2a
Fix wrong rotary_cos/rotary_sin memory size for Q
2024-07-23 16:22:25 +00:00
PoYen, Chen
b11f92dc4c
Fix wrong shape of knew_host/vnew_host
2024-07-23 14:52:42 +00:00
PoYen, Chen
ca4b208b60
Fix wrong grid size
2024-07-23 14:20:52 +00:00
PoYen, Chen
52b47810bb
Rename more tile size constants
2024-07-23 09:30:05 +00:00
PoYen, Chen
99c1d463de
Align naming of some tile size constants
2024-07-23 09:24:38 +00:00
PoYen, Chen
ce5e0f1d67
Re-order parameters
2024-07-23 09:02:41 +00:00
PoYen, Chen
fb80c7b2cb
Extract rotary embedding logic out
2024-07-23 08:51:59 +00:00
PoYen, Chen
2192bbc68a
Rename RotaryEmbeddingEnum
2024-07-23 07:50:50 +00:00
PoYen, Chen
d4606cf3c3
Rename header
2024-07-23 07:45:25 +00:00
PoYen, Chen
b275732128
Remove always true static_assert()
2024-07-23 07:25:50 +00:00
PoYen, Chen
eb649a2f25
Move thread locating logics into policy
2024-07-23 07:21:20 +00:00
PoYen, Chen
0e5cb6f913
Skip code if # of block is more than needed
2024-07-23 06:53:24 +00:00
PoYen, Chen
0925c0e941
Use better naming for tile indices
2024-07-23 06:40:53 +00:00
PoYen, Chen
bc7c7ee0c5
Fix wrong knew/vnew appending positions
2024-07-23 04:46:53 +00:00
PoYen, Chen
56df4d6397
Remove debug print code in kernel
2024-07-23 04:01:55 +00:00
PoYen, Chen
48c70720b5
Apply RoPE to q_tile
2024-07-23 03:54:11 +00:00
PoYen, Chen
e88253a2f4
Add code blocks for q_tile
2024-07-23 03:28:40 +00:00
PoYen, Chen
1dbed18555
Remove constness from q_ptr
2024-07-23 03:11:31 +00:00
PoYen, Chen
c26c60db4c
Unify parameter/variable naming style
2024-07-23 02:59:17 +00:00
PoYen, Chen
c0bc097758
Apply elementwise function to the loaded tiles
2024-07-23 02:50:07 +00:00
PoYen, Chen
df352f955a
Add comment
2024-07-23 02:31:45 +00:00
PoYen, Chen
d1ecfdc700
Support 8x rotary_dim under half-rotated RoPE
2024-07-23 02:19:16 +00:00
PoYen, Chen
631f29d527
Handle RoPE half-rotated logics
2024-07-22 08:50:03 +00:00
PoYen, Chen
01865d2ae4
Clean-up pipeline
2024-07-22 03:14:10 +00:00
PoYen, Chen
fffd6799e6
Instantiate multiple kernels for RoPE approaches
2024-07-20 02:28:21 +00:00
PoYen, Chen
27b5141706
Fix wrong thread starting offset
2024-07-18 20:02:06 +00:00
PoYen, Chen
23450526c0
Only apply interleaved RoPE on Knew for now
2024-07-18 19:42:14 +00:00
PoYen, Chen
85bfed07fa
Add dram distribution for rotary_cos/rotary_sin (interleaved)
2024-07-18 09:11:22 +00:00
PoYen, Chen
99f863e4cd
Fix rotary cos/sin tensor/tile size
2024-07-16 06:31:17 +00:00
PoYen, Chen
b32fd8d3f4
Rename variables used in distribution encoding
2024-07-16 06:27:28 +00:00
PoYen, Chen
879710a495
Fix wrong seqlen_k for kvcache
2024-07-16 03:42:51 +00:00
PoYen, Chen
b0925bb7f6
Create Rotary Cos/Sin tile windows in kernel
2024-07-14 23:47:40 +00:00
PoYen, Chen
391210ed9e
Pass RoPE kernel args
2024-07-14 23:18:32 +00:00
PoYen, Chen
b34ddf5f71
Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-12 04:42:45 +00:00
PoYen, Chen
8c733fb3be
Fix compilation errors
2024-07-10 10:53:58 +00:00
carlushuang
8182976c37
[CK_TILE] wa prec, remove sgpr offset for inline asm (#1356)
* wa prec, remove sgpr offset for inline asm
* macro for set tile
* ignore unused param if no kernel instances in host API
* fix more prec issue
* cache buffer resource
* fix
* support pre-nop
* clear tile by vector type members
* add workaround to reduce scratch memory
* conditionally enable workaround code
* enable workaround start from certain build version
* fallback set_tile() implementation from certain build version
* undo template argument changes
* put dummy asm in load_raw()
* fix comments, refactor s_nop inside buffer_load
---------
Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>
2024-07-08 11:09:55 -07:00
PoYen, Chen
1c070380fa
Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-08 07:13:34 +00:00