PoYen, Chen
|
2192bbc68a
|
Rename RotaryEmbeddingEnum
|
2024-07-23 07:50:50 +00:00 |
|
PoYen, Chen
|
d4606cf3c3
|
Rename header
|
2024-07-23 07:45:25 +00:00 |
|
PoYen, Chen
|
b275732128
|
Remove always true static_assert()
|
2024-07-23 07:25:50 +00:00 |
|
PoYen, Chen
|
eb649a2f25
|
Move thread locating logics into policy
|
2024-07-23 07:21:20 +00:00 |
|
PoYen, Chen
|
0e5cb6f913
|
Skip code if # of block is more than needed
|
2024-07-23 06:53:24 +00:00 |
|
PoYen, Chen
|
7124f3eda5
|
Add make_tile_window() for adding distribution only
|
2024-07-23 06:52:38 +00:00 |
|
PoYen, Chen
|
0925c0e941
|
Use better naming for tile indices
|
2024-07-23 06:40:53 +00:00 |
|
PoYen, Chen
|
bc7c7ee0c5
|
Fix wrong knew/vnew appending positions
|
2024-07-23 04:46:53 +00:00 |
|
PoYen, Chen
|
56df4d6397
|
Remove debug print code in kernel
|
2024-07-23 04:01:55 +00:00 |
|
PoYen, Chen
|
48c70720b5
|
Apply RoPE to q_tile
|
2024-07-23 03:54:11 +00:00 |
|
PoYen, Chen
|
e88253a2f4
|
Add code blocks for q_tile
|
2024-07-23 03:28:40 +00:00 |
|
PoYen, Chen
|
1dbed18555
|
Remove constness from q_ptr
|
2024-07-23 03:11:31 +00:00 |
|
PoYen, Chen
|
c26c60db4c
|
Unify parameter/variable naming style
|
2024-07-23 02:59:17 +00:00 |
|
PoYen, Chen
|
c0bc097758
|
Apply elementwise function to the loaded tiles
|
2024-07-23 02:50:07 +00:00 |
|
PoYen, Chen
|
df352f955a
|
Add comment
|
2024-07-23 02:31:45 +00:00 |
|
PoYen, Chen
|
d1ecfdc700
|
Support 8x rotary_dim under half-rotated RoPE
|
2024-07-23 02:19:16 +00:00 |
|
PoYen, Chen
|
631f29d527
|
Handle RoPE half-rotated logics
|
2024-07-22 08:50:03 +00:00 |
|
PoYen, Chen
|
1136e6b560
|
Fix error in RoPE host reference
|
2024-07-22 08:39:09 +00:00 |
|
PoYen, Chen
|
01865d2ae4
|
Clean-up pipeline
|
2024-07-22 03:14:10 +00:00 |
|
PoYen, Chen
|
fffd6799e6
|
Instantiate multiple kernels for RoPE approaches
|
2024-07-20 02:28:21 +00:00 |
|
PoYen, Chen
|
27b5141706
|
Fix wrong thread starting offset
|
2024-07-18 20:02:06 +00:00 |
|
PoYen, Chen
|
23450526c0
|
Only apply interleaved RoPE on Knew for now
|
2024-07-18 19:42:14 +00:00 |
|
PoYen, Chen
|
85bfed07fa
|
Add dram distribution for rotary_cos/rotary_sin (interleaved)
|
2024-07-18 09:11:22 +00:00 |
|
PoYen, Chen
|
39ef09bb23
|
Remove unused inner namespace
|
2024-07-18 09:10:51 +00:00 |
|
PoYen, Chen
|
e83c3c7fa0
|
Add constraint to the rotary_dim option
|
2024-07-16 06:54:37 +00:00 |
|
PoYen, Chen
|
99f863e4cd
|
Fix rotary cos/sin tensor/tile size
|
2024-07-16 06:31:17 +00:00 |
|
PoYen, Chen
|
b32fd8d3f4
|
Rename variables used in distribution encoding
|
2024-07-16 06:27:28 +00:00 |
|
PoYen, Chen
|
879710a495
|
Fix wrong seqlen_k for kvcache
|
2024-07-16 03:42:51 +00:00 |
|
PoYen, Chen
|
65dac9fb90
|
Fix wrong boundaries
|
2024-07-15 01:42:53 +00:00 |
|
PoYen, Chen
|
4e01307e04
|
Fix compilation error in debug mode
|
2024-07-15 01:26:46 +00:00 |
|
PoYen, Chen
|
1a093f94b2
|
Add minimum seqlen_k to generate compliant kvcache
|
2024-07-15 01:11:16 +00:00 |
|
PoYen, Chen
|
57c6a4125c
|
Fix seqlen_knew enabling check logic
|
2024-07-15 00:40:39 +00:00 |
|
PoYen, Chen
|
ad61d9d4b2
|
Randomly generate seqlen_knew if needed
|
2024-07-15 00:39:03 +00:00 |
|
PoYen, Chen
|
f6850aef29
|
Add compute data type alias for RoPE
|
2024-07-15 00:05:33 +00:00 |
|
PoYen, Chen
|
b0925bb7f6
|
Create Rotary Cos/Sin tile windows in kernel
|
2024-07-14 23:47:40 +00:00 |
|
PoYen, Chen
|
391210ed9e
|
Pass RoPE kernel args
|
2024-07-14 23:18:32 +00:00 |
|
PoYen, Chen
|
b5ad1411b0
|
Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
|
2024-07-14 22:13:17 +00:00 |
|
PoYen, Chen
|
c6717bb300
|
Merge branch 'feature/cond-add-splitkv' of github.com:ROCm/composable_kernel into feature/cond-add-splitkv
|
2024-07-14 22:11:39 +00:00 |
|
PoYen, Chen
|
8c1647d778
|
Avoid invoking deprecated method 'find_module'
|
2024-07-14 22:10:30 +00:00 |
|
Po Yen Chen
|
5ce0fecf36
|
Merge branch 'develop' into feature/cond-add-splitkv
|
2024-07-15 05:48:51 +08:00 |
|
PoYen, Chen
|
55f55025ee
|
Fix wrong tensor size
|
2024-07-14 15:40:56 +00:00 |
|
PoYen, Chen
|
93e5125d7a
|
Rename RoPE utility function
|
2024-07-14 14:48:06 +00:00 |
|
PoYen, Chen
|
83d6acc111
|
Apply RoPE on host side
|
2024-07-14 14:45:17 +00:00 |
|
Bartłomiej Kocot
|
82e8a78a3f
|
Support access per groups and filter3x3 in grouped conv fwd (#1382)
* Support access per groups and filter3x3 in grouped conv fwd
* Fixes for large cases
* Fixes for large tensors
|
2024-07-12 11:08:42 -07:00 |
|
PoYen, Chen
|
44c9bacff7
|
Rename function: add "batched" prefix
|
2024-07-12 06:51:31 +00:00 |
|
PoYen, Chen
|
ff75eff3bf
|
Reduce input/output dimensions
|
2024-07-12 06:49:43 +00:00 |
|
PoYen, Chen
|
3183b68921
|
Simplify v_host_ref definition
|
2024-07-12 06:42:41 +00:00 |
|
PoYen, Chen
|
e5885cab83
|
Simplify K appending logics
|
2024-07-12 06:37:23 +00:00 |
|
PoYen, Chen
|
3578c6f836
|
Append K/V in the host verification code
|
2024-07-12 06:32:35 +00:00 |
|
PoYen, Chen
|
4107bf03a6
|
Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
|
2024-07-12 04:43:04 +00:00 |
|