PoYen, Chen
a0d2163045
Remove dropout code in splitkv kernel
2024-08-08 10:21:34 +00:00
PoYen, Chen
9d9c5a6c24
Fix compilation errors
2024-08-08 08:26:55 +00:00
PoYen, Chen
247e135cfc
Remove fmha_fwd_dispatch()
2024-08-08 08:15:04 +00:00
PoYen, Chen
291e9b4bbb
Separate splitkv/non-splitkv args/traits
2024-08-08 08:07:03 +00:00
PoYen, Chen
655b13b059
Rename option s_k_new to s_knew
2024-08-07 15:31:54 +00:00
PoYen, Chen
b6c2f2f01d
Add missing group mode argument
2024-08-07 15:22:57 +00:00
PoYen, Chen
55ce2948a9
Always add fmha_fwd() api
2024-08-07 13:43:14 +00:00
PoYen, Chen
eda78d1a10
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-07 12:17:45 +00:00
PoYen, Chen
838f9955fd
Fix wrong strides for appendkv kernel
2024-08-07 08:06:47 +00:00
PoYen, Chen
443a528adc
Add block_table kernel args for appendkv kernel
2024-08-07 04:27:15 +00:00
PoYen, Chen
15d0034a64
Add paged-kv codegen logic for appendkv kernels
2024-08-07 04:19:45 +00:00
jakpiase
b74d4d4d54
Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce
* add reviewers' suggestions
2024-08-06 09:10:39 -07:00
PoYen, Chen
b98985262d
Add missing kernel arguments for group mode
2024-08-06 14:54:07 +00:00
Bartłomiej Kocot
4ec5c52a0c
Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add large tensor instances
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove unneeded instances
* examples fixes
* Remove unneeded ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
2024-08-06 10:06:10 +02:00
PoYen, Chen
12da00c3be
Use 128 as minimum page_block_size
2024-08-06 03:20:29 +00:00
PoYen, Chen
f9e2bafd10
Make sure we always start reading complete tile
2024-08-06 03:13:57 +00:00
PoYen, Chen
8779716403
Fix uneven split checking logic
2024-08-06 01:17:14 +00:00
PoYen, Chen
3fc7279519
Disable calling fmha_fwd()
2024-08-05 21:36:52 +00:00
PoYen, Chen
8fea4139df
Fix tile window navigation bugs
2024-08-05 21:34:15 +00:00
PoYen, Chen
90d84eaeae
Fix seqlen_k_min for pre-fill case (1 -> 0)
2024-08-04 02:53:40 +00:00
PoYen, Chen
381f7e90e0
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-04 02:12:20 +00:00
PoYen, Chen
db95d25d36
Launch splitkv kernel if given page_block_size
2024-08-02 10:26:09 +00:00
PoYen, Chen
e7969b9fd2
Add template argument 'kIsPagedKV' for splitkv kernels
2024-08-02 10:14:51 +00:00
carlushuang
b3f86e79dd
workaround rocm-6.2 compiler issue (#1421)
2024-07-31 16:03:59 +08:00
PoYen, Chen
e688d99495
Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv
2024-07-26 07:14:59 +00:00
PoYen, Chen
94f430de32
Update rotary_dim range in smoke_test_fwd.sh
2024-07-26 07:13:25 +00:00
PoYen, Chen
d41ff70db5
Enlarge rotary_dim limit (8 -> 16)
2024-07-26 06:51:24 +00:00
Andriy Roshchenko
4a8a1befd5
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
* Add CMakePresets configurations.
* Add binary elementwise ConvScaleAdd and an example.
* Numerical verification of results.
Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000
ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000
ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000
```
* Implemented ConvScaleAdd + Example.
* Add ConvScale+Bias Instances
* Add Client Example for ConvScale+Bias
* Fix number of bytes in an example.
* Cleanup.
2024-07-24 15:49:55 -05:00
PoYen, Chen
4280a07d2a
Refine pipeline padding settings
2024-07-24 11:37:56 +00:00
PoYen, Chen
f053ae2b5b
Add missing init code
2024-07-24 07:12:06 +00:00
PoYen, Chen
c50c36a07f
Re-arrange the 'set +x' command
2024-07-24 06:56:53 +00:00
PoYen, Chen
8fb015b83f
Remove more debug statements
2024-07-24 06:48:40 +00:00
PoYen, Chen
2126d4d88d
Add append-kv smoke tests
2024-07-24 06:35:53 +00:00
PoYen, Chen
f7fb3fafaa
Allow applying RoPE only on Q (without append KV)
2024-07-24 06:26:00 +00:00
PoYen, Chen
08b4e8a125
Fix wrong rope key for fp8 pipeline
2024-07-24 06:06:07 +00:00
PoYen, Chen
d84c915549
Disable host verification if API does not exist
2024-07-24 06:02:41 +00:00
PoYen, Chen
8a73d334b8
Rename utility function
2024-07-24 05:19:05 +00:00
PoYen, Chen
d59e098ec4
Fix wrong pipeline
2024-07-24 05:17:57 +00:00
PoYen, Chen
29c9b650b5
Align commit message to the real comment
2024-07-24 05:14:00 +00:00
PoYen, Chen
c7b7b44883
Add comment for why I just use 't' for all padding flags
2024-07-24 05:13:16 +00:00
PoYen, Chen
59e1d9b84f
Shift rotary_cos/rotary_sin by cache_seqlen_k
2024-07-24 05:06:47 +00:00
PoYen, Chen
a4da1e7f22
Remove RoPEComputeDataType type alias
2024-07-24 04:45:28 +00:00
PoYen, Chen
251f8cfea9
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-07-24 04:16:35 +00:00
PoYen, Chen
eb4ea3ac2a
Fix wrong rotary_cos/rotary_sin memory size for Q
2024-07-23 16:22:25 +00:00
PoYen, Chen
85bac93951
Fix wrong index into knew_host/vnew_host
2024-07-23 15:31:15 +00:00
PoYen, Chen
b11f92dc4c
Fix wrong shape of knew_host/vnew_host
2024-07-23 14:52:42 +00:00
PoYen, Chen
ca4b208b60
Fix wrong grid size
2024-07-23 14:20:52 +00:00
PoYen, Chen
2192bbc68a
Rename RotaryEmbeddingEnum
2024-07-23 07:50:50 +00:00
PoYen, Chen
48c70720b5
Apply RoPE to q_tile
2024-07-23 03:54:11 +00:00
PoYen, Chen
1dbed18555
Remove constness from q_ptr
2024-07-23 03:11:31 +00:00