PoYen, Chen
|
7789b53e15
|
Add tile navigators to the appendkv kernel
|
2024-08-07 04:51:21 +00:00 |
|
PoYen, Chen
|
443a528adc
|
Add block_table kernel args for appendkv kernel
|
2024-08-07 04:27:15 +00:00 |
|
PoYen, Chen
|
15d0034a64
|
Add paged-kv codegen logic for appendkv kernels
|
2024-08-07 04:19:45 +00:00 |
|
PoYen, Chen
|
db31475e07
|
Unify origin
|
2024-08-06 08:37:29 +00:00 |
|
PoYen, Chen
|
bd0d2f3975
|
Add batch_stride_k/batch_stride_v in group mode
|
2024-08-06 08:02:43 +00:00 |
|
PoYen, Chen
|
faf6b0e8ab
|
Fix wrong origin for bias
|
2024-08-06 08:02:08 +00:00 |
|
PoYen, Chen
|
f9e2bafd10
|
Make sure we always start reading a complete tile
|
2024-08-06 03:13:57 +00:00 |
|
PoYen, Chen
|
4fed268723
|
Move code after deciding seqlen_q/seqlen_k
|
2024-08-06 01:39:49 +00:00 |
|
PoYen, Chen
|
77dac7775c
|
Move V tile through TileWindowNavigator
|
2024-08-05 22:36:52 +00:00 |
|
PoYen, Chen
|
ab086bdb76
|
Simplify more make_tile_window() overloads
|
2024-08-05 22:16:24 +00:00 |
|
PoYen, Chen
|
bb78353264
|
Remove unnecessary data members
|
2024-08-05 21:52:59 +00:00 |
|
PoYen, Chen
|
8fea4139df
|
Fix tile window navigation bugs
|
2024-08-05 21:34:15 +00:00 |
|
PoYen, Chen
|
ecaaa6f136
|
Simplify TileWindowNavigator interfaces
|
2024-08-05 16:31:31 +00:00 |
|
PoYen, Chen
|
1c9d77b606
|
Introduce 'TileWindowNavigator' types
|
2024-08-05 15:58:41 +00:00 |
|
PoYen, Chen
|
55b77cf962
|
Add another make_tile_window()
|
2024-08-05 15:57:03 +00:00 |
|
PoYen, Chen
|
24cb604373
|
Add copy_const<> type trait
|
2024-08-05 15:56:15 +00:00 |
|
PoYen, Chen
|
381f7e90e0
|
Merge branch 'develop' into feature/fmha-fwd-appendkv
|
2024-08-04 02:12:20 +00:00 |
|
PoYen, Chen
|
baf4a612f0
|
Fix wrong kernel name
|
2024-08-02 10:26:47 +00:00 |
|
PoYen, Chen
|
e7969b9fd2
|
Add template argument 'kIsPagedKV' for splitkv kernels
|
2024-08-02 10:14:51 +00:00 |
|
arai713
|
d32997a792
|
Codegen: isSupportedArgument check (#1417)
* added isSupportedArgument check into codegen device op
* adding function call
* remove commented code
|
2024-07-31 07:12:15 -07:00 |
|
carlushuang
|
b3f86e79dd
|
workaround rocm-6.2 compiler issue (#1421)
|
2024-07-31 16:03:59 +08:00 |
|
PoYen, Chen
|
3f7199873c
|
Merge branch 'develop' into feature/fmha-fwd-appendkv
|
2024-07-31 04:42:41 +00:00 |
|
Bartłomiej Kocot
|
33b399cc15
|
Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)
|
2024-07-30 18:36:04 +02:00 |
|
PoYen, Chen
|
e688d99495
|
Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv
|
2024-07-26 07:14:59 +00:00 |
|
PoYen, Chen
|
c1c50ee498
|
Enlarge KPerThread for rotary_interleaved=false
|
2024-07-26 07:09:53 +00:00 |
|
zjing14
|
105bd708c7
|
Add rotating buff for gemm_multi_d (#1411)
* add rotating_buff for gemm_multi_d
* format
* Update flush_cache.hpp
* Update gtest.cmake
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
|
2024-07-25 23:21:21 +08:00 |
|
Andriy Roshchenko
|
4a8a1befd5
|
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
* Add CMakePresets configurations.
* Add binary elementwise ConvScaleAdd and an example.
* Numerical verification of results.
Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000
ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000
ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000
```
* Implemented ConvScaleAdd + Example.
* Add ConvScale+Bias Instances
* Add Client Example for ConvScale+Bias
* Fix number of bytes in an example.
* Cleanup.
|
2024-07-24 15:49:55 -05:00 |
|
Bartłomiej Kocot
|
ffabd70a15
|
Add support for half_t and bfloat to reduction operations (#1395)
* Add support for half_t and bfloat to reduction operations
* Fix bhalf convert
* Next fix bf16
|
2024-07-24 12:12:37 -05:00 |
|
PoYen, Chen
|
bd28e96425
|
Remove no-longer used method in pipeline
|
2024-07-24 06:59:45 +00:00 |
|
PoYen, Chen
|
5c733dc568
|
Remove debug statements
|
2024-07-24 06:47:52 +00:00 |
|
PoYen, Chen
|
d84c915549
|
Disable host verification if API does not exist
|
2024-07-24 06:02:41 +00:00 |
|
PoYen, Chen
|
59e1d9b84f
|
Shift rotary_cos/rotary_sin by cache_seqlen_k
|
2024-07-24 05:06:47 +00:00 |
|
PoYen, Chen
|
a4da1e7f22
|
Remove RoPEComputeDataType type alias
|
2024-07-24 04:45:28 +00:00 |
|
PoYen, Chen
|
251f8cfea9
|
Merge branch 'develop' into feature/fmha-fwd-appendkv
|
2024-07-24 04:16:35 +00:00 |
|
PoYen, Chen
|
3348131699
|
Fix wrong data type for Q rotary_cos/rotary_sin
|
2024-07-24 04:10:43 +00:00 |
|
PoYen, Chen
|
5ea60715ea
|
Update host/device specifiers
|
2024-07-24 03:45:19 +00:00 |
|
PoYen, Chen
|
6f95239229
|
Use different rotary_cos/rotary_sin distr for Q/Knew
|
2024-07-24 03:40:29 +00:00 |
|
PoYen, Chen
|
47a74f282d
|
Extract Q/Knew vector size to helper methods
|
2024-07-24 03:23:18 +00:00 |
|
PoYen, Chen
|
eb4ea3ac2a
|
Fix wrong rotary_cos/rotary_sin memory size for Q
|
2024-07-23 16:22:25 +00:00 |
|
PoYen, Chen
|
b11f92dc4c
|
Fix wrong shape of knew_host/vnew_host
|
2024-07-23 14:52:42 +00:00 |
|
PoYen, Chen
|
ca4b208b60
|
Fix wrong grid size
|
2024-07-23 14:20:52 +00:00 |
|
PoYen, Chen
|
52b47810bb
|
Rename more tile size constants
|
2024-07-23 09:30:05 +00:00 |
|
PoYen, Chen
|
99c1d463de
|
Align naming of some tile size constants
|
2024-07-23 09:24:38 +00:00 |
|
PoYen, Chen
|
ce5e0f1d67
|
Re-order parameters
|
2024-07-23 09:02:41 +00:00 |
|
PoYen, Chen
|
fb80c7b2cb
|
Extract rotary embedding logic out
|
2024-07-23 08:51:59 +00:00 |
|
PoYen, Chen
|
2192bbc68a
|
Rename RotaryEmbeddingEnum
|
2024-07-23 07:50:50 +00:00 |
|
PoYen, Chen
|
d4606cf3c3
|
Rename header
|
2024-07-23 07:45:25 +00:00 |
|
PoYen, Chen
|
b275732128
|
Remove always true static_assert()
|
2024-07-23 07:25:50 +00:00 |
|
PoYen, Chen
|
eb649a2f25
|
Move thread-locating logic into policy
|
2024-07-23 07:21:20 +00:00 |
|
PoYen, Chen
|
0e5cb6f913
|
Skip code if # of blocks is more than needed
|
2024-07-23 06:53:24 +00:00 |
|