PoYen, Chen
a0d2163045
Remove dropout code in splitkv kernel
2024-08-08 10:21:34 +00:00
PoYen, Chen
cef9da0a76
Remove debug macro usages
2024-08-07 15:26:43 +00:00
PoYen, Chen
eda78d1a10
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-07 12:17:45 +00:00
PoYen, Chen
1b96dc2592
Do not perform write again if already in last page-block
2024-08-07 12:00:15 +00:00
PoYen, Chen
f265742b63
Handle cross-page-block write
2024-08-07 09:33:41 +00:00
PoYen, Chen
40f0d01e29
Allow transit tile_window to another page-block
2024-08-07 09:29:55 +00:00
PoYen, Chen
26ed468ac6
Pass re-created tile window to pipeline
2024-08-07 06:00:17 +00:00
PoYen, Chen
78209c7326
Fix wrong tensor descriptor lengths
2024-08-07 05:59:26 +00:00
PoYen, Chen
7789b53e15
Add tile navigators to the appendkv kernel
2024-08-07 04:51:21 +00:00
PoYen, Chen
443a528adc
Add block_table kernel args for appendkv kernel
2024-08-07 04:27:15 +00:00
PoYen, Chen
15d0034a64
Add paged-kv codegen logic for appendkv kernels
2024-08-07 04:19:45 +00:00
Juan Manuel Martinez Caamaño
fd9ef4e678
Add missing constexpr to if conditions (#1444)
2024-08-06 11:40:34 -07:00
jakpiase
b74d4d4d54
Fix for beta!=0 in reduce (#1440)
...
* fix for beta!=0 in reduce
* add reviewers suggestions
2024-08-06 09:10:39 -07:00
PoYen, Chen
db31475e07
Unify origin
2024-08-06 08:37:29 +00:00
Bartłomiej Kocot
4ec5c52a0c
Add Grouped Conv Fwd Large Tensor kernel (#1432)
...
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove unneeded instances
* examples fixes
* Remove unneeded ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
2024-08-06 10:06:10 +02:00
PoYen, Chen
bd0d2f3975
Add batch_stride_k/batch_stride_v in group mode
2024-08-06 08:02:43 +00:00
PoYen, Chen
faf6b0e8ab
Fix wrong origin for bias
2024-08-06 08:02:08 +00:00
PoYen, Chen
f9e2bafd10
Make sure we always start reading complete tile
2024-08-06 03:13:57 +00:00
PoYen, Chen
4fed268723
Move code after decide seqlen_q/seqlen_k
2024-08-06 01:39:49 +00:00
PoYen, Chen
77dac7775c
Move V tile through TileWindowNavigator
2024-08-05 22:36:52 +00:00
PoYen, Chen
ab086bdb76
Simplify more make_tile_window() overloads
2024-08-05 22:16:24 +00:00
PoYen, Chen
bb78353264
Remove unnecessary data members
2024-08-05 21:52:59 +00:00
PoYen, Chen
8fea4139df
Fix tile window navigation bugs
2024-08-05 21:34:15 +00:00
PoYen, Chen
ecaaa6f136
Simplify TileWindowNavigator interfaces
2024-08-05 16:31:31 +00:00
PoYen, Chen
1c9d77b606
Introduce 'TileWindowNavigator' types
2024-08-05 15:58:41 +00:00
PoYen, Chen
55b77cf962
Add another make_tile_window()
2024-08-05 15:57:03 +00:00
PoYen, Chen
24cb604373
Add copy_const<> type trait
2024-08-05 15:56:15 +00:00
PoYen, Chen
381f7e90e0
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-04 02:12:20 +00:00
PoYen, Chen
baf4a612f0
Fix wrong kernel name
2024-08-02 10:26:47 +00:00
PoYen, Chen
e7969b9fd2
Add template argument 'kIsPagedKV' for splitkv kernels
2024-08-02 10:14:51 +00:00
arai713
d32997a792
Codegen: isSupportedArgument check (#1417)
...
* added isSupportedArgument check into codegen device op
* adding function call
* remove commented code
2024-07-31 07:12:15 -07:00
carlushuang
b3f86e79dd
workaround rocm-6.2 compiler issue (#1421)
2024-07-31 16:03:59 +08:00
PoYen, Chen
3f7199873c
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-07-31 04:42:41 +00:00
Bartłomiej Kocot
33b399cc15
Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)
2024-07-30 18:36:04 +02:00
PoYen, Chen
e688d99495
Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv
2024-07-26 07:14:59 +00:00
PoYen, Chen
c1c50ee498
Enlarge KPerThread for rotary_interleaved=false
2024-07-26 07:09:53 +00:00
zjing14
105bd708c7
Add rotating buff for gemm_multi_d (#1411)
...
* add rotating_buff for gemm_multi_d
* format
* Update flush_cache.hpp
* Update gtest.cmake
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
2024-07-25 23:21:21 +08:00
Andriy Roshchenko
4a8a1befd5
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
...
* Add CMakePresets configurations.
* Add binary elementwise ConvScaleAdd and an example.
* Numerical verification of results.
Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000
ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000
ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000
```
* Implemented ConvScaleAdd + Example.
* Add ConvScale+Bias Instances
* Add Client Example for ConvScale+Bias
* Fix number of bytes in an example.
* Cleanup.
2024-07-24 15:49:55 -05:00
Bartłomiej Kocot
ffabd70a15
Add support for half_t and bfloat to reduction operations (#1395)
...
* Add support for half_t and bfloat to reduction operations
* Fix bhalf convert
* Next fix bf16
2024-07-24 12:12:37 -05:00
PoYen, Chen
bd28e96425
Remove no-longer used method in pipeline
2024-07-24 06:59:45 +00:00
PoYen, Chen
5c733dc568
Remove debug statements
2024-07-24 06:47:52 +00:00
PoYen, Chen
d84c915549
Disable host verification if API does not exist
2024-07-24 06:02:41 +00:00
PoYen, Chen
59e1d9b84f
Shift rotary_cos/rotary_sin by cache_seqlen_k
2024-07-24 05:06:47 +00:00
PoYen, Chen
a4da1e7f22
Remove RoPEComputeDataType type alias
2024-07-24 04:45:28 +00:00
PoYen, Chen
251f8cfea9
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-07-24 04:16:35 +00:00
PoYen, Chen
3348131699
Fix wrong data type for Q rotary_cos/rotary_sin
2024-07-24 04:10:43 +00:00
PoYen, Chen
5ea60715ea
Update host/device specifiers
2024-07-24 03:45:19 +00:00
PoYen, Chen
6f95239229
Use different rotary_cos/rotary_sin distr for Q/Knew
2024-07-24 03:40:29 +00:00
PoYen, Chen
47a74f282d
Extract Q/Knew vector size to helper methods
2024-07-24 03:23:18 +00:00
PoYen, Chen
eb4ea3ac2a
Fix wrong rotary_cos/rotary_sin memory size for Q
2024-07-23 16:22:25 +00:00