PoYen, Chen
d3f550f30c
Add s_barrier to sync threads
2024-08-22 09:03:01 +00:00
PoYen, Chen
e23e6b57ec
Fix compilation errors
2024-08-20 05:59:46 +00:00
PoYen, Chen
eb1c8a26fa
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-20 05:49:01 +00:00
PoYen, Chen
ee1445da23
Re-order seqlen_k_start adjustment logic
2024-08-19 20:14:45 +00:00
PoYen, Chen
40a4d96cf5
Return earlier if split is empty
2024-08-19 10:16:23 +00:00
PoYen, Chen
3d3d73bee2
Fix wrong parameter name
2024-08-18 17:25:39 +00:00
Dan Yao
79a5d9c10c
[CK_TILE] FA bwd kernels optimization (#1397)
* tmp save
* fix batch deterministic bugs
* fix group deterministic bugs
* codegen update
* reorder files
* bias support
* hd256 bias support
* bwd smoke test update
* simplify convert dq
* fix hd256 dropout scratch
* do{}while() -> while(){}
* comments
* remove FmhaBwdTilePartitioner
* save clear_tile
* refactor dropout
* code cleanup
* code cleanup
* comments
* fix epilogue problem
* fix fwd dropout
* group convert_dq opt
* fix dq alignment
* Do not store storerandval in bwd for flash attention integration
* fix hd32 error and boost performance
* revert
* Remove duplicated WarpGemm definitions in the policy file
* dropout patch for mrepeat 16*16
* code sync up
* dq_acc stride
* dq_acc stride stuff
* codegen update
* fwd dropout revert
* fix hd128 scratches and boost performance
* receipt 3 for simplified smoke test
* more strides for fa integration
* fix hd64 scratches and boost performance
* non-iglp pipeline for headdim padding cases
* dpad same as dvpad for flash attention integration
* unpadded lse&d for group mode
* Support unpad layout for group lse
* Support unpad lse layout for splitkv
* Fix stride for splitkv kernel
* fix unpadded lse issue in fwd splitkv
* comment
* solve lds read&write conflicts
* rename
* bias rename
* tile index revert
---------
Co-authored-by: danyao12 <danyao12>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2024-08-16 13:40:10 -07:00
PoYen, Chen
51062cae0b
Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv
2024-08-16 16:47:06 +00:00
PoYen, Chen
43b8100b7f
Support cache_batch_idx in example
2024-08-16 16:27:56 +00:00
PoYen, Chen
9c904b0e4c
Pass cache_batch_idx to kernels
2024-08-16 15:32:24 +00:00
PoYen, Chen
2523c8e36c
Fix more format
2024-08-16 10:32:17 +00:00
PoYen, Chen
5805f5aa73
Remove group mode from appendkv kernel
2024-08-16 10:04:48 +00:00
Haocong WANG
3049b5467c
[GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
Mateusz Ozga
0606e5498e
Support large: 12d tensor size for reduction kernel (#1465)
2024-08-13 16:15:47 +02:00
PoYen, Chen
9de0f35ebc
Remove unused template parameter
2024-08-13 09:29:20 +00:00
PoYen, Chen
370babc996
Make tile window directly via PageBlockNavigator
2024-08-13 09:18:24 +00:00
PoYen, Chen
3dd6ef61ef
Re-order pipeline parameters
2024-08-13 07:38:41 +00:00
PoYen, Chen
19c19d8bd3
Only expose necessary methods (not attributes)
2024-08-13 07:26:26 +00:00
PoYen, Chen
c54de6416a
Rename TileWindowNavigator to PageBlockNavigator
2024-08-13 07:23:40 +00:00
Bartłomiej Kocot
4a870942e6
Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457)
* Fix typo in TransformConvFwdToGemm
* Fix bug in n offset calculation
2024-08-10 13:12:05 +02:00
Jun Liu
5ff8eeebf9
Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)
This reverts commit 33b399cc15.
2024-08-08 19:09:33 -07:00
PoYen, Chen
d2f5d0910a
Remove no-longer used pipeline files
2024-08-08 17:40:05 +00:00
PoYen, Chen
d3624a03de
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-08 17:26:53 +00:00
PoYen, Chen
2f42e4460f
Allow problem types that do not define the kHasDropout attr
2024-08-08 10:53:42 +00:00
PoYen, Chen
a0d2163045
Remove dropout code in splitkv kernel
2024-08-08 10:21:34 +00:00
Juan Manuel Martinez Caamaño
901e5f1540
Remove reinterpret_cast uses that result in undefined behaviour. (#1445)
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead.
See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility
Closes #1439
* fix clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-07 11:49:02 -07:00
PoYen, Chen
cef9da0a76
Remove debug macro usages
2024-08-07 15:26:43 +00:00
PoYen, Chen
eda78d1a10
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-07 12:17:45 +00:00
PoYen, Chen
1b96dc2592
Do not perform write again if already in last page-block
2024-08-07 12:00:15 +00:00
PoYen, Chen
f265742b63
Handle cross-page-block write
2024-08-07 09:33:41 +00:00
PoYen, Chen
40f0d01e29
Allow tile_window to transit to another page-block
2024-08-07 09:29:55 +00:00
PoYen, Chen
26ed468ac6
Pass re-created tile window to pipeline
2024-08-07 06:00:17 +00:00
PoYen, Chen
78209c7326
Fix wrong tensor descriptor lengths
2024-08-07 05:59:26 +00:00
PoYen, Chen
7789b53e15
Add tile navigators to the appendkv kernel
2024-08-07 04:51:21 +00:00
PoYen, Chen
443a528adc
Add block_table kernel args for appendkv kernel
2024-08-07 04:27:15 +00:00
PoYen, Chen
15d0034a64
Add paged-kv codegen logic for appendkv kernels
2024-08-07 04:19:45 +00:00
Juan Manuel Martinez Caamaño
fd9ef4e678
Add missing constexpr to if conditions (#1444)
2024-08-06 11:40:34 -07:00
jakpiase
b74d4d4d54
Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce
* add reviewers suggestions
2024-08-06 09:10:39 -07:00
PoYen, Chen
db31475e07
Unify origin
2024-08-06 08:37:29 +00:00
Bartłomiej Kocot
4ec5c52a0c
Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove not needed instances
* examples fixes
* Remove not needed ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
2024-08-06 10:06:10 +02:00
PoYen, Chen
bd0d2f3975
Add batch_stride_k/batch_stride_v in group mode
2024-08-06 08:02:43 +00:00
PoYen, Chen
faf6b0e8ab
Fix wrong origin for bias
2024-08-06 08:02:08 +00:00
PoYen, Chen
f9e2bafd10
Make sure we always start reading complete tile
2024-08-06 03:13:57 +00:00
PoYen, Chen
4fed268723
Move code after deciding seqlen_q/seqlen_k
2024-08-06 01:39:49 +00:00
PoYen, Chen
77dac7775c
Move V tile through TileWindowNavigator
2024-08-05 22:36:52 +00:00
PoYen, Chen
ab086bdb76
Simplify more make_tile_window() overloads
2024-08-05 22:16:24 +00:00
PoYen, Chen
bb78353264
Remove unnecessary data members
2024-08-05 21:52:59 +00:00
PoYen, Chen
8fea4139df
Fix tile window navigation bugs
2024-08-05 21:34:15 +00:00
PoYen, Chen
ecaaa6f136
Simplify TileWindowNavigator interfaces
2024-08-05 16:31:31 +00:00
PoYen, Chen
1c9d77b606
Introduce 'TileWindowNavigator' types
2024-08-05 15:58:41 +00:00