PoYen, Chen
b6c2f2f01d
Add missing group mode argument
2024-08-07 15:22:57 +00:00
PoYen, Chen
55ce2948a9
Always add fmha_fwd() api
2024-08-07 13:43:14 +00:00
PoYen, Chen
eda78d1a10
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-07 12:17:45 +00:00
PoYen, Chen
1b96dc2592
Do not perform write again if already in last page-block
2024-08-07 12:00:15 +00:00
PoYen, Chen
f265742b63
Handle cross-page-block write
2024-08-07 09:33:41 +00:00
PoYen, Chen
40f0d01e29
Allow transit tile_window to another page-block
2024-08-07 09:29:55 +00:00
PoYen, Chen
838f9955fd
Fix wrong strides for appendkv kernel
2024-08-07 08:06:47 +00:00
PoYen, Chen
26ed468ac6
Pass re-created tile window to pipeline
2024-08-07 06:00:17 +00:00
PoYen, Chen
78209c7326
Fix wrong tensor descriptor lengths
2024-08-07 05:59:26 +00:00
PoYen, Chen
7789b53e15
Add tile navigators to the appendkv kernel
2024-08-07 04:51:21 +00:00
PoYen, Chen
443a528adc
Add block_table kernel args for appendkv kernel
2024-08-07 04:27:15 +00:00
PoYen, Chen
15d0034a64
Add paged-kv codegen logic for appendkv kernels
2024-08-07 04:19:45 +00:00
Max Podkorytov
886d14ccb2
modify python wrapper for addmm (#1441)
2024-08-06 15:09:27 -07:00
Haocong WANG
6fc7bff58f
Limit fp8only operator build arch in ckProfiler (#1443)
2024-08-06 14:29:14 -07:00
Jun Liu
afbf6350f3
Fix ROCm 6.2 compiler not fully supporting gfx12 when building CK with INSTANCES_ONLY (#1446)
2024-08-06 13:06:53 -07:00
Juan Manuel Martinez Caamaño
fd9ef4e678
Add missing constexpr to if conditions (#1444)
2024-08-06 11:40:34 -07:00
bibek
840c5397bb
adding mha as static lib (#1366)
* adding mha as static lib
* add fmha fwd compile options
* typo
* fix python version
* python version to 3
* increase path length
* add max path flag in mha cmake
* fix long path issue
* mha currently only runs in gfx94x
* only build mha in mi300
* populate gpu_list
* add mha compile flags
* avoid building mha in gpu other than gfx94x
* some comments and include ck_tile in rocm
* use rocm_install
* place ck_tile in include
* correct ck_tile path
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-08-06 11:17:10 -05:00
jakpiase
b74d4d4d54
Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce
* add reviewers suggestions
2024-08-06 09:10:39 -07:00
PoYen, Chen
b98985262d
Add missing kernel arguments for group mode
2024-08-06 14:54:07 +00:00
PoYen, Chen
db31475e07
Unify origin
2024-08-06 08:37:29 +00:00
Bartłomiej Kocot
4ec5c52a0c
Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove not needed instances
* examples fixes
* Remove not needed ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
2024-08-06 10:06:10 +02:00
PoYen, Chen
bd0d2f3975
Add batch_stride_k/batch_stride_v in group mode
2024-08-06 08:02:43 +00:00
PoYen, Chen
faf6b0e8ab
Fix wrong origin for bias
2024-08-06 08:02:08 +00:00
PoYen, Chen
12da00c3be
Use 128 as minimum page_block_size
2024-08-06 03:20:29 +00:00
PoYen, Chen
f9e2bafd10
Make sure we always start reading complete tile
2024-08-06 03:13:57 +00:00
PoYen, Chen
4fed268723
Move code after decide seqlen_q/seqlen_k
2024-08-06 01:39:49 +00:00
PoYen, Chen
8779716403
Fix uneven split checking logic
2024-08-06 01:17:14 +00:00
PoYen, Chen
77dac7775c
Move V tile through TileWindowNavigator
2024-08-05 22:36:52 +00:00
PoYen, Chen
ab086bdb76
Simplify more make_tile_window() overloads
2024-08-05 22:16:24 +00:00
PoYen, Chen
bb78353264
Remove unnecessary data members
2024-08-05 21:52:59 +00:00
PoYen, Chen
3fc7279519
Disable calling fmha_fwd()
2024-08-05 21:36:52 +00:00
PoYen, Chen
8fea4139df
Fix tile window navigation bugs
2024-08-05 21:34:15 +00:00
PoYen, Chen
ecaaa6f136
Simplify TileWindowNavigator interfaces
2024-08-05 16:31:31 +00:00
PoYen, Chen
1c9d77b606
Introduce 'TileWindowNavigator' types
2024-08-05 15:58:41 +00:00
PoYen, Chen
55b77cf962
Add another make_tile_window()
2024-08-05 15:57:03 +00:00
PoYen, Chen
24cb604373
Add copy_const<> type trait
2024-08-05 15:56:15 +00:00
Illia Silin
7f57b2e02c
add --offload-compress compiler flag (#1433)
* add --offload-compress compiler flag
* only apply the --offload-compress flag to the ckProfiler
* move the --offload-compress flag back to main cmake file
* add offload-compress to target compile option of ckProfiler
---------
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-08-05 23:26:01 +08:00
Illia Silin
f31ba04afc
[CI][Jenkins] delete CI docker container upon exit (#1437)
2024-08-05 08:13:56 -07:00
PoYen, Chen
90d84eaeae
Fix seqlen_k_min for pre-fill case (1 -> 0)
2024-08-04 02:53:40 +00:00
PoYen, Chen
381f7e90e0
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-04 02:12:20 +00:00
PoYen, Chen
baf4a612f0
Fix wrong kernel name
2024-08-02 10:26:47 +00:00
PoYen, Chen
db95d25d36
Launch splitkv kernel if given page_block_size
2024-08-02 10:26:09 +00:00
PoYen, Chen
e7969b9fd2
Add template argument 'kIsPagedKV' for splitkv kernels
2024-08-02 10:14:51 +00:00
Illia Silin
d311c95396
Add compiler flags for ROCm versions 6.2+ (#1429)
* add compiler flags to fix compiler issues
* fix typo.
* disable test_smfmac_op on all devices except gfx942
* specify full path to compiler in CI
2024-08-01 08:27:52 -07:00
Sam Wu
6648fd3b04
Update doc requirements (#1423)
2024-07-31 07:42:42 -07:00
zjing14
f31e8dfa80
[HotFix] Fixed a typo in profile_gemm_multiply_multiply (#1425)
* fixed a typo
* clean
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
2024-07-31 07:19:17 -07:00
arai713
d32997a792
Codegen: isSupportedArgument check (#1417)
* added isSupportedArgument check into codegen device op
* adding function call
* remove commented code
2024-07-31 07:12:15 -07:00
carlushuang
b3f86e79dd
workaround rocm-6.2 compiler issue (#1421)
2024-07-31 16:03:59 +08:00
PoYen, Chen
3f7199873c
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-07-31 04:42:41 +00:00
Illia Silin
b527cad4a5
add docker for rocm6.2_rc4 compiler (#1424)
2024-07-30 11:55:33 -07:00