PoYen, Chen
d3fd64cd26
Add more appendkv test
2024-08-16 18:03:28 +00:00
PoYen, Chen
51062cae0b
Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv
2024-08-16 16:47:06 +00:00
PoYen, Chen
41fdf9b2bc
Fix compilation error
2024-08-16 16:39:11 +00:00
PoYen, Chen
43b8100b7f
Support cache_batch_idx in example
2024-08-16 16:27:56 +00:00
PoYen, Chen
9c904b0e4c
Pass cache_batch_idx to kernels
2024-08-16 15:32:24 +00:00
PoYen, Chen
e6239e14f7
Re-organize bash functions
2024-08-16 12:46:16 +00:00
PoYen, Chen
2523c8e36c
Fix more format
2024-08-16 10:32:17 +00:00
PoYen, Chen
5728c0be65
Fix formatting
2024-08-16 10:25:46 +00:00
PoYen, Chen
095819a387
Remove options
2024-08-16 10:22:44 +00:00
PoYen, Chen
f2b3620511
Use meaningful options in smoke test
2024-08-16 10:18:14 +00:00
PoYen, Chen
aadd3ec63e
Fix wrong syntax in skcheck expr
2024-08-16 10:09:46 +00:00
PoYen, Chen
a4c6029a3d
Fix skcheck logic
2024-08-16 10:08:01 +00:00
PoYen, Chen
5805f5aa73
Remove group mode from appendkv kernel
2024-08-16 10:04:48 +00:00
Haocong WANG
3049b5467c
[GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
Mateusz Ozga
0606e5498e
Support large: 12d tensor size for reduction kernel (#1465)
2024-08-13 16:15:47 +02:00
PoYen, Chen
a8a2275aca
Fix wrong argument count
2024-08-13 08:42:23 +00:00
PoYen, Chen
d96752d0f5
Refine smoke_test_fwd.sh
2024-08-13 08:36:04 +00:00
Illia Silin
cbb6f2ab8c
Disable inapplicable xdl and mha instances for gfx12 (#1464)
2024-08-12 15:11:58 -07:00
PoYen, Chen
e8603dc21a
Add missing comment
2024-08-08 20:40:50 +00:00
PoYen, Chen
822d5dcd8e
Fix wrong seqlen for kvcache
2024-08-08 20:39:36 +00:00
PoYen, Chen
6a399ea47e
Use generic lambda to init all the api traits/args
2024-08-08 19:22:53 +00:00
PoYen, Chen
9206808835
Move functors to the beginning of validation code
2024-08-08 18:01:10 +00:00
PoYen, Chen
028d89862a
Wrap code by #if directives
2024-08-08 17:58:49 +00:00
PoYen, Chen
9dddf6e437
Rename 'max_num_blocks' to 'max_num_page_blocks'
2024-08-08 17:38:08 +00:00
PoYen, Chen
e3a4bfba88
Show more detailed warning message
2024-08-08 17:35:36 +00:00
PoYen, Chen
d3624a03de
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-08 17:26:53 +00:00
PoYen, Chen
3e2b69e163
Display more info for specific kernels
2024-08-08 17:26:09 +00:00
PoYen, Chen
c8f63d4848
Separate more non-splitkv & splitkv traits/args
2024-08-08 16:54:00 +00:00
PoYen, Chen
677d9b28dd
Use generic lambda to init traits objects
2024-08-08 16:38:17 +00:00
PoYen, Chen
a0d2163045
Remove dropout code in splitkv kernel
2024-08-08 10:21:34 +00:00
PoYen, Chen
9d9c5a6c24
Fix compilation errors
2024-08-08 08:26:55 +00:00
PoYen, Chen
247e135cfc
Remove fmha_fwd_dispatch()
2024-08-08 08:15:04 +00:00
PoYen, Chen
291e9b4bbb
Separate splitkv/non-splitkv args/traits
2024-08-08 08:07:03 +00:00
PoYen, Chen
655b13b059
Rename option s_k_new to s_knew
2024-08-07 15:31:54 +00:00
PoYen, Chen
b6c2f2f01d
Add missing group mode argument
2024-08-07 15:22:57 +00:00
Illia Silin
12c1f68dd9
Run CK_TILE FMHA benchmarks and collect the performance data. (#1447)
* run ck_tile benchmarks after the smoke tests and store logs
* change the path of fmha benchmark logs
* change the way of stashing ck_tile fmha logs
* prevent the errors in stages where no logs are generated
* fix the ck_tile fmha log names and headers
* generate the fmha performance logs in the root folder
* change jenkins script arguments format
* use exact file names for stashing
* modify scripts to process FMHA performance results
* unstash FMHA logs before parsing them
2024-08-07 08:18:26 -07:00
PoYen, Chen
55ce2948a9
Always add fmha_fwd() api
2024-08-07 13:43:14 +00:00
PoYen, Chen
eda78d1a10
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-07 12:17:45 +00:00
PoYen, Chen
838f9955fd
Fix wrong strides for appendkv kernel
2024-08-07 08:06:47 +00:00
PoYen, Chen
443a528adc
Add block_table kernel args for appendkv kernel
2024-08-07 04:27:15 +00:00
PoYen, Chen
15d0034a64
Add paged-kv codegen logic for appendkv kernels
2024-08-07 04:19:45 +00:00
jakpiase
b74d4d4d54
Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce
* add reviewers suggestions
2024-08-06 09:10:39 -07:00
PoYen, Chen
b98985262d
Add missing kernel arguments for group mode
2024-08-06 14:54:07 +00:00
Bartłomiej Kocot
4ec5c52a0c
Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove not needed instances
* examples fixes
* Remove not need ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
2024-08-06 10:06:10 +02:00
PoYen, Chen
12da00c3be
Use 128 as minimum page_block_size
2024-08-06 03:20:29 +00:00
PoYen, Chen
f9e2bafd10
Make sure we always start reading complete tile
2024-08-06 03:13:57 +00:00
PoYen, Chen
8779716403
Fix uneven split checking logic
2024-08-06 01:17:14 +00:00
PoYen, Chen
3fc7279519
Disable calling fmha_fwd()
2024-08-05 21:36:52 +00:00
PoYen, Chen
8fea4139df
Fix tile window navigation bugs
2024-08-05 21:34:15 +00:00
PoYen, Chen
90d84eaeae
Fix seqlen_k_min for pre-fill case (1 -> 0)
2024-08-04 02:53:40 +00:00