Commit Graph

595 Commits

Author SHA1 Message Date
PoYen, Chen
3d3d73bee2 Fix wrong parameter name 2024-08-18 17:25:39 +00:00
PoYen, Chen
51062cae0b Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv 2024-08-16 16:47:06 +00:00
PoYen, Chen
43b8100b7f Support cache_batch_idx in example 2024-08-16 16:27:56 +00:00
PoYen, Chen
9c904b0e4c Pass cache_batch_idx to kernels 2024-08-16 15:32:24 +00:00
PoYen, Chen
2523c8e36c Fix more format 2024-08-16 10:32:17 +00:00
PoYen, Chen
5805f5aa73 Remove group mode from appendkv kernel 2024-08-16 10:04:48 +00:00
Haocong WANG
3049b5467c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
Mateusz Ozga
0606e5498e Support large: 12d tensor size for reduction kernel (#1465) 2024-08-13 16:15:47 +02:00
PoYen, Chen
9de0f35ebc Remove unused template parameter 2024-08-13 09:29:20 +00:00
PoYen, Chen
370babc996 Make tile window directly via PageBlockNavigator 2024-08-13 09:18:24 +00:00
PoYen, Chen
3dd6ef61ef Re-order pipeline parameters 2024-08-13 07:38:41 +00:00
PoYen, Chen
19c19d8bd3 Only expose necessary methods (not attributes) 2024-08-13 07:26:26 +00:00
PoYen, Chen
c54de6416a Rename TileWindowNavigator to PageBlockNavigator 2024-08-13 07:23:40 +00:00
Bartłomiej Kocot
4a870942e6 Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457)
* Fix typo in TransformConvFwdToGemm

* Fix bug in n offset calculation
2024-08-10 13:12:05 +02:00
Jun Liu
5ff8eeebf9 Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)
This reverts commit 33b399cc15.
2024-08-08 19:09:33 -07:00
PoYen, Chen
d2f5d0910a Remove no-longer used pipeline files 2024-08-08 17:40:05 +00:00
PoYen, Chen
d3624a03de Merge branch 'develop' into feature/fmha-fwd-appendkv 2024-08-08 17:26:53 +00:00
PoYen, Chen
2f42e4460f Allow problem types without define kHasDropout attr 2024-08-08 10:53:42 +00:00
PoYen, Chen
a0d2163045 Remove dropout code in splitkv kernel 2024-08-08 10:21:34 +00:00
Juan Manuel Martinez Caamaño
901e5f1540 Remove reinterpret_cast uses that result in undefined behaviour. (#1445)
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead.

See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility

Closes #1439

* fix clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-07 11:49:02 -07:00
PoYen, Chen
cef9da0a76 Remove debug macro usages 2024-08-07 15:26:43 +00:00
PoYen, Chen
eda78d1a10 Merge branch 'develop' into feature/fmha-fwd-appendkv 2024-08-07 12:17:45 +00:00
PoYen, Chen
1b96dc2592 Do not perform write again if already in last page-block 2024-08-07 12:00:15 +00:00
PoYen, Chen
f265742b63 Handle cross-page-block write 2024-08-07 09:33:41 +00:00
PoYen, Chen
40f0d01e29 Allow transit tile_window to another page-block 2024-08-07 09:29:55 +00:00
PoYen, Chen
26ed468ac6 Pass re-created tile window to pipeline 2024-08-07 06:00:17 +00:00
PoYen, Chen
78209c7326 Fix wrong tensor descriptor lengths 2024-08-07 05:59:26 +00:00
PoYen, Chen
7789b53e15 Add tile navigators to the appendkv kernel 2024-08-07 04:51:21 +00:00
PoYen, Chen
443a528adc Add block_table kernel args for appendkv kernel 2024-08-07 04:27:15 +00:00
PoYen, Chen
15d0034a64 Add paged-kv codegen logic for appendkv kernels 2024-08-07 04:19:45 +00:00
Juan Manuel Martinez Caamaño
fd9ef4e678 Add missing constexpr to if conditions (#1444) 2024-08-06 11:40:34 -07:00
jakpiase
b74d4d4d54 Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce

* add reviewers suggestions
2024-08-06 09:10:39 -07:00
PoYen, Chen
db31475e07 Unify origin 2024-08-06 08:37:29 +00:00
Bartłomiej Kocot
4ec5c52a0c Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing

* Add new grouped conv fwd kernel for large tensors

* Add instances large tensor

* Fixes for transform conv to gemm

* Fixes

* fixes

* Remove not needed instances

* examples fixes

* Remove not need ds arrays

* Fix tests

* Add 2GB check in gridwise dl

* Fixes
2024-08-06 10:06:10 +02:00
PoYen, Chen
bd0d2f3975 Add batch_stride_k/batch_stride_v in group mode 2024-08-06 08:02:43 +00:00
PoYen, Chen
faf6b0e8ab Fix wrong origin for bias 2024-08-06 08:02:08 +00:00
PoYen, Chen
f9e2bafd10 Make sure we always start reading complete tile 2024-08-06 03:13:57 +00:00
PoYen, Chen
4fed268723 Move code after decide seqlen_q/seqlen_k 2024-08-06 01:39:49 +00:00
PoYen, Chen
77dac7775c Move V tile through TileWindowNavigator 2024-08-05 22:36:52 +00:00
PoYen, Chen
ab086bdb76 Simplify more make_tile_window() overloads 2024-08-05 22:16:24 +00:00
PoYen, Chen
bb78353264 Remove unnecessary data members 2024-08-05 21:52:59 +00:00
PoYen, Chen
8fea4139df Fix tile window navigation bugs 2024-08-05 21:34:15 +00:00
PoYen, Chen
ecaaa6f136 Simplify TileWindowNavigator interfaces 2024-08-05 16:31:31 +00:00
PoYen, Chen
1c9d77b606 Introduce 'TileWindowNavigator' types 2024-08-05 15:58:41 +00:00
PoYen, Chen
55b77cf962 Add another make_tile_window() 2024-08-05 15:57:03 +00:00
PoYen, Chen
24cb604373 Add copy_const<> type trait 2024-08-05 15:56:15 +00:00
PoYen, Chen
381f7e90e0 Merge branch 'develop' into feature/fmha-fwd-appendkv 2024-08-04 02:12:20 +00:00
PoYen, Chen
baf4a612f0 Fix wrong kernel name 2024-08-02 10:26:47 +00:00
PoYen, Chen
e7969b9fd2 Add template argument 'kIsPagedKV' for splitkv kernels 2024-08-02 10:14:51 +00:00
arai713
d32997a792 Codegen: isSupportedArgument check (#1417)
* added isSupportedArgument check into codegen device op

* adding function call

* remove commented code
2024-07-31 07:12:15 -07:00