PoYen, Chen
|
e00ff9d246
|
Simplify pipeline source code
|
2024-06-12 09:17:04 +00:00 |
|
PoYen, Chen
|
ff61463cab
|
Use read descriptor to locate lds elements
|
2024-06-12 04:31:33 +00:00 |
|
PoYen, Chen
|
fcf5cd5e57
|
Undo removing necessary value-overwrite logic
|
2024-06-12 04:21:31 +00:00 |
|
PoYen, Chen
|
e1b4ac293e
|
Support load_tile() for tile_window_with_static_lengths<>
|
2024-06-12 04:20:09 +00:00 |
|
PoYen, Chen
|
a3fad6aae5
|
Add transposed lds descriptor
|
2024-06-12 03:46:41 +00:00 |
|
PoYen, Chen
|
ba0bc1507c
|
Remove necessary value-overwrite logic
|
2024-06-12 03:07:32 +00:00 |
|
PoYen, Chen
|
318b2d5c12
|
Remove hand-written store_tile() code
|
2024-06-12 02:54:32 +00:00 |
|
PoYen, Chen
|
a939ec5da4
|
Set invalid element value for LSEacc tensor view
|
2024-06-12 02:53:55 +00:00 |
|
PoYen, Chen
|
ff866f6bb6
|
Support providing invalid element for tensor view
|
2024-06-12 02:52:07 +00:00 |
|
PoYen, Chen
|
b994668714
|
Use tensor_descriptor to locate LSEacc elements
|
2024-06-12 02:32:33 +00:00 |
|
PoYen, Chen
|
ec82f3bbd6
|
Re-order pipeline call operator arguments
|
2024-06-11 19:54:30 +00:00 |
|
PoYen, Chen
|
9d1243e7fa
|
Pass LSE/O strides in kernel argument
|
2024-06-11 19:45:21 +00:00 |
|
PoYen, Chen
|
df4fc8f26c
|
Re-order split-kv pipeline call operator arguments
|
2024-06-11 19:23:19 +00:00 |
|
PoYen, Chen
|
6ee71c2bf6
|
Add stride kernel arguments for LSE/O acc workspace
|
2024-06-11 19:18:22 +00:00 |
|
PoYen, Chen
|
f968a7e442
|
Remove more debug code in combine pipeline
|
2024-06-11 18:36:23 +00:00 |
|
PoYen, Chen
|
4f8cef36bc
|
Fix example output format
|
2024-06-11 18:21:31 +00:00 |
|
PoYen, Chen
|
5c752a02b7
|
Fix wrong pipeline args for fp8
|
2024-06-11 14:55:45 +00:00 |
|
PoYen, Chen
|
eaca81945e
|
Remove unnecessary tile size for fp8
|
2024-06-11 14:42:32 +00:00 |
|
PoYen, Chen
|
8eb6e451f2
|
Undo disabling data types
|
2024-06-11 14:37:18 +00:00 |
|
PoYen, Chen
|
2532908699
|
Print num_splits conditionally
|
2024-06-11 14:34:45 +00:00 |
|
PoYen, Chen
|
1c531a0c13
|
Update license date
|
2024-06-11 14:29:49 +00:00 |
|
PoYen, Chen
|
9293f5448a
|
Enable non-split-kv blobs
|
2024-06-11 14:23:42 +00:00 |
|
PoYen, Chen
|
0fd7f85504
|
Use shorter template parameter name
|
2024-06-11 14:20:03 +00:00 |
|
PoYen, Chen
|
138b75bf12
|
Remove unused include directive
|
2024-06-11 14:18:24 +00:00 |
|
PoYen, Chen
|
16cc9eeef4
|
Fix unstable clang-format comment
|
2024-06-11 14:15:52 +00:00 |
|
PoYen, Chen
|
c9bbb7b142
|
Clean up generate.py
|
2024-06-11 14:15:07 +00:00 |
|
PoYen, Chen
|
bb6804e315
|
Add constness to local variables
|
2024-06-11 14:10:35 +00:00 |
|
PoYen, Chen
|
31505a2a04
|
Remove more debug statements
|
2024-06-11 14:08:39 +00:00 |
|
PoYen, Chen
|
5efb80347e
|
Remove debug statements in example
|
2024-06-11 14:02:53 +00:00 |
|
PoYen, Chen
|
912a6cb2ea
|
Remove inconsistent comment
|
2024-06-11 13:56:44 +00:00 |
|
PoYen, Chen
|
95be5c2b9d
|
Remove no-longer-used field
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
893841d745
|
Undo vector size changes
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
40c885f007
|
Fix wrong loop counter step logic
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
c36cad2e6c
|
Fix wrong LDS indexing logic
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
d74a1d6ed1
|
Fix split-kv combine kernel name
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
f3e213c0c5
|
Reduce # of combine kernels
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
180b726f97
|
Fix wrong kBlockSize used in policy
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
238fde80a6
|
Fix o_acc memory error
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
ffd2768000
|
Format code
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
18a7223b96
|
Fix wrong layout of LSE/LSEacc/Oacc
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
064afc69d9
|
Replace sentinel value before storing
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
5a6b8d8606
|
Clean-up code
|
2024-06-11 13:46:13 +00:00 |
|
Po-Yen, Chen
|
eac0f3cc47
|
Fix mismatched return type
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
9ac2654b55
|
Add SplitKV combine kernel codegen logic
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
cacce74f2c
|
Add SplitKV kernel codegen logic
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
78b64d11c4
|
Generate fmha_fwd_splitkv()
|
2024-06-11 13:46:13 +00:00 |
|
PoYen, Chen
|
c928fefaae
|
Add num_splits option and dummy split-kv api method
|
2024-06-11 13:46:13 +00:00 |
|
Po Yen Chen
|
abc7e7ed30
|
Merge branch 'develop' into ck_tile/fa_train
|
2024-06-04 16:03:01 +08:00 |
|
dependabot[bot]
|
76827d82ca
|
Bump rocm-docs-core from 1.2.0 to 1.2.1 in /docs/sphinx (#1322)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.0...v1.2.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2024-06-03 22:41:56 -07:00 |
|
danyao12
|
327074c3f8
|
fix error in WarpGemm
|
2024-06-04 11:42:33 +08:00 |
|