PoYen, Chen
4f8cef36bc
Fix example output format
2024-06-11 18:21:31 +00:00
PoYen, Chen
5c752a02b7
Fix wrong pipeline args for fp8
2024-06-11 14:55:45 +00:00
PoYen, Chen
eaca81945e
Remove unnecessary tile size for fp8
2024-06-11 14:42:32 +00:00
PoYen, Chen
8eb6e451f2
Undo disabling data types
2024-06-11 14:37:18 +00:00
PoYen, Chen
2532908699
Print num_splits conditionally
2024-06-11 14:34:45 +00:00
PoYen, Chen
1c531a0c13
Update license date
2024-06-11 14:29:49 +00:00
PoYen, Chen
9293f5448a
Enable non-split-kv blobs
2024-06-11 14:23:42 +00:00
PoYen, Chen
0fd7f85504
Use shorter template parameter name
2024-06-11 14:20:03 +00:00
PoYen, Chen
138b75bf12
Remove unused include directive
2024-06-11 14:18:24 +00:00
PoYen, Chen
16cc9eeef4
Fix unstable clang-format comment
2024-06-11 14:15:52 +00:00
PoYen, Chen
c9bbb7b142
Clean up generate.py
2024-06-11 14:15:07 +00:00
PoYen, Chen
bb6804e315
Add constness to local variables
2024-06-11 14:10:35 +00:00
PoYen, Chen
31505a2a04
Remove more debug statements
2024-06-11 14:08:39 +00:00
PoYen, Chen
5efb80347e
Remove debug statements in example
2024-06-11 14:02:53 +00:00
PoYen, Chen
912a6cb2ea
Remove inconsistent comment
2024-06-11 13:56:44 +00:00
PoYen, Chen
95be5c2b9d
Remove no-longer used field
2024-06-11 13:46:13 +00:00
PoYen, Chen
893841d745
Undo vector size changes
2024-06-11 13:46:13 +00:00
PoYen, Chen
40c885f007
Fix wrong loop counter step logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
c36cad2e6c
Fix wrong LDS indexing logics
2024-06-11 13:46:13 +00:00
PoYen, Chen
d74a1d6ed1
Fix split-kv combine kernel name
2024-06-11 13:46:13 +00:00
PoYen, Chen
f3e213c0c5
Reduce # of combine kernels
2024-06-11 13:46:13 +00:00
PoYen, Chen
180b726f97
Fix wrong kBlockSize used in policy
2024-06-11 13:46:13 +00:00
PoYen, Chen
238fde80a6
Fix o_acc memory error
2024-06-11 13:46:13 +00:00
PoYen, Chen
ffd2768000
Format codes
2024-06-11 13:46:13 +00:00
PoYen, Chen
18a7223b96
Fix wrong layout of LSE/LSEacc/Oacc
2024-06-11 13:46:13 +00:00
PoYen, Chen
064afc69d9
Replace sentinel value before storing
2024-06-11 13:46:13 +00:00
PoYen, Chen
5a6b8d8606
Clean-up code
2024-06-11 13:46:13 +00:00
Po-Yen, Chen
eac0f3cc47
Fix mismatched return type
2024-06-11 13:46:13 +00:00
PoYen, Chen
9ac2654b55
Add SplitKV combine kernel codegen logics
2024-06-11 13:46:13 +00:00
PoYen, Chen
cacce74f2c
Add SplitKV kernel codegen logics
2024-06-11 13:46:13 +00:00
PoYen, Chen
78b64d11c4
Generate fmha_fwd_splitkv()
2024-06-11 13:46:13 +00:00
PoYen, Chen
c928fefaae
Add num_splits option and dummy split-kv api method
2024-06-11 13:46:13 +00:00
Po Yen Chen
abc7e7ed30
Merge branch 'develop' into ck_tile/fa_train
2024-06-04 16:03:01 +08:00
dependabot[bot]
76827d82ca
Bump rocm-docs-core from 1.2.0 to 1.2.1 in /docs/sphinx (#1322)
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.0...v1.2.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-03 22:41:56 -07:00
danyao12
327074c3f8
fix error in WarpGemm
2024-06-04 11:42:33 +08:00
danyao12
bdd4a87199
format
2024-06-04 08:26:53 +08:00
Illia Silin
3fa7e2a6c4
disable the hipTensor test by default, only run once daily (#1321)
2024-06-03 14:07:30 -07:00
rocking
9ceff3a5c8
Generate the instance for FA required
2024-06-03 20:03:16 +00:00
zjing14
6fb1f4e03f
Post-merge fix of PR 1300 (#1313)
...
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
* post-merge fix
* fix
* reduce init range
2024-05-31 22:46:41 -07:00
root
c70662a92e
format
2024-06-01 01:42:45 +00:00
Jing Zhang
09e9f10f97
format
2024-05-31 13:59:47 +00:00
root
60b328d597
Merge branch 'ck_tile/fa_train' of github.com:ROCm/composable_kernel into ck_tile/fa_train
2024-05-31 13:51:37 +00:00
Jing Zhang
0d7f71779b
format
2024-05-31 13:51:28 +00:00
Po Yen Chen
ff31c6a70c
Merge branch 'develop' into ck_tile/fa_train
2024-05-31 15:52:47 +08:00
danyao12
87f73f30e8
Transpose -> transpose
2024-05-29 16:54:26 +08:00
danyao12
58f61716b5
CK_TILE_HOST_DEVICE in philox
2024-05-29 16:20:34 +08:00
Illia Silin
34f3dfdd61
Build CK library for all supported targets. (#1312)
...
* test library build for all supported targets
* increase the number of threads to build lib in CI to 64
2024-05-28 12:36:06 -07:00
dependabot[bot]
66de8a02ba
Bump rocm-docs-core from 1.1.3 to 1.2.0 in /docs/sphinx (#1311)
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.3 to 1.2.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.3...v1.2.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-28 11:36:09 -07:00
zjing14
80db62f08d
add f8 gemm multiD with both row/col wise scale (#1300)
...
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
2024-05-28 12:04:22 -05:00
danyao12
1c511b3e7d
update bwd kernel launch
2024-05-28 23:14:18 +08:00