PoYen, Chen
31505a2a04
Remove more debug statements
2024-06-11 14:08:39 +00:00
PoYen, Chen
5efb80347e
Remove debug statements in example
2024-06-11 14:02:53 +00:00
PoYen, Chen
912a6cb2ea
Remove inconsistent comment
2024-06-11 13:56:44 +00:00
PoYen, Chen
95be5c2b9d
Remove no-longer used field
2024-06-11 13:46:13 +00:00
PoYen, Chen
893841d745
Undo vector size changes
2024-06-11 13:46:13 +00:00
PoYen, Chen
40c885f007
Fix wrong loop counter step logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
c36cad2e6c
Fix wrong LDS indexing logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
d74a1d6ed1
Fix split-kv combine kernel name
2024-06-11 13:46:13 +00:00
PoYen, Chen
f3e213c0c5
Reduce # of combine kernels
2024-06-11 13:46:13 +00:00
PoYen, Chen
180b726f97
Fix wrong kBlockSize used in policy
2024-06-11 13:46:13 +00:00
PoYen, Chen
238fde80a6
Fix o_acc memory error
2024-06-11 13:46:13 +00:00
PoYen, Chen
ffd2768000
Format codes
2024-06-11 13:46:13 +00:00
PoYen, Chen
18a7223b96
Fix wrong layout of LSE/LSEacc/Oacc
2024-06-11 13:46:13 +00:00
PoYen, Chen
064afc69d9
Replace sentinel value before storing
2024-06-11 13:46:13 +00:00
PoYen, Chen
5a6b8d8606
Clean-up code
2024-06-11 13:46:13 +00:00
Po-Yen, Chen
eac0f3cc47
Fix mismatched return type
2024-06-11 13:46:13 +00:00
PoYen, Chen
9ac2654b55
Add SplitKV combine kernel codegen logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
cacce74f2c
Add SplitKV kernel codegen logic
2024-06-11 13:46:13 +00:00
PoYen, Chen
78b64d11c4
Generate fmha_fwd_splitkv()
2024-06-11 13:46:13 +00:00
PoYen, Chen
c928fefaae
Add num_splits option and dummy split-kv api method
2024-06-11 13:46:13 +00:00
Po Yen Chen
abc7e7ed30
Merge branch 'develop' into ck_tile/fa_train
2024-06-04 16:03:01 +08:00
dependabot[bot]
76827d82ca
Bump rocm-docs-core from 1.2.0 to 1.2.1 in /docs/sphinx (#1322)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.0...v1.2.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-03 22:41:56 -07:00
danyao12
327074c3f8
fix error in WarpGemm
2024-06-04 11:42:33 +08:00
danyao12
bdd4a87199
format
2024-06-04 08:26:53 +08:00
Illia Silin
3fa7e2a6c4
disable the hipTensor test by default, only run once daily (#1321)
2024-06-03 14:07:30 -07:00
rocking
9ceff3a5c8
Generate the instance for FA required
2024-06-03 20:03:16 +00:00
zjing14
6fb1f4e03f
Post-merge fix of PR 1300 (#1313)
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
* post-merge fix
* fix
* reduce init range
2024-05-31 22:46:41 -07:00
root
c70662a92e
format
2024-06-01 01:42:45 +00:00
Jing Zhang
09e9f10f97
format
2024-05-31 13:59:47 +00:00
root
60b328d597
Merge branch 'ck_tile/fa_train' of github.com:ROCm/composable_kernel into ck_tile/fa_train
2024-05-31 13:51:37 +00:00
Jing Zhang
0d7f71779b
format
2024-05-31 13:51:28 +00:00
Po Yen Chen
ff31c6a70c
Merge branch 'develop' into ck_tile/fa_train
2024-05-31 15:52:47 +08:00
danyao12
87f73f30e8
Transpose -> transpose
2024-05-29 16:54:26 +08:00
danyao12
58f61716b5
CK_TILE_HOST_DEVICE in philox
2024-05-29 16:20:34 +08:00
Illia Silin
34f3dfdd61
Build CK library for all supported targets. (#1312)
* test library build for all supported targets
* increase the number of threads to build lib in CI to 64
2024-05-28 12:36:06 -07:00
dependabot[bot]
66de8a02ba
Bump rocm-docs-core from 1.1.3 to 1.2.0 in /docs/sphinx (#1311)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.3 to 1.2.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.3...v1.2.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-28 11:36:09 -07:00
zjing14
80db62f08d
add f8 gemm multiD with both row/col wise scale (#1300)
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
2024-05-28 12:04:22 -05:00
danyao12
1c511b3e7d
update bwd kernel launch
2024-05-28 23:14:18 +08:00
danyao12
ba6437868b
Merge branch 'develop' into ck_tile/fa_train
2024-05-28 11:42:38 +08:00
carlushuang
5055b3bdcb
[CK_TILE] support group from cmdline (#1295)
* support cmdline seqlen decode
* silent print
* update readme
* update kernel launch 3d
* update tile partitioner
* fix spill for bf16
* modify based on comment
* modify payload_t
* fix bug for alibi mode
* fix alibi test err
* refactor kernel launch, support select timer
* add missing file
* remove useless code
* add some comments
2024-05-28 11:13:21 +08:00
Joseph Macaranas
02fa2c298b
Enable external CI pipeline triggers (#1310)
2024-05-23 18:21:34 -04:00
Illia Silin
ec2bae27ff
Split the gemm_multi_abd instances. (#1306)
* split the gemm_multi_abd instances
* update the dates
2024-05-23 09:17:02 -07:00
dependabot[bot]
06a9b72caf
Bump rocm-docs-core from 1.1.2 to 1.1.3 in /docs/sphinx (#1308)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.2...v1.1.3)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 07:45:53 -07:00
danyao12
7ed2ca79ac
update generated filenames
2024-05-23 17:20:10 +08:00
danyao12
ff6f33d4f7
add bwd validation stream_config
2024-05-23 15:18:43 +08:00
Max Podkorytov
29e58d5b28
Make the library which generates CK instances for pytorch2 inductor's CK backend usage
Also bundle the CK library and include files with the pip package.
The package is pip-installable with
`pip install git+https://github.com/tenpercent/composable_kernel@enable-pip`
(substitute the repo path and branch if necessary)
Testing:
`myenv/bin/python3 -m ck4inductor.universal_gemm.gen_instances`
(prints a list of instances)
`tree myenv/lib/python3.12/site-packages/ck4inductor`
(observe the list of sources along the installed package)
2024-05-22 13:44:22 -07:00
Bartłomiej Kocot
fd72380aeb
Optimize grouped conv bwd weight for small M and N (#1303)
* Optimize grouped conv bwd weight for small M and N
* Fixes
2024-05-22 21:01:01 +02:00
Illia Silin
7b027d5643
Select appropriate GPU targets for instances, tests, and examples. (#1304)
* set individual gpu targets for instances, examples, tests
* fix path to hip compiler
* fix path to hip compiler once more
* aggregate device macros in ck_tile config header
* fix the cmake logic for instances
* fix clang format
* add gfx900 and gfx906 to default set of targets
2024-05-22 11:45:27 -07:00
Rostyslav Geyyer
204da9c522
Move grouped conv fwd client examples (#1299)
* Move grouped conv fwd client examples
* Update existing examples
* Format
2024-05-21 09:52:41 -05:00
Illia Silin
06b891c5c2
aggregate device macros in ck_tile config header (#1297)
2024-05-20 08:34:45 -07:00