danyao12
87f73f30e8
Transpose -> transpose
2024-05-29 16:54:26 +08:00
danyao12
58f61716b5
CK_TILE_HOST_DEVICE in philox
2024-05-29 16:20:34 +08:00
danyao12
1c511b3e7d
update bwd kernel launch
2024-05-28 23:14:18 +08:00
danyao12
ba6437868b
Merge branch 'develop' into ck_tile/fa_train
2024-05-28 11:42:38 +08:00
carlushuang
5055b3bdcb
[CK_TILE] support group from cmdline (#1295)
...
* support cmdline seqlen decode
* silent print
* update readme
* update kernel launch 3d
* update tile partitioner
* fix spill for bf16
* modify based on comment
* modify payload_t
* fix bug for alibi mode
* fix alibi test err
* refactor kernel launch, support select timer
* add missing file
* remove useless code
* add some comments
2024-05-28 11:13:21 +08:00
Joseph Macaranas
02fa2c298b
Enable external CI pipeline triggers (#1310)
2024-05-23 18:21:34 -04:00
Illia Silin
ec2bae27ff
Split the gemm_multi_abd instances. (#1306)
...
* split the gemm_multi_abd instances
* update the dates
2024-05-23 09:17:02 -07:00
dependabot[bot]
06a9b72caf
Bump rocm-docs-core from 1.1.2 to 1.1.3 in /docs/sphinx (#1308)
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.2...v1.1.3)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 07:45:53 -07:00
danyao12
7ed2ca79ac
update generated filenames
2024-05-23 17:20:10 +08:00
danyao12
ff6f33d4f7
add bwd validation stream_config
2024-05-23 15:18:43 +08:00
Max Podkorytov
29e58d5b28
Add a library that generates CK instances for use by PyTorch 2 Inductor's CK backend
...
Also bundle the CK library and include files with the pip package.
The package is pip-installable with
`pip install git+https://github.com/tenpercent/composable_kernel@enable-pip`
(substitute the repo path and branch if necessary)
Testing:
`myenv/bin/python3 -m ck4inductor.universal_gemm.gen_instances`
(prints a list of instances)
`tree myenv/lib/python3.12/site-packages/ck4inductor`
(observe the list of sources alongside the installed package)
2024-05-22 13:44:22 -07:00
Bartłomiej Kocot
fd72380aeb
Optimize grouped conv bwd weight for small M and N (#1303)
...
* Optimize grouped conv bwd weight for small M and N
* Fixes
2024-05-22 21:01:01 +02:00
Illia Silin
7b027d5643
Select appropriate GPU targets for instances, tests, and examples. (#1304)
...
* set individual gpu targets for instances, examples, tests
* fix path to hip compiler
* fix path to hip compiler once more
* aggregate device macros in ck_tile config header
* fix the cmake logic for instances
* fix clang format
* add gfx900 and gfx906 to default set of targets
2024-05-22 11:45:27 -07:00
Rostyslav Geyyer
204da9c522
Move grouped conv fwd client examples (#1299)
...
* Move grouped conv fwd client examples
* Update existing examples
* Format
2024-05-21 09:52:41 -05:00
Illia Silin
06b891c5c2
aggregate device macros in ck_tile config header (#1297)
2024-05-20 08:34:45 -07:00
Illia Silin
1274861a9d
replace the ENV macro with CK_ENV (#1296)
2024-05-17 10:42:51 -07:00
dependabot[bot]
6637a810d0
Bump rocm-docs-core from 1.1.1 to 1.1.2 in /docs/sphinx (#1293)
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.1 to 1.1.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.1...v1.1.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-17 07:44:48 -07:00
rocking
aaa8dfdae9
Fix compile error (#1292)
...
error: no viable conversion from returned value of type '__half' to function return type 'fp16_hip_t' (aka '_Float16')
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-05-17 17:19:17 +08:00
Illia Silin
c44137838e
remove wrong use of nonexistent class members (#1290)
2024-05-15 08:08:17 -07:00
carlushuang
dd0dd13d4e
remove operator-deref (#1291)
2024-05-15 08:06:50 -07:00
danyao12
826a894335
support bwd alibi
2024-05-15 21:55:02 +08:00
jakpiase
3e3471d5d2
Add unit tests for grouped gemm two stage (#1256)
...
* add unit tests for grouped gemm two stage
* add reviewers' suggestions
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-05-15 10:03:39 +02:00
danyao12
a84009f83b
bwd alibi
2024-05-13 10:39:44 +08:00
carlushuang
35f59c04e6
Merge remote-tracking branch 'origin/develop' into ck_tile/fa_train
2024-05-12 23:03:10 +00:00
carlushuang
bd9cd53885
now fwd/bwd can build
2024-05-12 22:33:22 +00:00
carlushuang
90700dbefa
[CK_TILE] support alibi (#1269)
...
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-11 10:43:56 +00:00
Illia Silin
7843a8a7fb
re-enable convnd_fwd_xdl_fp64 testing (#1289)
2024-05-10 22:48:28 -07:00
Illia Silin
566b6480a2
Code clean-up (#1285)
...
* code clean-up
* remove the profiling output samples
2024-05-10 09:41:39 -07:00
carlushuang
fcba889ef4
[CK_TILE] fix some rand number init (#1287)
...
* add random norm
* normalized default to 0/3
* change squant->auto
2024-05-10 09:03:39 -07:00
Bartłomiej Kocot
8346af9c68
Change output gemm type to AccDataType in two stage conv bwd wei (#1283)
2024-05-10 10:57:42 +02:00
danyao12
c26c99e55f
CMakeLists update
2024-05-10 12:09:33 +08:00
danyao12
15187df456
epilogue reuse
2024-05-10 10:57:53 +08:00
Adam Osewski
a0ae1c6133
Fix MakeArgument (#1284)
2024-05-09 09:42:41 -07:00
Adam Osewski
3c043cd10b
Add vector instruction coherency bits for gfx94 targets. (#1268)
2024-05-09 07:30:17 -07:00
danyao12
e1a21655ae
FA bwd
2024-05-09 17:08:08 +08:00
Illia Silin
fdbf8ccbd7
fix the output formatting (#1282)
2024-05-08 16:11:54 -07:00
Bartłomiej Kocot
0b6b5d1785
Add two stage grouped conv bwd weight kernel (#1280)
2024-05-08 09:53:24 +02:00
Illia Silin
bf42097646
Enable logging in CK with environment variable. (#1278)
...
* enable logging using environment variable
* update ck.hpp header
* fix typo
* fix clang format
* Update include/ck/utility/env.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-05-07 16:26:43 -07:00
carlushuang
851c3ed157
[CK_TILE] support alibi (#1269)
...
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
2024-05-07 22:32:54 +08:00
Sam Wu
6d073d31bb
Add ROCm Doc team as codeowners for RTD yaml (#1277)
...
Also add component owners as codeowners for header directory
2024-05-06 10:07:39 -06:00
Illia Silin
08d51d9bc4
add missing vector header (#1275)
2024-05-02 11:27:59 -07:00
Illia Silin
7797f7c7a1
Downgrade minimum required python version to 3.6 (#1274)
2024-05-01 15:34:56 -07:00
Illia Silin
f0bf1e3125
[CI] Focus CI stages on MI200 nodes for resource optimization (#1273)
2024-05-01 10:07:14 -07:00
Rostyslav Geyyer
a2d0bdd5a9
Add an ignore (#1270)
2024-04-30 20:45:22 -07:00
Sam Wu
43579900a9
Update documentation requirements and configurations (#1272)
...
* Update documentation requirements
Set rocm-docs-core to v1.1.1
* Update RTD config
Set Python 3.10 for rocm-docs-core >= v1.0.0
2024-04-30 20:44:59 -07:00
Illia Silin
f6b3f4715d
[CI][Tests] Add a daily cron job to build CK instances for gfx9;gfx10;gfx11. (#1271)
...
* add a daily build for instances for gfx9;gfx10;gfx11
* fix jenkins logic for instances only build
* fix the path for instance_only build
* reduce the number of build threads to 32
2024-04-30 14:44:30 -07:00
Adam Osewski
0f7e8ec485
Fix example CMakeLists.txt (#1267)
...
Add proper dependency target.
2024-04-30 08:28:19 -07:00
Rostyslav Geyyer
6ced3c12ff
Mark unneeded instances as "getting deprecated" (#1265)
...
* Add a flag
* Add flag check and messages
---------
Co-authored-by: root <root@aus-g7-rogeyyer.amd.com>
2024-04-29 12:00:55 -07:00
danyao12
bbd2e1eae3
FA fwd dropout
2024-04-29 14:13:00 +08:00
Haocong WANG
764164b488
[GEMM] UniversalGemm update (#1262)
...
* Add bf16 instances
* Add bf16 gemm universal example
* tempsave
* Add guard to navi compilation
* workaround for a specific mixed gemm instance (bring it back when the compiler fix is uploaded)
* fix formatting condition statement issue
* solve conflict
---------
Co-authored-by: Jun Liu <Liu.Jun@amd.com>
2024-04-26 12:56:07 -05:00