Bartłomiej Kocot
6935a2481c
Add read_first_lane function for int64 ( #1347 )
[ROCm/composable_kernel commit: 8faec23cb4 ]
2024-06-18 15:05:30 -05:00
jakpiase
92853de60e
Switch to universal gemm in grouped gemm tile loop ( #1335 )
* switch to universal gemm in grouped gemm tile loop
* minor fixes
* add reviewers comments
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: e2d139201b ]
2024-06-18 09:01:49 -05:00
Bartłomiej Kocot
c0eda96fec
Fix continuous dim selection in contraction ( #1336 )
* Fix continuous dim selection in contraction
* Fixes
[ROCm/composable_kernel commit: 933951ed48 ]
2024-06-18 10:26:49 +02:00
carlushuang
05adcc7f64
[CK_TILE][FA] using pk f16_f32 ( #1343 )
* [CK_TILE][FA] using pk f16_f32
* correct an error
[ROCm/composable_kernel commit: 17ed368f58 ]
2024-06-17 17:16:46 +08:00
zjing14
651ce5c272
disabled lds direct load inline asm ( #1331 )
[ROCm/composable_kernel commit: e02103168a ]
2024-06-16 20:33:47 -05:00
Bartłomiej Kocot
5728b06e64
Support large tensors in grouped conv fwd ( #1332 )
* Support large tensors in grouped conv fwd
* Multi ABD fixes
* Fix calculate element space size
[ROCm/composable_kernel commit: dc1e9c5df9 ]
2024-06-14 09:53:03 -05:00
Qianfeng
9b0d87fe9a
Fix the use of static_for in amd_buffer_addressing.hpp ( #1337 )
* Add insert_dummy_dep_per_dword overload for length 64
* Fix insert_dummy_dep_per_dword and remove overload for length 64
* Remove blank lines
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 37a347e380 ]
2024-06-13 16:12:20 +08:00
Rostyslav Geyyer
2832eb1444
Add instances for grouped conv fwd 3d with ConvScale for fp8@bf8->fp8 ( #1325 )
* Add fp8 bf8 conv example
* Add instances
* Add client example
* Add random scale values
* Format
[ROCm/composable_kernel commit: acda4c5a3c ]
2024-06-12 14:41:56 -05:00
Bartłomiej Kocot
f1600a0db5
Fix nhwgc f16 wmma instances ( #1328 )
[ROCm/composable_kernel commit: 5fc1bee4c5 ]
2024-06-11 09:52:38 +02:00
Rostyslav Geyyer
9416b16080
Add a convinvscale op, related instances and examples ( #1307 )
* Update the element op
* Add an example
* Add instances
* Add a client example
* make sure new instances only build on gfx9
* Update element op and its handling
* Format
* Update instances to take element op as an argument
* Update examples to use random scale values
* Format
* Update client example with random scales
* Format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: ce66277a76 ]
2024-06-10 14:48:49 -05:00
dependabot[bot]
e9023db1ea
Bump rocm-docs-core from 1.3.0 to 1.4.0 in /docs/sphinx ( #1327 )
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.3.0...v1.4.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 8f5690c4bb ]
2024-06-06 22:38:26 -07:00
Bartłomiej Kocot
4716f8f70b
Integrate universal gemm with conv forward ( #1320 )
* Integrate universal gemm with conv fwd
* Fix conv fwd wmma test
* Fix instances
* Remove direct load check
[ROCm/composable_kernel commit: ac58cc5d1d ]
2024-06-05 13:01:29 -05:00
dependabot[bot]
9e83275bcc
Bump rocm-docs-core from 1.2.1 to 1.3.0 in /docs/sphinx ( #1324 )
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.1...v1.3.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: ba82beb9bf ]
2024-06-05 07:36:39 -07:00
Rostyslav Geyyer
692ae331ca
Add a scale op, related instances and examples ( #1242 )
* Add a scale op
* Update the element op
* Add instances
* Add an example
* Add a client example
* Add a flag check
* Revert flag check addition
* Fix flag check
* Update d strides in example
* Update d strides in client example
* Apply suggestions from code review
Update copyright header
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Move the example
* Move the client example
* Update element op
* Update example with the new element op
* Add scalar layout
* Update example
* Update kernel for scalar Ds
* Revert kernel changes
* Update element op
* Update example to use scales' pointers
* Format
* Update instances
* Update client example
* Move element op to unary elements
* Update element op to work with values instead of pointers
* Update instances to take element op as an argument
* Update examples to use random scale values
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: cb0645bedc ]
2024-06-04 19:28:15 -05:00
Dan Yao
26840b623a
CK Tile FA Training kernels ( #1286 )
* FA fwd dropout
* FA bwd
* epilogue reuse
* CMakeLists update
* [CK_TILE] support alibi (#1269 )
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com >
* now fwd/bwd can build
* bwd alibi
* add bwd validation stream_config
* update generated filenames
* update bwd kernel launch
* CK_TILE_HOST_DEVICE in philox
* Transpose -> transpose
* format
* format
* format
* Generate the instance for FA required
* format
* fix error in WarpGemm
---------
Co-authored-by: danyao12 <danyao12>
Co-authored-by: carlushuang <carlus.huang@amd.com >
Co-authored-by: rocking <ChunYu.Lai@amd.com >
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
Co-authored-by: Jing Zhang <jizhan@amd.com >
[ROCm/composable_kernel commit: 2cab8d39e3 ]
2024-06-04 13:12:45 -05:00
dependabot[bot]
caad92f304
Bump rocm-docs-core from 1.2.0 to 1.2.1 in /docs/sphinx ( #1322 )
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.0...v1.2.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 76827d82ca ]
2024-06-03 22:41:56 -07:00
Illia Silin
9298172793
disable the hipTensor test by default, only run once daily ( #1321 )
[ROCm/composable_kernel commit: 3fa7e2a6c4 ]
2024-06-03 14:07:30 -07:00
zjing14
9227a76f8e
Post-merge fix of PR 1300 ( #1313 )
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
* post-merge fix
* fix
* reduce init range
[ROCm/composable_kernel commit: 6fb1f4e03f ]
2024-05-31 22:46:41 -07:00
Illia Silin
df89e0892f
Build CK library for all supported targets. ( #1312 )
* test library build for all supported targets
* increase the number of threads to build lib in CI to 64
[ROCm/composable_kernel commit: 34f3dfdd61 ]
2024-05-28 12:36:06 -07:00
dependabot[bot]
6a86bb94bf
Bump rocm-docs-core from 1.1.3 to 1.2.0 in /docs/sphinx ( #1311 )
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.1.3 to 1.2.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.3...v1.2.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 66de8a02ba ]
2024-05-28 11:36:09 -07:00
zjing14
96356d2daf
add f8 gemm multiD with both row/col wise scale ( #1300 )
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
[ROCm/composable_kernel commit: 80db62f08d ]
2024-05-28 12:04:22 -05:00
carlushuang
7ff08f6a52
[CK_TILE] support group from cmdline ( #1295 )
* support cmdline seqlen decode
* silent print
* update readme
* update kernel launch 3d
* update tile partitioner
* fix spill for bf16
* modify based on comment
* modify payload_t
* fix bug for alibi mode
* fix alibi test err
* refactor kernel launch, support select timer
* add missing file
* remove useless code
* add some comments
[ROCm/composable_kernel commit: 5055b3bdcb ]
2024-05-28 11:13:21 +08:00
Joseph Macaranas
5860f8a97d
Enable external CI pipeline triggers ( #1310 )
[ROCm/composable_kernel commit: 02fa2c298b ]
2024-05-23 18:21:34 -04:00
Illia Silin
c6b1a8b2e9
Split the gemm_multi_abd instances. ( #1306 )
* split the gemm_multi_abd instances
* update the dates
[ROCm/composable_kernel commit: ec2bae27ff ]
2024-05-23 09:17:02 -07:00
dependabot[bot]
8c40671e18
Bump rocm-docs-core from 1.1.2 to 1.1.3 in /docs/sphinx ( #1308 )
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.2...v1.1.3 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 06a9b72caf ]
2024-05-23 07:45:53 -07:00
Max Podkorytov
c5c8419a01
Add a library that generates CK instances for use by the PyTorch 2 Inductor CK backend
Also bundle the CK library and include files with the pip package.
The package is pip-installable with
`pip install git+https://github.com/tenpercent/composable_kernel@enable-pip`
(substitute the repo path and branch if necessary)
Testing:
`myenv/bin/python3 -m ck4inductor.universal_gemm.gen_instances`
(prints a list of instances)
`tree myenv/lib/python3.12/site-packages/ck4inductor`
(observe the list of sources installed alongside the package)
[ROCm/composable_kernel commit: 29e58d5b28 ]
2024-05-22 13:44:22 -07:00
Bartłomiej Kocot
c6431f6c07
Optimize grouped conv bwd weight for small M and N ( #1303 )
* Optimize grouped conv bwd weight for small M and N
* Fixes
[ROCm/composable_kernel commit: fd72380aeb ]
2024-05-22 21:01:01 +02:00
Illia Silin
6cf9f7f72c
Select appropriate GPU targets for instances, tests, and examples. ( #1304 )
* set individual gpu targets for instances, examples, tests
* fix path to hip compiler
* fix path to hip compiler once more
* aggregate device macros in ck_tile config header
* fix the cmake logic for instances
* fix clang format
* add gfx900 and gfx906 to default set of targets
[ROCm/composable_kernel commit: 7b027d5643 ]
2024-05-22 11:45:27 -07:00
Rostyslav Geyyer
c16cff1498
Move grouped conv fwd client examples ( #1299 )
* Move grouped conv fwd client examples
* Update existing examples
* Format
[ROCm/composable_kernel commit: 204da9c522 ]
2024-05-21 09:52:41 -05:00
Illia Silin
b63dc7b530
aggregate device macros in ck_tile config header ( #1297 )
[ROCm/composable_kernel commit: 06b891c5c2 ]
2024-05-20 08:34:45 -07:00
Illia Silin
2026ce49e7
replace the ENV macro with CK_ENV ( #1296 )
[ROCm/composable_kernel commit: 1274861a9d ]
2024-05-17 10:42:51 -07:00
dependabot[bot]
506dc5e284
Bump rocm-docs-core from 1.1.1 to 1.1.2 in /docs/sphinx ( #1293 )
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.1.1 to 1.1.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.1...v1.1.2 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 6637a810d0 ]
2024-05-17 07:44:48 -07:00
rocking
545e4e9e77
Fix compile error ( #1292 )
error: no viable conversion from returned value of type '__half' to function return type 'fp16_hip_t' (aka '_Float16')
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: aaa8dfdae9 ]
2024-05-17 17:19:17 +08:00
Illia Silin
6a57fe4ad4
remove wrong use of nonexistent class members ( #1290 )
[ROCm/composable_kernel commit: c44137838e ]
2024-05-15 08:08:17 -07:00
carlushuang
f16be4051d
remove operator-deref ( #1291 )
[ROCm/composable_kernel commit: dd0dd13d4e ]
2024-05-15 08:06:50 -07:00
jakpiase
c63db2b2ab
Add unit tests for grouped gemm two stage ( #1256 )
* add unit tests for grouped gemm two stage
* add reviewers suggestions
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: 3e3471d5d2 ]
2024-05-15 10:03:39 +02:00
Illia Silin
bfabd9aa9a
re-enable convnd_fwd_xdl_fp64 testing ( #1289 )
[ROCm/composable_kernel commit: 7843a8a7fb ]
2024-05-10 22:48:28 -07:00
Illia Silin
a90f0099fc
Code clean-up ( #1285 )
* code clean-up
* remove the profiling output samples
[ROCm/composable_kernel commit: 566b6480a2 ]
2024-05-10 09:41:39 -07:00
carlushuang
866c3302ad
[CK_TILE] fix some rand number init ( #1287 )
* add random norm
* normalized default to 0/3
* change squant->auto
[ROCm/composable_kernel commit: fcba889ef4 ]
2024-05-10 09:03:39 -07:00
Bartłomiej Kocot
b7ee312021
Change output gemm type to AccDataType in two stage conv bwd wei ( #1283 )
[ROCm/composable_kernel commit: 8346af9c68 ]
2024-05-10 10:57:42 +02:00
Adam Osewski
84627bf589
Fix MakeArgument ( #1284 )
[ROCm/composable_kernel commit: a0ae1c6133 ]
2024-05-09 09:42:41 -07:00
Adam Osewski
d395dbb19f
Add vector instruction coherency bits for gfx94 targets. ( #1268 )
[ROCm/composable_kernel commit: 3c043cd10b ]
2024-05-09 07:30:17 -07:00
Illia Silin
6b5aca8f3c
fix the output formatting ( #1282 )
[ROCm/composable_kernel commit: fdbf8ccbd7 ]
2024-05-08 16:11:54 -07:00
Bartłomiej Kocot
68b2757f11
Add two stage grouped conv bwd weight kernel ( #1280 )
[ROCm/composable_kernel commit: 0b6b5d1785 ]
2024-05-08 09:53:24 +02:00
Illia Silin
b62c21c3b5
Enable logging in CK with environment variable. ( #1278 )
* enable logging using environment variable
* update ck.hpp header
* fix typo
* fix clang format
* Update include/ck/utility/env.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: bf42097646 ]
2024-05-07 16:26:43 -07:00
carlushuang
7bfe56e5ca
[CK_TILE] support alibi ( #1269 )
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com >
[ROCm/composable_kernel commit: 851c3ed157 ]
2024-05-07 22:32:54 +08:00
Sam Wu
94a85a36ad
Add ROCm Doc team as codeowners for RTD yaml ( #1277 )
Also add component owners as codeowners for header directory
[ROCm/composable_kernel commit: 6d073d31bb ]
2024-05-06 10:07:39 -06:00
Illia Silin
48872cec09
add missing vector header ( #1275 )
[ROCm/composable_kernel commit: 08d51d9bc4 ]
2024-05-02 11:27:59 -07:00
Illia Silin
d89deae29c
Downgrade minimum required python version to 3.6 ( #1274 )
[ROCm/composable_kernel commit: 7797f7c7a1 ]
2024-05-01 15:34:56 -07:00
Illia Silin
dd79e7371c
[CI] Focus CI stages on MI200 nodes for resource optimization ( #1273 )
[ROCm/composable_kernel commit: f0bf1e3125 ]
2024-05-01 10:07:14 -07:00