rocking
bf33df4e6e
layernorm2d forward ( #1339 )
...
* Add layernorm2d forward
* Refine file path
* clang format
* Exclude ck_tile op from all
* use add_executable instead
* refactor layernorm2d_fwd example
---------
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: cb13839425 ]
2024-06-24 08:45:52 +08:00
Andriy Roshchenko
31abec679c
Add instances of grouped convolution 3d forward with a ConvScale element-wise op for bf8@bf8->fp8 ( #1326 )
...
We are adding more instances of grouped convolution 3d forward with a ConvScale element-wise operation.
This commit handles bf8@bf8->fp8 data types combination.
* Included an example.
* Added instances.
* Added a client example.
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com >
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: 05b10e0e5a ]
2024-06-21 19:02:57 -06:00
carlushuang
723dd9813e
WA for rocm-6.2+ s constraint for buffer resource ( #1346 )
...
* WA for rocm-6.2+ s constraint for buffer resource
* add missing memory clobber
[ROCm/composable_kernel commit: fa129c1a5d ]
2024-06-21 11:00:13 -05:00
Bartłomiej Kocot
cc0dd8a45e
Fix cmake warnings ( #1342 )
...
* CMake: add -Wno-nvcc-compat
* Remove template without initialization list
* dpp remove template without init list
* Fixes
[ROCm/composable_kernel commit: 510325a468 ]
2024-06-21 09:47:58 +02:00
Dan Yao
c30ad40dfb
Fix FA bwd alibi+causal NaN errors ( #1352 )
...
* fix bwd alibi nan error
* fix datatype
---------
Co-authored-by: danyao12 <danyao12>
[ROCm/composable_kernel commit: 1da802bdf2 ]
2024-06-20 09:50:53 -05:00
ThruptiRajLakshmanaGowda
428cefd1b5
Add Missing Activation Functions for Grouped 2D/3D Convolutions ( #1348 )
...
* Initial Push
* First Push
* Fixed Clang format
* Resolve merge conflict
* Addressed review comments
* Addressed review comments
* Addressed review comments
[ROCm/composable_kernel commit: 0162a5f6ba ]
2024-06-20 09:24:54 -05:00
Qianfeng
c70758ca61
Fix dropout lambda to avoid a compile issue in some Docker/compiler environments ( #1350 )
...
[ROCm/composable_kernel commit: e3f44659cf ]
2024-06-20 11:36:42 +08:00
zjing14
333c31bb42
Remove gfx900 and gfx906 from default target device to reduce package size ( #1351 )
...
[ROCm/composable_kernel commit: 8db331a511 ]
2024-06-19 11:47:18 -07:00
Qianfeng
44ec3b26d5
Hacking ck_tile fmha Dropout facility ( #1344 )
...
* Add NullBlockDropout to be used when kHasDropout is false
* Change to BlockDropout::Run() for forward to reduce conditional checks
* Re-format files
---------
Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 1973903f49 ]
2024-06-19 10:37:22 +08:00
Bartłomiej Kocot
e1c3bf298d
Add read_first_lane function for int64 ( #1347 )
...
[ROCm/composable_kernel commit: 8faec23cb4 ]
2024-06-18 15:05:30 -05:00
jakpiase
163a866a5b
Switch to universal gemm in grouped gemm tile loop ( #1335 )
...
* switch to universal gemm in grouped gemm tile loop
* minor fixes
* address reviewers' comments
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: e2d139201b ]
2024-06-18 09:01:49 -05:00
Bartłomiej Kocot
856b54e58b
Fix continuous dim selection in contraction ( #1336 )
...
* Fix continuous dim selection in contraction
* Fixes
[ROCm/composable_kernel commit: 933951ed48 ]
2024-06-18 10:26:49 +02:00
carlushuang
447beaec1e
[CK_TILE][FA] using pk f16_f32 ( #1343 )
...
* [CK_TILE][FA] using pk f16_f32
* correct an error
[ROCm/composable_kernel commit: 17ed368f58 ]
2024-06-17 17:16:46 +08:00
zjing14
4847f3beb4
disabled lds direct load inline asm ( #1331 )
...
[ROCm/composable_kernel commit: e02103168a ]
2024-06-16 20:33:47 -05:00
Bartłomiej Kocot
d413c30ff4
Support large tensors in grouped conv fwd ( #1332 )
...
* Support large tensors in grouped conv fwd
* Multi ABD fixes
* Fix calculate element space size
[ROCm/composable_kernel commit: dc1e9c5df9 ]
2024-06-14 09:53:03 -05:00
Qianfeng
b8de94c07a
Fix the use of static_for in amd_buffer_addressing.hpp ( #1337 )
...
* Add insert_dummy_dep_per_dword overload for length 64
* Fix insert_dummy_dep_per_dword and remove the overload for length 64
* Remove blank lines
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 37a347e380 ]
2024-06-13 16:12:20 +08:00
Rostyslav Geyyer
94c0cadc92
Add instances for grouped conv fwd 3d with ConvScale for fp8@bf8->fp8 ( #1325 )
...
* Add fp8 bf8 conv example
* Add instances
* Add client example
* Add random scale values
* Format
[ROCm/composable_kernel commit: acda4c5a3c ]
2024-06-12 14:41:56 -05:00
Bartłomiej Kocot
f95b574ab1
Fix nhwgc f16 wmma instances ( #1328 )
...
[ROCm/composable_kernel commit: 5fc1bee4c5 ]
2024-06-11 09:52:38 +02:00
Rostyslav Geyyer
25ae51c6f0
Add a convinvscale op, related instances and examples ( #1307 )
...
* Update the element op
* Add an example
* Add instances
* Add a client example
* make sure new instances only build on gfx9
* Update element op and its handling
* Format
* Update instances to take element op as an argument
* Update examples to use random scale values
* Format
* Update client example with random scales
* Format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: ce66277a76 ]
2024-06-10 14:48:49 -05:00
dependabot[bot]
9aa25f53b4
Bump rocm-docs-core from 1.3.0 to 1.4.0 in /docs/sphinx ( #1327 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.3.0...v1.4.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 8f5690c4bb ]
2024-06-06 22:38:26 -07:00
Bartłomiej Kocot
41c68496e6
Integrate universal gemm with conv forward ( #1320 )
...
* Integrate universal gemm with conv fwd
* Fix conv fwd wmma test
* Fix instances
* Remove direct load check
[ROCm/composable_kernel commit: ac58cc5d1d ]
2024-06-05 13:01:29 -05:00
dependabot[bot]
3627cf6cad
Bump rocm-docs-core from 1.2.1 to 1.3.0 in /docs/sphinx ( #1324 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.1...v1.3.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: ba82beb9bf ]
2024-06-05 07:36:39 -07:00
Rostyslav Geyyer
fec15d8c40
Add a scale op, related instances and examples ( #1242 )
...
* Add a scale op
* Update the element op
* Add instances
* Add an example
* Add a client example
* Add a flag check
* Revert flag check addition
* Fix flag check
* Update d strides in example
* Update d strides in client example
* Apply suggestions from code review
Update copyright header
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Move the example
* Move the client example
* Update element op
* Update example with the new element op
* Add scalar layout
* Update example
* Update kernel for scalar Ds
* Revert kernel changes
* Update element op
* Update example to use scales' pointers
* Format
* Update instances
* Update client example
* Move element op to unary elements
* Update element op to work with values instead of pointers
* Update instances to take element op as an argument
* Update examples to use random scale values
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: cb0645bedc ]
2024-06-04 19:28:15 -05:00
Dan Yao
fa3a589fa3
CK Tile FA Training kernels ( #1286 )
...
* FA fwd dropout
* FA bwd
* epilogue reuse
* CMakeLists update
* [CK_TILE] support alibi (#1269 )
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com >
* now fwd/bwd can build
* bwd alibi
* add bwd validation stream_config
* update generated filenames
* update bwd kernel launch
* CK_TILE_HOST_DEVICE in philox
* Transpose -> transpose
* format
* format
* format
* Generate the instances required for FA
* format
* fix error in WarpGemm
---------
Co-authored-by: danyao12 <danyao12>
Co-authored-by: carlushuang <carlus.huang@amd.com >
Co-authored-by: rocking <ChunYu.Lai@amd.com >
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
Co-authored-by: Jing Zhang <jizhan@amd.com >
[ROCm/composable_kernel commit: 2cab8d39e3 ]
2024-06-04 13:12:45 -05:00
dependabot[bot]
cc607da2fa
Bump rocm-docs-core from 1.2.0 to 1.2.1 in /docs/sphinx ( #1322 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.0...v1.2.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 76827d82ca ]
2024-06-03 22:41:56 -07:00
Illia Silin
11ce5b2508
disable the hipTensor test by default, only run once daily ( #1321 )
...
[ROCm/composable_kernel commit: 3fa7e2a6c4 ]
2024-06-03 14:07:30 -07:00
zjing14
551be3cb67
Post-merge fix of PR 1300 ( #1313 )
...
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
* post-merge fix
* fix
* reduce init range
[ROCm/composable_kernel commit: 6fb1f4e03f ]
2024-05-31 22:46:41 -07:00
Illia Silin
237c390a30
Build CK library for all supported targets. ( #1312 )
...
* test library build for all supported targets
* increase the number of threads to build lib in CI to 64
[ROCm/composable_kernel commit: 34f3dfdd61 ]
2024-05-28 12:36:06 -07:00
dependabot[bot]
133909991c
Bump rocm-docs-core from 1.1.3 to 1.2.0 in /docs/sphinx ( #1311 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.1.3 to 1.2.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.3...v1.2.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 66de8a02ba ]
2024-05-28 11:36:09 -07:00
zjing14
fe0f89d95d
add f8 gemm multiD with both row/col wise scale ( #1300 )
...
* add f8 gemm with multiD for both row/col wise
* change compute_type to fp8
* changed tuning parameters in the example
* add rcr example
[ROCm/composable_kernel commit: 80db62f08d ]
2024-05-28 12:04:22 -05:00
carlushuang
29df9783d6
[CK_TILE] support group from cmdline ( #1295 )
...
* support cmdline seqlen decode
* silent print
* update readme
* update kernel launch 3d
* update tile partitioner
* fix spill for bf16
* modify based on comment
* modify payload_t
* fix bug for alibi mode
* fix alibi test err
* refactor kernel launch, support select timer
* add missing file
* remove useless code
* add some comments
[ROCm/composable_kernel commit: 5055b3bdcb ]
2024-05-28 11:13:21 +08:00
Joseph Macaranas
548ddd0673
Enable external CI pipeline triggers ( #1310 )
...
[ROCm/composable_kernel commit: 02fa2c298b ]
2024-05-23 18:21:34 -04:00
Illia Silin
9bdae6116c
Split the gemm_multi_abd instances. ( #1306 )
...
* split the gemm_multi_abd instances
* update the dates
[ROCm/composable_kernel commit: ec2bae27ff ]
2024-05-23 09:17:02 -07:00
dependabot[bot]
e96d09f6b3
Bump rocm-docs-core from 1.1.2 to 1.1.3 in /docs/sphinx ( #1308 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.2...v1.1.3 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 06a9b72caf ]
2024-05-23 07:45:53 -07:00
Max Podkorytov
564de0adc0
Add a library that generates CK instances for use by the PyTorch 2 Inductor CK backend
...
Also bundle the CK library and include files with the pip package.
The package is pip-installable with
`pip install git+https://github.com/tenpercent/composable_kernel@enable-pip`
(substitute the repo path and branch if necessary)
Testing:
`myenv/bin/python3 -m ck4inductor.universal_gemm.gen_instances`
(prints a list of instances)
`tree myenv/lib/python3.12/site-packages/ck4inductor`
(observe the list of sources alongside the installed package)
[ROCm/composable_kernel commit: 29e58d5b28 ]
2024-05-22 13:44:22 -07:00
Bartłomiej Kocot
b4b436d29a
Optimize grouped conv bwd weight for small M and N ( #1303 )
...
* Optimize grouped conv bwd weight for small M and N
* Fixes
[ROCm/composable_kernel commit: fd72380aeb ]
2024-05-22 21:01:01 +02:00
Illia Silin
beb7927f52
Select appropriate GPU targets for instances, tests, and examples. ( #1304 )
...
* set individual gpu targets for instances, examples, tests
* fix path to hip compiler
* fix path to hip compiler once more
* aggregate device macros in ck_tile config header
* fix the cmake logic for instances
* fix clang format
* add gfx900 and gfx906 to default set of targets
[ROCm/composable_kernel commit: 7b027d5643 ]
2024-05-22 11:45:27 -07:00
Rostyslav Geyyer
f6bd300ecb
Move grouped conv fwd client examples ( #1299 )
...
* Move grouped conv fwd client examples
* Update existing examples
* Format
[ROCm/composable_kernel commit: 204da9c522 ]
2024-05-21 09:52:41 -05:00
Illia Silin
ca0015bf39
aggregate device macros in ck_tile config header ( #1297 )
...
[ROCm/composable_kernel commit: 06b891c5c2 ]
2024-05-20 08:34:45 -07:00
Illia Silin
0003dce849
replace the ENV macro with CK_ENV ( #1296 )
...
[ROCm/composable_kernel commit: 1274861a9d ]
2024-05-17 10:42:51 -07:00
dependabot[bot]
3bd09036a3
Bump rocm-docs-core from 1.1.1 to 1.1.2 in /docs/sphinx ( #1293 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 1.1.1 to 1.1.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.1...v1.1.2 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 6637a810d0 ]
2024-05-17 07:44:48 -07:00
rocking
54dc094a2c
Fix compile error ( #1292 )
...
error: no viable conversion from returned value of type '__half' to function return type 'fp16_hip_t' (aka '_Float16')
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: aaa8dfdae9 ]
2024-05-17 17:19:17 +08:00
Illia Silin
ca31c8515e
remove wrong use of nonexistent class members ( #1290 )
...
[ROCm/composable_kernel commit: c44137838e ]
2024-05-15 08:08:17 -07:00
carlushuang
96b7e7336a
remove operator-deref ( #1291 )
...
[ROCm/composable_kernel commit: dd0dd13d4e ]
2024-05-15 08:06:50 -07:00
jakpiase
290ac20e62
Add unit tests for grouped gemm two stage ( #1256 )
...
* add unit tests for grouped gemm two stage
* apply reviewers' suggestions
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: 3e3471d5d2 ]
2024-05-15 10:03:39 +02:00
Illia Silin
2640cb1551
re-enable convnd_fwd_xdl_fp64 testing ( #1289 )
...
[ROCm/composable_kernel commit: 7843a8a7fb ]
2024-05-10 22:48:28 -07:00
Illia Silin
254758813f
Code clean-up ( #1285 )
...
* code clean-up
* remove the profiling output samples
[ROCm/composable_kernel commit: 566b6480a2 ]
2024-05-10 09:41:39 -07:00
carlushuang
9ca5aca74d
[CK_TILE] fix some rand number init ( #1287 )
...
* add random norm
* normalized default to 0/3
* change squant->auto
[ROCm/composable_kernel commit: fcba889ef4 ]
2024-05-10 09:03:39 -07:00
Bartłomiej Kocot
70f51bb03f
Change output gemm type to AccDataType in two stage conv bwd wei ( #1283 )
...
[ROCm/composable_kernel commit: 8346af9c68 ]
2024-05-10 10:57:42 +02:00
Adam Osewski
675a16e3b8
Fix MakeArgument ( #1284 )
...
[ROCm/composable_kernel commit: a0ae1c6133 ]
2024-05-09 09:42:41 -07:00