Commit Graph

1300 Commits

Author SHA1 Message Date
rocking
bf33df4e6e layernorm2d forward (#1339)
* Add layernorm2d forward

* Refind file path

* clang format

* Exclude ck_tile op from all

* use add_executable instead

* refactor layernorm2d_fwd example

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: cb13839425]
2024-06-24 08:45:52 +08:00
Andriy Roshchenko
31abec679c Add instances of grouped convolution 3d forward with a ConvScale element-wise op for bf8@bf8->fp8 (#1326)
We are adding more instances of grouped convolution 3d forward with a ConvScale element-wise operation.
This commit handles bf8@bf8->fp8 data types combination.

* Included an example.
* Added instances.
* Added a client example.

---------

Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 05b10e0e5a]
2024-06-21 19:02:57 -06:00
carlushuang
723dd9813e WA for rocm-6.2+ s constraint for buffer resource (#1346)
* WA for rocm-6.2+ s constraint for buffer resource

* add missing memory clobber

[ROCm/composable_kernel commit: fa129c1a5d]
2024-06-21 11:00:13 -05:00
Bartłomiej Kocot
cc0dd8a45e Fix cmake warnings (#1342)
* Cmake add -Wno-nvcc-compat

* Remove template without initialization list

* dpp remove template without init list

* Fixes

[ROCm/composable_kernel commit: 510325a468]
2024-06-21 09:47:58 +02:00
Dan Yao
c30ad40dfb Fix FA bwd alibi+causal NaN errors (#1352)
* fix bwd alibi nan error

* fix datatype

---------

Co-authored-by: danyao12 <danyao12>

[ROCm/composable_kernel commit: 1da802bdf2]
2024-06-20 09:50:53 -05:00
ThruptiRajLakshmanaGowda
428cefd1b5 Adding Missed Activation Functions for Grouped 2D/3D Convolutions (#1348)
* Initial Push

* First Push

* Fixed Clang format

* Resolve merge conflict

* Addressed review comments

* Addressed review comments

* Addressed review comments

[ROCm/composable_kernel commit: 0162a5f6ba]
2024-06-20 09:24:54 -05:00
Qianfeng
c70758ca61 Fix dropout lambda to avoid a compile issue in some docker/compiler envs (#1350)
[ROCm/composable_kernel commit: e3f44659cf]
2024-06-20 11:36:42 +08:00
zjing14
333c31bb42 Remove gfx900 and gfx906 from default target device to reduce package size (#1351)
[ROCm/composable_kernel commit: 8db331a511]
2024-06-19 11:47:18 -07:00
Qianfeng
44ec3b26d5 Hacking ck_tile fmha Dropout facility (#1344)
* Add NullBlockDropout to be used when kHasDropout is false

* Change to BlockDropout::Run() for forward to reduce conditional checks

* Re-format files

---------

Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 1973903f49]
2024-06-19 10:37:22 +08:00
Bartłomiej Kocot
e1c3bf298d Add read_first_lane function for int64 (#1347)
[ROCm/composable_kernel commit: 8faec23cb4]
2024-06-18 15:05:30 -05:00
jakpiase
163a866a5b Switch to universal gemm in grouped gemm tile loop (#1335)
* switch to universal gemm in grouped gemm tile loop

* minor fixes

* add reviewers comments

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: e2d139201b]
2024-06-18 09:01:49 -05:00
Bartłomiej Kocot
856b54e58b Fix continuous dim selection in contraction (#1336)
* Fix continuous dim selection in contraction

* Fixes

[ROCm/composable_kernel commit: 933951ed48]
2024-06-18 10:26:49 +02:00
carlushuang
447beaec1e [CK_TILE][FA] using pk f16_f32 (#1343)
* [CK_TILE][FA] using pk f16_f32

* correct an error

[ROCm/composable_kernel commit: 17ed368f58]
2024-06-17 17:16:46 +08:00
zjing14
4847f3beb4 disabled lds direct load inline asm (#1331)
[ROCm/composable_kernel commit: e02103168a]
2024-06-16 20:33:47 -05:00
Bartłomiej Kocot
d413c30ff4 Support large tensors in grouped conv fwd (#1332)
* Support large tensors in grouped conv fwd

* Multi ABD fixes

* Fix calculate element space size

[ROCm/composable_kernel commit: dc1e9c5df9]
2024-06-14 09:53:03 -05:00
Qianfeng
b8de94c07a Fix the use of static_for in amd_buffer_addressing.hpp (#1337)
* Add insert_dummy_dep_per_dword overload for length 64

* Fix insert_dummy_dep_per_dword and remove overload for length 64

* Remove blank lines

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 37a347e380]
2024-06-13 16:12:20 +08:00
Rostyslav Geyyer
94c0cadc92 Add instances for grouped conv fwd 3d with ConvScale for fp8@bf8->fp8 (#1325)
* Add fp8 bf8 conv example

* Add instances

* Add client example

* Add random scale values

* Format

[ROCm/composable_kernel commit: acda4c5a3c]
2024-06-12 14:41:56 -05:00
Bartłomiej Kocot
f95b574ab1 Fix nhwgc f16 wmma instances (#1328)
[ROCm/composable_kernel commit: 5fc1bee4c5]
2024-06-11 09:52:38 +02:00
Rostyslav Geyyer
25ae51c6f0 Add a convinvscale op, related instances and examples (#1307)
* Update the element op

* Add an example

* Add instances

* Add a client example

* make sure new instances only build on gfx9

* Update element op and its handling

* Format

* Update instances to take element op as an argument

* Update examples to use random scale values

* Format

* Update client example with random scales

* Format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: ce66277a76]
2024-06-10 14:48:49 -05:00
dependabot[bot]
9aa25f53b4 Bump rocm-docs-core from 1.3.0 to 1.4.0 in /docs/sphinx (#1327)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.3.0...v1.4.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 8f5690c4bb]
2024-06-06 22:38:26 -07:00
Bartłomiej Kocot
41c68496e6 Integrate universal gemm with conv forward (#1320)
* Integrate universal gemm with conv fwd

* Fix conv fwd wmma test

* Fix instances

* Remove direct load check

[ROCm/composable_kernel commit: ac58cc5d1d]
2024-06-05 13:01:29 -05:00
dependabot[bot]
3627cf6cad Bump rocm-docs-core from 1.2.1 to 1.3.0 in /docs/sphinx (#1324)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.1...v1.3.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: ba82beb9bf]
2024-06-05 07:36:39 -07:00
Rostyslav Geyyer
fec15d8c40 Add a scale op, related instances and examples (#1242)
* Add a scale op

* Update the element op

* Add instances

* Add an example

* Add a client example

* Add a flag check

* Revert flag check addition

* Fix flag check

* Update d strides in example

* Update d strides in client example

* Apply suggestions from code review

Update copyright header

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Move the example

* Move the client example

* Update element op

* Update example with the new element op

* Add scalar layout

* Update example

* Update kernel for scalar Ds

* Revert kernel changes

* Update element op

* Update example to use scales' pointers

* Format

* Update instances

* Update client example

* Move element op to unary elements

* Update element op to work with values instead of pointers

* Update instances to take element op as an argument

* Update examples to use random scale values

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: cb0645bedc]
2024-06-04 19:28:15 -05:00
Dan Yao
fa3a589fa3 CK Tile FA Training kernels (#1286)
* FA fwd dropout

* FA bwd

* epilogue reuse

* CMakeLists update

* [CK_TILE] support alibi (#1269)

* add alibi support

* fix code

* update code based on comment

* Support more hdim

* fix fp8 bias

* support seqlen_k=0 case

* remove unused printf

* fix format

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>

* now fwd/bwd can build

* bwd alibi

* add bwd validation stream_config

* update generated filenames

* update bwd kernel launch

* CK_TILE_HOST_DEVICE in philox

* Transpose -> transpose

* format

* format

* format

* Generate the instance for FA required

* format

* fix error in WarpGemm

---------

Co-authored-by: danyao12 <danyao12>
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>

[ROCm/composable_kernel commit: 2cab8d39e3]
2024-06-04 13:12:45 -05:00
dependabot[bot]
cc607da2fa Bump rocm-docs-core from 1.2.0 to 1.2.1 in /docs/sphinx (#1322)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.2.0...v1.2.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 76827d82ca]
2024-06-03 22:41:56 -07:00
Illia Silin
11ce5b2508 disable the hipTensor test by default, only run once daily (#1321)
[ROCm/composable_kernel commit: 3fa7e2a6c4]
2024-06-03 14:07:30 -07:00
zjing14
551be3cb67 Post-merge fix of PR 1300 (#1313)
* add f8 gemm with multiD for both row/col wise

* change compute_type to fp8

* changed tuning parameters in the example

* add rcr example

* post-merge fix

* fix

* reduce init range

[ROCm/composable_kernel commit: 6fb1f4e03f]
2024-05-31 22:46:41 -07:00
Illia Silin
237c390a30 Build CK library for all supported targets. (#1312)
* test library build for all supported targets

* increase the number of threads to build lib in CI to 64

[ROCm/composable_kernel commit: 34f3dfdd61]
2024-05-28 12:36:06 -07:00
dependabot[bot]
133909991c Bump rocm-docs-core from 1.1.3 to 1.2.0 in /docs/sphinx (#1311)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.3 to 1.2.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.3...v1.2.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 66de8a02ba]
2024-05-28 11:36:09 -07:00
zjing14
fe0f89d95d add f8 gemm multiD with both row/col wise scale (#1300)
* add f8 gemm with multiD for both row/col wise

* change compute_type to fp8

* changed tuning parameters in the example

* add rcr example

[ROCm/composable_kernel commit: 80db62f08d]
2024-05-28 12:04:22 -05:00
carlushuang
29df9783d6 [CK_TILE] support group from cmdline (#1295)
* support cmdline seqlen decode

* silent print

* update readme

* update kernel launch 3d

* update tile partitioner

* fix spill for bf16

* modify based on comment

* modify payload_t

* fix bug for alibi mode

* fix alibi test err

* refactor kernel launch, support select timer

* add missing file

* remove useless code

* add some comments

[ROCm/composable_kernel commit: 5055b3bdcb]
2024-05-28 11:13:21 +08:00
Joseph Macaranas
548ddd0673 Enable external CI pipeline triggers (#1310)
[ROCm/composable_kernel commit: 02fa2c298b]
2024-05-23 18:21:34 -04:00
Illia Silin
9bdae6116c Split the gemm_multi_abd instances. (#1306)
* split the gemm_multi_abd instances

* update the dates

[ROCm/composable_kernel commit: ec2bae27ff]
2024-05-23 09:17:02 -07:00
dependabot[bot]
e96d09f6b3 Bump rocm-docs-core from 1.1.2 to 1.1.3 in /docs/sphinx (#1308)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.2...v1.1.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 06a9b72caf]
2024-05-23 07:45:53 -07:00
Max Podkorytov
564de0adc0 Add a library that generates CK instances for use by pytorch2 inductor's CK backend
Also bundle the CK library and include files with the pip package.

The package is pip-installable with
`pip install git+https://github.com/tenpercent/composable_kernel@enable-pip`

(substitute the repo path and branch if necessary)

Testing:

`myenv/bin/python3 -m ck4inductor.universal_gemm.gen_instances`

(prints a list of instances)

`tree myenv/lib/python3.12/site-packages/ck4inductor`

(observe the list of sources alongside the installed package)


[ROCm/composable_kernel commit: 29e58d5b28]
2024-05-22 13:44:22 -07:00
Bartłomiej Kocot
b4b436d29a Optimize grouped conv bwd weight for small M and N (#1303)
* Optimize grouped conv bwd weight for small M and N

* Fixes

[ROCm/composable_kernel commit: fd72380aeb]
2024-05-22 21:01:01 +02:00
Illia Silin
beb7927f52 Select appropriate GPU targets for instances, tests, and examples. (#1304)
* set individual gpu targets for instances, examples, tests

* fix path to hip compiler

* fix path to hip compiler once more

* aggregate device macros in ck_tile config header

* fix the cmake logic for instances

* fix clang format

* add gfx900 and gfx906 to default set of targets

[ROCm/composable_kernel commit: 7b027d5643]
2024-05-22 11:45:27 -07:00
Rostyslav Geyyer
f6bd300ecb Move grouped conv fwd client examples (#1299)
* Move grouped conv fwd client examples

* Update existing examples

* Format

[ROCm/composable_kernel commit: 204da9c522]
2024-05-21 09:52:41 -05:00
Illia Silin
ca0015bf39 aggregate device macros in ck_tile config header (#1297)
[ROCm/composable_kernel commit: 06b891c5c2]
2024-05-20 08:34:45 -07:00
Illia Silin
0003dce849 replace the ENV macro with CK_ENV (#1296)
[ROCm/composable_kernel commit: 1274861a9d]
2024-05-17 10:42:51 -07:00
dependabot[bot]
3bd09036a3 Bump rocm-docs-core from 1.1.1 to 1.1.2 in /docs/sphinx (#1293)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.1 to 1.1.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.1...v1.1.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 6637a810d0]
2024-05-17 07:44:48 -07:00
rocking
54dc094a2c Fix compile error (#1292)
error: no viable conversion from returned value of type '__half' to function return type 'fp16_hip_t' (aka '_Float16')

Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: aaa8dfdae9]
2024-05-17 17:19:17 +08:00
Illia Silin
ca31c8515e remove wrong use of nonexistent class members (#1290)
[ROCm/composable_kernel commit: c44137838e]
2024-05-15 08:08:17 -07:00
carlushuang
96b7e7336a remove operator-deref (#1291)
[ROCm/composable_kernel commit: dd0dd13d4e]
2024-05-15 08:06:50 -07:00
jakpiase
290ac20e62 Add unit tests for grouped gemm two stage (#1256)
* add unit tests for grouped gemm two stage

* add reviewers suggestions

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 3e3471d5d2]
2024-05-15 10:03:39 +02:00
Illia Silin
2640cb1551 re-enable convnd_fwd_xdl_fp64 testing (#1289)
[ROCm/composable_kernel commit: 7843a8a7fb]
2024-05-10 22:48:28 -07:00
Illia Silin
254758813f Code clean-up (#1285)
* code clean-up

* remove the profiling output samples

[ROCm/composable_kernel commit: 566b6480a2]
2024-05-10 09:41:39 -07:00
carlushuang
9ca5aca74d [CK_TILE] fix some rand number init (#1287)
* add random norm

* normalized default to 0/3

* change squant->auto

[ROCm/composable_kernel commit: fcba889ef4]
2024-05-10 09:03:39 -07:00
Bartłomiej Kocot
70f51bb03f Change output gemm type to AccDataType in two stage conv bwd wei (#1283)
[ROCm/composable_kernel commit: 8346af9c68]
2024-05-10 10:57:42 +02:00
Adam Osewski
675a16e3b8 Fix MakeArgument (#1284)
[ROCm/composable_kernel commit: a0ae1c6133]
2024-05-09 09:42:41 -07:00