Commit Graph

1360 Commits

Author SHA1 Message Date
Max Podkorytov
3aeb97f3ec modify python wrapper for addmm (#1441)
[ROCm/composable_kernel commit: 886d14ccb2]
2024-08-06 15:09:27 -07:00
Haocong WANG
62bd135b8c Limit fp8only operator build arch in ckProfiler (#1443)
[ROCm/composable_kernel commit: 6fc7bff58f]
2024-08-06 14:29:14 -07:00
Jun Liu
e79f20f9ac Fix ROCm 6.2 compiler not fully supporting gfx12 when building CK with INSTANCES_ONLY (#1446)
[ROCm/composable_kernel commit: afbf6350f3]
2024-08-06 13:06:53 -07:00
Juan Manuel Martinez Caamaño
f2b070380e Add missing constexpr to if conditions (#1444)
[ROCm/composable_kernel commit: fd9ef4e678]
2024-08-06 11:40:34 -07:00
bibek
e0ef2ba864 adding mha as static lib (#1366)
* adding mha as static lib

* add fmha fwd compile options

* typo

* fix python version

* python version to 3

* increase path length

* add max path flag in mha cmake

* fix long path issue

* mha currently only runs in gfx94x

* only buld mha in mi300

* populate gpu_list

* add mha compile flags

* avoid building mha in gpu other then gfx94x

* some comments and  include ck_tile in rocm

* use rocm_install

* place ck_tile in include

* correct ck_tile path

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 840c5397bb]
2024-08-06 11:17:10 -05:00
jakpiase
ec527c487e Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce

* add reviewers suggestions

[ROCm/composable_kernel commit: b74d4d4d54]
2024-08-06 09:10:39 -07:00
Bartłomiej Kocot
458d8bef26 Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing

* Add new grouped conv fwd kernel for large tensors

* Add instances large tensor

* Fixes for transform conv to gemm

* Fixes

* fixes

* Remove not needed instances

* examples fixes

* Remove not need ds arrays

* Fix tests

* Add 2GB check in gridwise dl

* Fixes

[ROCm/composable_kernel commit: 4ec5c52a0c]
2024-08-06 10:06:10 +02:00
Illia Silin
1152542dab add --offload-compress compiler flag (#1433)
* add --offload-compress compiler flag

* only apply the --offload-compress flag to the ckProfiler

* move the --offload-compress flag back to main cmake file

* add offload-compress to target compile option of ckProfiler

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: 7f57b2e02c]
2024-08-05 23:26:01 +08:00
Illia Silin
a5c15d2b51 [CI][Jenkins] delete CI docker container upon exit (#1437)
[ROCm/composable_kernel commit: f31ba04afc]
2024-08-05 08:13:56 -07:00
Illia Silin
4cfc157a28 Add compiler flags for ROCm versions 6.2+ (#1429)
* add compiler flags to fix compiler issues

* fix typo.

* disable test_smfmac_op on all devices except gfx942

* specify full path to compiler in CI

[ROCm/composable_kernel commit: d311c95396]
2024-08-01 08:27:52 -07:00
Sam Wu
962232deb9 Update doc requirements (#1423)
[ROCm/composable_kernel commit: 6648fd3b04]
2024-07-31 07:42:42 -07:00
zjing14
b7c617b665 [HotFix] Fixed a typo in profile_gemm_multiply_multiply (#1425)
* fixed a typo

* clean

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>

[ROCm/composable_kernel commit: f31e8dfa80]
2024-07-31 07:19:17 -07:00
arai713
603f44b82c Codegen: isSupportedArgument check (#1417)
* added isSupportedArgument check into codegen device op

* adding function call

* remove commented code

[ROCm/composable_kernel commit: d32997a792]
2024-07-31 07:12:15 -07:00
carlushuang
3fdf17947f workaround rocm-6.2 compiler issue (#1421)
[ROCm/composable_kernel commit: b3f86e79dd]
2024-07-31 16:03:59 +08:00
Illia Silin
47f355e6fe add docker for rocm6.2_rc4 compiler (#1424)
[ROCm/composable_kernel commit: b527cad4a5]
2024-07-30 11:55:33 -07:00
Bartłomiej Kocot
aa83424e9c Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)
[ROCm/composable_kernel commit: 33b399cc15]
2024-07-30 18:36:04 +02:00
dependabot[bot]
e48dea2908 Bump rocm-docs-core from 1.6.0 to 1.6.1 in /docs/sphinx (#1420)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.0...v1.6.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: b9ba5b2676]
2024-07-26 14:47:19 -07:00
trixirt
059252be31 Introduce cmake USE_GLIBCXX_ASSERTIONS option (#1404)
A standard option in Fedora packaging that is used to check
the correctness of c++ use of the standard c++ library.

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 733f33af78]
2024-07-25 19:28:17 -07:00
zjing14
87e7be2845 Add rotating buff for gemm_multi_d (#1411)
* add rotating_buff for gemm_multi_d

* format

* Update flush_cache.hpp

* Update gtest.cmake

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>

[ROCm/composable_kernel commit: 105bd708c7]
2024-07-25 23:21:21 +08:00
dependabot[bot]
7d27dd11b8 Bump rocm-docs-core from 1.5.1 to 1.6.0 in /docs/sphinx (#1416)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.5.1 to 1.6.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.1...v1.6.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 1208082e53]
2024-07-24 22:56:29 -07:00
Andriy Roshchenko
df929c14be Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
* Add CMakePresets configurations.

* Add binary elementwise ConvScaleAdd and an example.

* Numerical verification of results.

Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
```

* Implemented ConvScaleAdd + Example.

* Add ConvScale+Bias Instances

* Add Client Example for ConvScale+Bias

* Fix number of bytes in an example..

* Cleanup.

[ROCm/composable_kernel commit: 4a8a1befd5]
2024-07-24 15:49:55 -05:00
Bartłomiej Kocot
5bb30d4077 Add support for half_t and bfloat to reduction operations (#1395)
* Add support for half_t and bfloat to reduction operations

* Fix bhalf convert

* Next fix bf16

[ROCm/composable_kernel commit: ffabd70a15]
2024-07-24 12:12:37 -05:00
dependabot[bot]
fed6f87e5c Bump rocm-docs-core from 1.5.0 to 1.5.1 in /docs/sphinx (#1414)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.5.0 to 1.5.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.0...v1.5.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 33b2a2bdf5]
2024-07-24 07:10:50 -07:00
Haocong WANG
2596f81b45 disable bad instance (#1410)
[ROCm/composable_kernel commit: d22713a719]
2024-07-23 09:05:03 -07:00
Bartłomiej Kocot
ae1cdf5056 Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406)
[ROCm/composable_kernel commit: 5d8c3d8190]
2024-07-22 14:21:24 +02:00
Haocong WANG
be31f1ddf3 [GEMM] F8 GEMM, performance optimized. (#1384)
* add ab_scale init support

* enabled interwave

* add scale type; update isSupport

* adjust example

* clean

* enable f8 pure gemm rcr ckprofiler

* Add gemm_multiply_multiply instances

* clang format

* Optimize for ScaleBlockMNK=128

* enable abscale f8 gemm ck profiler

* Add pure f8 gemm test suite

* Reverting to the state of project at f60fd77

* update copyright

* clang format

* update copyright

---------

Co-authored-by: root <jizhan@amd.com>

[ROCm/composable_kernel commit: 8c90f25be3]
2024-07-19 22:06:52 +08:00
ltqin
b12dae8c68 Universal gemm splitk using reduce (with multi-d) (#1341)
* init for reduce_threadwise multi_d

* add reduce_threadwise_multi_d

* add reduce_multi_d

* clean

* start add an other splitk device op

* add reduce template parameter to SplitKBatchOffset

* add reduce c matrix

* clean up code

* change example data type to bf16

* add bf16Ai8B example

* remove reduce template parameter

* add splitk atomic status to v4

* example add multi d parameters

* device op add multi-d parameters

* add multi-d to reduce

* fix kbach=1 bug

* change B layout to col in  bf16Ai8B example

* remove float adding struct

* change  multi-d interface

* change file and class name

* remove multi-d of bf16Ai8B example

* change IsReduce function to IsReduceAdd

* change example layout to RRR from RCR

* according layout to set ds stride

* reset parameter layout

* add gemm universal reduce instance

* add reduce factory

* add profile_gemm_universal_reduce

* add reduce to profiler

* fix reduce instance

* fix profiler reduce compiling bug

* format

* format library instance code

* add mem instance for reduce library

* fix call instance names

* add workspace for reduce in ckProfiler

* format

* add mnpading to reduce library instance

* add fp16 instance to reduce of profiler

* change copyright time

* restore profiler cmake file

* add reduce text to instances

* add DsLayout and DsDataType to instances template parameter

* fixed gemm_reduce_multi_d

* add an example without multi_d

* Update common.hpp

* Update gtest.cmake

* Update gemm_xdl_splitk_reduce_bf16.cpp

* clean

* Update gtest.cmake

* format

* fixe api

* format

* default parameter change to RRR

* add vector_len for multi_d

* format

* Update gtest.cmake

* fix bf16A iBB elementwiseop

* add ReduceDataType

* move ReduceDataType to end position

* format

* remove googletest git method  address

* fix copyright time

* update init data

---------

Co-authored-by: root <jizhan@amd.com>
Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: c544eb4da0]
2024-07-19 22:01:22 +08:00
Bartłomiej Kocot
321353f5a7 Refactor transform conv to gemm fwd (#1391)
* Refactor transform conv to gemm fwd

* fixes codegen

* wmma fixes

* fix wmma

* Fix copyright

[ROCm/composable_kernel commit: 70a814f163]
2024-07-19 09:29:25 +02:00
Illia Silin
a2f40a3276 add docker for rocm6.2_rc3 (#1401)
[ROCm/composable_kernel commit: ab250afda0]
2024-07-18 09:41:33 -07:00
Qianfeng
03fb9b9435 Replace the using of __expf by __ocml_exp_f32 to work-around the test_softmax_rank4 failure (#1394)
[ROCm/composable_kernel commit: ee768148f0]
2024-07-17 09:15:05 -07:00
Mateusz Ozga
118c9e7c28 An option whether to colorize output during build (#1390)
[ROCm/composable_kernel commit: 9cac282793]
2024-07-16 09:52:44 -07:00
Illia Silin
132beacaa6 [ASAN builds] Modify the list of default targets for ASAN builds. (#1389)
* add a build parameter to build only XNACK targets

* use ENABLE_ASAN_PACKAGING flag to set targets for ASAN builds

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 4c3107fdcb]
2024-07-16 09:19:23 -07:00
Andriy Roshchenko
ece7edc492 Adding more instances of grouped convolution 3d forward for FP8 with ConvScale element-wise operation and ReLU activation. (#1386)
* Add CMakePresets configurations.

* Add ConvScale+ReLU Functor and an Example

* Account for ReLU FLOPs.

* Add instances of 3D convolutions with ConvscaleRelu operation.

* Implement Client Example

* Cleanup

[ROCm/composable_kernel commit: 802a8a1df1]
2024-07-16 08:51:49 -07:00
Haocong WANG
8073a8d846 Disbale failed instance in rocm6.2 rel (#1388)
[ROCm/composable_kernel commit: 1ff4f25138]
2024-07-16 08:46:48 -07:00
Illia Silin
fdb75dcc1b add Rosty and Bartek to code owners (#1392)
[ROCm/composable_kernel commit: eca39050c6]
2024-07-16 23:44:46 +08:00
Bartłomiej Kocot
c885afdaae Support access per groups and filter3x3 in grouped conv fwd (#1382)
* Support access per groups and filter3x3 in grouped conv fwd

* Fixes for large cases

* Fixes for large tensors

[ROCm/composable_kernel commit: 82e8a78a3f]
2024-07-12 11:08:42 -07:00
zjing14
fcb0e225db add gemm_bias_add example (#1361)
* add gemm_bias_add example

* changed strideD

* clang-format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 13c1e64daa]
2024-07-11 18:08:07 -07:00
Rostyslav Geyyer
31e30c0e4e Add instances for grouped conv fwd 3d with ConvScale for bf8@fp8->fp8 (#1369)
* Add an example

* Add instances

* Add a client example

[ROCm/composable_kernel commit: 7a46a91c84]
2024-07-11 13:31:39 -07:00
Illia Silin
4488950d01 Add CK_TILE tests to daily CI builds. (#1381)
* add ck_tile tests to CI

* build and run ck_tile tests on gfx90a and gfx942 in parallel

* fix groovy syntax

* turn ck_tile tests OFF by default

* skip creating the build folder

* build ck_tile examples with 64 threads

* build ck_tile examples with cmake-ck-dev.sh script

* add video group to docker on mi300

* do not retry to rebuild the early CI stages

* help prevent jenkins false failure

* restore cron trigger

[ROCm/composable_kernel commit: 98a01bbc72]
2024-07-11 13:22:40 -07:00
Illia Silin
3b49b68660 [Jenkins] restore cron jobs (#1380)
* test the cron trigger

* fix the cron jobs

* restore the list of cron jobs

[ROCm/composable_kernel commit: f914c228c6]
2024-07-11 10:28:11 -07:00
Illia Silin
80960d5c55 [gfx12] add gfx12 to the default target list (#1379)
[ROCm/composable_kernel commit: a8eb872055]
2024-07-10 14:54:04 -07:00
Sam Wu
a7496b8dae Update changelog release headers (#1378)
* Update doc codeowner syntax

* Add doc link to changelog

* Update changelog formatting for markdownlint

Also change headings for releases

[ROCm/composable_kernel commit: 860f957c22]
2024-07-10 09:36:10 -06:00
dependabot[bot]
6da72dc223 Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx (#1374)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

[ROCm/composable_kernel commit: da42a88964]
2024-07-09 12:48:23 -07:00
carlushuang
d50bc4c39d update owner (#1377)
* remove zjing14, add poyenc

* remove yigex

[ROCm/composable_kernel commit: ccfdc53022]
2024-07-09 20:30:07 +08:00
Illia Silin
8dab3d4d1e Fix the cmake logic when building with INSTANCES_ONLY=ON. (#1376)
* fix the cmake logic when building for various targets

* another minor fix

[ROCm/composable_kernel commit: a328df25a1]
2024-07-08 21:21:16 -07:00
carlushuang
fca10f32d9 [CK_TILE] wa prec, remove sgpr offset for inline asm (#1356)
* wa prec, remove sgpr offset for inline asm

* macro for set tile

* ignore unused param if no kernel instances in host API

* fix more prec issue

* cache buffer resource

* fix

* support pre-nop

* clear tile by vector type members

* add workaround to reduce scratch memory

* conditionally enable workaround code

* enable workaround start from certain build version

* fallback set_tile() implementation from certain build version

* undo template argument changes

* put dummy asm in load_raw()

* fix comments, refactor s_nop inside buffer_load

---------

Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 8182976c37]
2024-07-08 11:09:55 -07:00
Andriy Roshchenko
25aa95786a Add ckProfiler support for forward 3D convolutions with OUT element-wise operations. (#1354)
[ROCm/composable_kernel commit: eb44e0472a]
2024-07-08 10:55:54 -07:00
Harisankar Sadasivan
45802765e0 Universal streamk with atomics (#1360)
* universal streamk with atomics with ckprofiler support. grid_size and streamk strategy are tunable. grid_size of -1 leads to #WGs = maximum occupancy X num_CUs. implementation supports many different streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. streamk strategy of -1 leads to default streamk policy (4-tile). 

* Update README.md

* fixing clang-format issues

* removed conflicts in struct members between streamk and universal streamk

* corrected arg parsing for streamk and universal streamk

* added stream-k policies for 3 tile and 4 tile

* fixed argument type issue with parsing cmd args

* changes suggested in PR review are made- removing comments and correcting copyright

* file permissions updated

* added default value support for grid_size and streamk-policy selection set to -1

* print messages for arguments

* print messages for arguments

* print messages for arguments1

[ROCm/composable_kernel commit: 75e622f02f]
2024-07-05 21:40:30 -07:00
jakpiase
605ac804c4 Add structural sparsity xdlops (#1363)
* Implemented smfmac xdlops

* add reviewer comments

[ROCm/composable_kernel commit: eaa870a1ab]
2024-07-04 12:00:14 +02:00
Jun Liu
fa73739812 Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372)
[ROCm/composable_kernel commit: 959073842c]
2024-07-03 23:34:38 -07:00