Commit Graph

1379 Commits

Author SHA1 Message Date
dependabot[bot]
f86faf86f0 Bump rocm-docs-core from 1.7.1 to 1.7.2 in /docs/sphinx (#1479)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.1 to 1.7.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.1...v1.7.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 0d9bf9f154]
2024-08-21 22:40:49 -07:00
Illia Silin
caf5f89ae3 fix the build errors with clang20 (#1478)
[ROCm/composable_kernel commit: 1925b322eb]
2024-08-21 21:29:48 -07:00
Andriy Roshchenko
f6c6819b47 Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)
* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLU activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Add CMakePresets.json

* Guard off FP8 instances when the data type is not available.

* Add example for Scaled FP8 Convolution with AMAX reduction.

* Refactor CombConvScaleRelu instances.

* Add CombConvScale instances.

* Add client example for Scaled FP8 Convolution with AMAX reduction.

* Cleanup.

[ROCm/composable_kernel commit: c3515f277c]
2024-08-21 15:22:41 -07:00
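The AMAX reduction referenced throughout the entry above is an absolute-maximum reduction, commonly used to derive a per-tensor scaling factor for FP8. A minimal language-agnostic sketch of the concept (not the CK kernel itself; the `fp8_scale` helper and the e4m3-style maximum of 448 are illustrative assumptions):

```python
def amax(values):
    """Absolute-maximum (AMAX) reduction: max(|x|) over all elements."""
    return max(abs(v) for v in values)


def fp8_scale(values, fp8_max=448.0):
    # Hypothetical helper: derive a scale factor from AMAX so the data
    # range maps into FP8's dynamic range (assumed e4m3 max of 448).
    a = amax(values)
    return fp8_max / a if a != 0.0 else 1.0
```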
Rostyslav Geyyer
0ab95a332e Set RNE fp8 conversion as a default (#1458)
* Set RNE fp8 conversion as a default

* Update f8 tests

* Disable failing test on gfx11

* Update bf8 tests

* Add a flag

* Fix the flag

* Raise flag for gfx10 as well

* Temp commit for tolerance testing

* Update tolerances

[ROCm/composable_kernel commit: e20f20efbf]
2024-08-21 09:09:48 -07:00
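RNE ("round to nearest, ties to even") is the rounding mode made the default for fp8 conversion above. Its tie-breaking behaviour can be sketched at a coarser granularity (a conceptual illustration of the rounding rule, not CK's fp8 bit-level converter; Python's `round()` already implements ties-to-even):

```python
def quantize_rne(x, step=0.25):
    """Round x to the nearest multiple of `step`, ties to even (RNE)."""
    return round(x / step) * step
```

On a tie such as 0.125 (exactly halfway between 0.0 and 0.25), RNE picks the even multiple 0.0, avoiding the systematic upward bias of round-half-up.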
Bartłomiej Kocot
c7b7771a91 Convert MIOpen driver to ckProfiler script typos fix (#1476)
[ROCm/composable_kernel commit: dc82daa86e]
2024-08-20 19:04:14 +02:00
Andriy Roshchenko
10edb0c70e Adding Instances and Examples for FP8-based Scaled Convolution with ReLU Activation and AMAX Reduction. (#1469)
* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLU activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Guard off FP8 instances when the data type is not available.

* Improve output readability.

* Addressing reviewer's comments.

[ROCm/composable_kernel commit: a94113a941]
2024-08-20 10:30:56 -05:00
dependabot[bot]
2c7d3a1c22 Bump rocm-docs-core from 1.7.0 to 1.7.1 in /docs/sphinx (#1475)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.0...v1.7.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: f48529b511]
2024-08-19 23:02:07 -07:00
Bartłomiej Kocot
6e6277ca02 Add script to convert MIOpen driver to ckProfiler (#1472)
* Add script to convert MIOpen driver to ckProfiler

* Fix

[ROCm/composable_kernel commit: a6a7966505]
2024-08-19 08:24:56 -07:00
Illia Silin
ad65d8d5b0 Re-enable fp8 types for all architectures. (#1470)
* re-enable fp8 and bf8 for all targets

* restore the fp8 gemm instances

* re-enable conv_3d fp8 on all architectures

* disable several fp8 gemm instances on all architectures except gfx94

* clang format fix

[ROCm/composable_kernel commit: c8b6b64240]
2024-08-16 16:07:52 -06:00
Dan Yao
14402bb211 [CK_TILE] FA bwd kernels optimization (#1397)
* tmp save

* fix batch deterministic bugs

* fix group deterministic bugs

* codegen update

* reorder files

* bias support

* hd256 bias support

* bwd smoke test update

* simplify convert dq

* fix hd256 dropout scratch

* do{}while() -> while(){}

* comments

* remove FmhaBwdTilePartitioner

* save clear_tile

* refactor dropout

* code cleanup

* code cleanup

* comments

* fix epilogue problem

* fix fwd dropout

* group convert_dq opt

* fix dq alignment

* Do not store storerandval in bwd for flash attention integration

* fix hd32 error and boost performance

* revert

* Remove duplicated WarpGemm definitions in the policy file

* dropout patch for mrepeat 16*16

* code sync up

* dq_acc stride

* dq_acc stride stuff

* codegen update

* fwd dropout revert

* fix hd128 scratches and boost performance

* receipt 3 for simplified smoke test

* more strides for fa integration

* fix hd64 scratches and boost performance

* non-iglp pipeline for headdim padding cases

* dpad same as dvpad for flash attention integration

* unpadded lse&d for group mode

* Support unpad layout for group lse

* Support unpad lse layout for splitkv

* Fix stride for splitkv kernel

* fix unpadded lse issue in fwd splitkv

* comment

* solve lds read&write conflicts

* rename

* bias rename

* tile index revert

---------

Co-authored-by: danyao12 <danyao12>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

[ROCm/composable_kernel commit: 79a5d9c10c]
2024-08-16 13:40:10 -07:00
Bartłomiej Kocot
dffd5eacc0 Add performance and large tensor tests for grouped conv (#1456)
* Add performance and large tensor tests for grouped conv

* Resize tests

* Resize tests

* update the python script to parse the grouped_conv results

* Remove int8 tests

* change bwd wei layout

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 2581727d2a]
2024-08-16 07:48:30 -07:00
dependabot[bot]
e77e18da19 Bump rocm-docs-core from 1.6.2 to 1.7.0 in /docs/sphinx (#1467)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.2 to 1.7.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.2...v1.7.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 76bd0af6af]
2024-08-15 13:59:40 -07:00
trixirt
1eeb32a64d Check compiler flags before using (#1403)
* Check compiler flags before using

The user's compiler may not support these flags, so check.
Resolves failures on Fedora.

Signed-off-by: Tom Rix <trix@redhat.com>

* fix syntax CMakeLists.txt

Fix syntax in the check_cxx_compiler_flag.

---------

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 49769ec889]
2024-08-14 20:43:10 -07:00
Haocong WANG
65d6442b4c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 3049b5467c]
2024-08-14 10:42:30 +08:00
AngryLoki
6a4b36d948 Fix compilation errors with libc++ (#1461)
This fixes two issues when compiling with libc++.

The first issue is an attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min().
_Float16 is an extension of libstdc++; it does not exist in the C++ standard.
Luckily, composable_kernel provides a NumericLimits class that does everything needed.

The second issue is that a call to 'check_err' is ambiguous: there are two candidates.
This happens because composable_kernel relies on f8_t (defined as _BitInt(8)) not passing the is_integral trait.
However, libc++ treats _BitInt(N) as integral (the standard allows "any implementation-defined extended integer types" to be integral).

Closes: #1460

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>

[ROCm/composable_kernel commit: 50c423481b]
2024-08-13 14:31:15 -05:00
Mateusz Ozga
7a4690b077 Support large 12d tensor size for reduction kernel (#1465)
[ROCm/composable_kernel commit: 0606e5498e]
2024-08-13 16:15:47 +02:00
Illia Silin
92df7893df Disable inapplicable xdl and mha instances for gfx12 (#1464)
[ROCm/composable_kernel commit: cbb6f2ab8c]
2024-08-12 15:11:58 -07:00
Mateusz Ozga
b7b9eb73c7 Rewrite *sh reduce unit tests to gtest: part 1 (#1407)
* Rewrite .sh test to Gtest

* review changes

* Remove unused comments

* Review v2

* Typo

* Separate UT: AMAX, MAX, MIN; added template params to trigger them

* Update test/reduce/reduce_no_index.cpp

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: ab60b390f8]
2024-08-12 16:28:10 +02:00
Bartłomiej Kocot
15ab8b0d5c Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457)
* Fix typo in TransformConvFwdToGemm

* Fix bug in n offset calculation

[ROCm/composable_kernel commit: 4a870942e6]
2024-08-10 13:12:05 +02:00
arai713
ab0829d8bd Codegen build w/CK (#1428)
* initial push

* cleaned up compiler errors

* removed commented code

* build codegen folder only for gfx9 targets

* remove separate stage for codegen tests from CI

* removed commented code from CMake

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: da214a5a58]
2024-08-09 08:15:06 -07:00
Jun Liu
254a7dadb6 Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)
This reverts commit 0c367d5912486f4fcbae1dbb861a1fb8176ca308.

[ROCm/composable_kernel commit: 5ff8eeebf9]
2024-08-08 19:09:33 -07:00
Illia Silin
bfb128a8cf Enable CI on gfx12. (#1454)
* enable CI build and test on gfx1201

* skip DL kernels in CI for gfx12

* only run CI on gfx12 if rocm version >= 6.2

* remove the rocm version check for CI on gfx12

* add a switch for CI builds on gfx12

[ROCm/composable_kernel commit: 4a5ab67871]
2024-08-08 16:29:15 -07:00
Illia Silin
9e9b3d563b check if the coerce-illegal-types flag is supported (#1451)
[ROCm/composable_kernel commit: ae3b8ff86c]
2024-08-08 07:29:29 -07:00
Illia Silin
38fe3e7936 add rocm-llvm-dev package to docker image (#1452)
[ROCm/composable_kernel commit: 8a75728406]
2024-08-08 07:29:13 -07:00
Juan Manuel Martinez Caamaño
61ecdbc128 Remove reinterpret_cast uses that result in undefined behaviour. (#1445)
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead.

See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility

Closes #1439

* fix clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 901e5f1540]
2024-08-07 11:49:02 -07:00
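The defined alternative that the commit above adopts is a value-level bit copy (std::bit_cast, or memcpy before C++20) instead of reading one type through a pointer to another. The same idea in Python, using `struct` to reinterpret a float's 32-bit value representation (an illustrative sketch of the principle, not the CK change itself):

```python
import struct

def float_to_bits(x):
    """Reinterpret a float's IEEE-754 binary32 representation as an int.

    Copies the value representation -- the well-defined move that
    std::bit_cast / memcpy make in C++ -- rather than type-punning
    through pointers, which is undefined behaviour.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits

def bits_to_float(bits):
    # Inverse direction: rebuild the float from its bit pattern.
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return x
```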
Illia Silin
760fcb96f3 upgrade to rocm6.2 as new default compiler (#1448)
[ROCm/composable_kernel commit: 5df10432d8]
2024-08-07 09:38:43 -07:00
dependabot[bot]
5d69137f37 Bump rocm-docs-core from 1.6.1 to 1.6.2 in /docs/sphinx (#1449)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.6.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.1...v1.6.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: a71d407e35]
2024-08-07 08:22:38 -07:00
Illia Silin
e377ca404b Run CK_TILE FMHA benchmarks and collect the performance data. (#1447)
* run ck_tile benchmarks after the smoke tests and store logs

* change the path of fmha benchmark logs

* change the way of stashing ck_tile fmha logs

* prevent the errors in stages where no logs are generated

* fix the ck_tile fmha log names and headers

* generate the fmha performance logs in the root folder

* change jenkins script arguments format

* use exact file names for stashing

* modify scripts to process FMHA performance results

* unstash FMHA logs before parsing them

[ROCm/composable_kernel commit: 12c1f68dd9]
2024-08-07 08:18:26 -07:00
Max Podkorytov
f4b5582b2a modify python wrapper for addmm (#1441)
[ROCm/composable_kernel commit: 886d14ccb2]
2024-08-06 15:09:27 -07:00
Haocong WANG
78bf13c11a Limit fp8only operator build arch in ckProfiler (#1443)
[ROCm/composable_kernel commit: 6fc7bff58f]
2024-08-06 14:29:14 -07:00
Jun Liu
37921efb24 Fix ROCm 6.2 compiler not fully supporting gfx12 when building CK with INSTANCES_ONLY (#1446)
[ROCm/composable_kernel commit: afbf6350f3]
2024-08-06 13:06:53 -07:00
Juan Manuel Martinez Caamaño
e539c37e7d Add missing constexpr to if conditions (#1444)
[ROCm/composable_kernel commit: fd9ef4e678]
2024-08-06 11:40:34 -07:00
bibek
c8c3293b0b adding mha as static lib (#1366)
* adding mha as static lib

* add fmha fwd compile options

* typo

* fix python version

* python version to 3

* increase path length

* add max path flag in mha cmake

* fix long path issue

* mha currently only runs in gfx94x

* only build mha in mi300

* populate gpu_list

* add mha compile flags

* avoid building mha on GPUs other than gfx94x

* some comments and include ck_tile in rocm

* use rocm_install

* place ck_tile in include

* correct ck_tile path

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 840c5397bb]
2024-08-06 11:17:10 -05:00
jakpiase
e8ee8856fa Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce

* add reviewers suggestions

[ROCm/composable_kernel commit: b74d4d4d54]
2024-08-06 09:10:39 -07:00
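The beta != 0 case fixed above is the standard BLAS-style output blend: when beta is nonzero, the reduction result must be accumulated into a scaled copy of the existing output rather than overwriting it. A minimal sketch of the semantics (hypothetical names, not CK's API):

```python
def scaled_reduce_sum(xs, out_prev=0.0, alpha=1.0, beta=0.0):
    """out = alpha * sum(xs) + beta * out_prev.

    With beta == 0 the prior output is ignored; with beta != 0 it must
    be read, scaled, and added -- the path the fix above repairs.
    """
    return alpha * sum(xs) + beta * out_prev
```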
Bartłomiej Kocot
69a6b563f9 Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing

* Add new grouped conv fwd kernel for large tensors

* Add instances large tensor

* Fixes for transform conv to gemm

* Fixes

* fixes

* Remove not needed instances

* examples fixes

* Remove not need ds arrays

* Fix tests

* Add 2GB check in gridwise dl

* Fixes

[ROCm/composable_kernel commit: 4ec5c52a0c]
2024-08-06 10:06:10 +02:00
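The 2GB check mentioned above guards against 32-bit index overflow: once a tensor's byte size reaches 2^31, offsets no longer fit in a signed 32-bit integer and the large-tensor kernel with 64-bit indexing is required. A back-of-the-envelope sketch of that threshold (hypothetical helper, not CK's dispatch logic):

```python
from math import prod

INT32_LIMIT = 2**31  # 2 GiB: largest offset range of a signed 32-bit index

def needs_64bit_indexing(dims, bytes_per_element):
    """Return True when a tensor's byte size exceeds the 32-bit index range."""
    return prod(dims) * bytes_per_element >= INT32_LIMIT
```

For example, an NCHW fp16 tensor of shape (64, 256, 1024, 1024) occupies 32 GiB and clearly needs the 64-bit path.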
Illia Silin
8f71de4707 add --offload-compress compiler flag (#1433)
* add --offload-compress compiler flag

* only apply the --offload-compress flag to the ckProfiler

* move the --offload-compress flag back to main cmake file

* add offload-compress to target compile option of ckProfiler

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: 7f57b2e02c]
2024-08-05 23:26:01 +08:00
Illia Silin
1a8f8fce5b [CI][Jenkins] delete CI docker container upon exit (#1437)
[ROCm/composable_kernel commit: f31ba04afc]
2024-08-05 08:13:56 -07:00
Illia Silin
3d3819e0b3 Add compiler flags for ROCm versions 6.2+ (#1429)
* add compiler flags to fix compiler issues

* fix typo.

* disable test_smfmac_op on all devices except gfx942

* specify full path to compiler in CI

[ROCm/composable_kernel commit: d311c95396]
2024-08-01 08:27:52 -07:00
Sam Wu
604152a68b Update doc requirements (#1423)
[ROCm/composable_kernel commit: 6648fd3b04]
2024-07-31 07:42:42 -07:00
zjing14
807edd542a [HotFix] Fixed a typo in profile_gemm_multiply_multiply (#1425)
* fixed a typo

* clean

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>

[ROCm/composable_kernel commit: f31e8dfa80]
2024-07-31 07:19:17 -07:00
arai713
735984bb5a Codegen: isSupportedArgument check (#1417)
* added isSupportedArgument check into codegen device op

* adding function call

* remove commented code

[ROCm/composable_kernel commit: d32997a792]
2024-07-31 07:12:15 -07:00
carlushuang
cecee51c65 workaround rocm-6.2 compiler issue (#1421)
[ROCm/composable_kernel commit: b3f86e79dd]
2024-07-31 16:03:59 +08:00
Illia Silin
4e86ab9f21 add docker for rocm6.2_rc4 compiler (#1424)
[ROCm/composable_kernel commit: b527cad4a5]
2024-07-30 11:55:33 -07:00
Bartłomiej Kocot
1567614d80 Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)
[ROCm/composable_kernel commit: 33b399cc15]
2024-07-30 18:36:04 +02:00
dependabot[bot]
9ab0227208 Bump rocm-docs-core from 1.6.0 to 1.6.1 in /docs/sphinx (#1420)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.0...v1.6.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: b9ba5b2676]
2024-07-26 14:47:19 -07:00
trixirt
9348857732 Introduce cmake USE_GLIBCXX_ASSERTIONS option (#1404)
A standard option in Fedora packaging, used to check the
correctness of C++ usage of the standard C++ library.

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 733f33af78]
2024-07-25 19:28:17 -07:00
zjing14
a94e87d868 Add rotating buff for gemm_multi_d (#1411)
* add rotating_buff for gemm_multi_d

* format

* Update flush_cache.hpp

* Update gtest.cmake

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>

[ROCm/composable_kernel commit: 105bd708c7]
2024-07-25 23:21:21 +08:00
dependabot[bot]
0686a1b400 Bump rocm-docs-core from 1.5.1 to 1.6.0 in /docs/sphinx (#1416)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.5.1 to 1.6.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.1...v1.6.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 1208082e53]
2024-07-24 22:56:29 -07:00
Andriy Roshchenko
e3b469a493 Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
* Add CMakePresets configurations.

* Add binary elementwise ConvScaleAdd and an example.

* Numerical verification of results.

Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
```

* Implemented ConvScaleAdd + Example.

* Add ConvScale+Bias Instances

* Add Client Example for ConvScale+Bias

* Fix number of bytes in an example.

* Cleanup.

[ROCm/composable_kernel commit: 4a8a1befd5]
2024-07-24 15:49:55 -05:00
Bartłomiej Kocot
1f93d3f961 Add support for half_t and bfloat to reduction operations (#1395)
* Add support for half_t and bfloat to reduction operations

* Fix bhalf convert

* Next fix bf16

[ROCm/composable_kernel commit: ffabd70a15]
2024-07-24 12:12:37 -05:00